# Production Docker Image for Apache Airflow

- [Production Docker Image for Apache Airflow](https://www.youtube.com/watch?v=wDr3Y7q2XoI)

{{youtube>wDr3Y7q2XoI}}

## Intro

### What this talk is NOT about

- Basic container image knowledge
  - https://docker-curriculum.com
- Details of the CI container image of [[Airflow]]
  - https://github.com/apache/airflow/blob/master/IMAGES.rst
- Details of how Airflow integrates with Kubernetes
  - "Airflow on Kubernetes" by Michael Hewitt
  - https://www.crowdcast.io/e/airflowsummit/6
- Details on deploying Airflow with the image

### Who is the talk for?

- You want to deploy Airflow using container images
- You want to contribute to Airflow in the DevOps area
- You want to learn about best practices for using Airflow containers
- You are a curious person who wants to learn something new

### What is a container?

- Standard unit of software
  - OCI: https://opencontainers.org
- Packages code and its dependencies
- Lightweight, standalone, executable package of software
- Container images are binary packages

### Container != Docker

- [[Docker]] is a command-line tool
  - Building, running and sharing containers
- [[Docker Engine]] runs containers
  - Alternatives: [[rkt]], [[containerd]], [[runc]], [[podman]], [[lxc]], ...
- [[DockerHub.com]] is a popular container registry
  - Alternatives: [[GitHub]], [[GCR]], [[ECR]], [[ACR]]

### Context: What is a container file?

```dockerfile
FROM ubuntu:18.04
# The base image ships without a compiler, so install one first
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
COPY . /app
WORKDIR /app
# Build and install; assumes `make install` places the binary at /bin/project
RUN make && make install
ENTRYPOINT ["/bin/project"]
CMD ["--help"]
```

- Specify the base image
- Run commands
- Copy files
- Set the working directory
- Define the entrypoint
- Define the default command

### Why are containers important?
- Predictable, consistent development and test environment
- Predictable, consistent execution environment
- Lightweight but isolated: a sandboxed view of the OS, isolated from other containers
- Build once, run anywhere
- Kubernetes runs containers natively
- Bridge: "Development -> Operations"

## Internals

### Features of the production image Dockerfile

- Builds an optimised image
- Highly customizable (build arguments, `ARG`)
- Multi-stage build (build stage + main stage)

## Usage

### Extending Airflow image - use released image

```dockerfile
FROM apache/airflow:1.10.11
# Change to the root user temporarily to install OS packages
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
           emacs \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Change back to the airflow user
USER airflow
# Add extra Python dependencies
RUN pip install --user numpy
# Embed DAGs (optional): DAGs can be baked in, but they can also
# be git-synced or mounted from a shared volume
COPY --chown=airflow:root dags-folder ${AIRFLOW_HOME}/dags/
```

Build it with:

```shell
docker build . -t yourcompany/airflow:1.10.11-BUILD_ID
```

### Extending image - Pros & Cons

#### Pros

- Uses released images
- Simple build command
- Your own Dockerfile
- No need for Airflow sources

#### Cons

- Potentially bigger image size
- Predefined extras only
- Only a limited set of Python dependencies is preinstalled

### Customizing Airflow image - default docker build

```shell
git clone git@github.com:apache/airflow.git
cd airflow
git checkout v1-10-stable
docker build .
```

### Customizing Airflow image - use build args

- Installs Airflow `==1.10.11` from PyPI
- Adds additional Airflow extras, dev and runtime dependencies, ...
- Does not use local sources (can be run from master, including the entrypoint)

{{ https://i.imgur.com/7NxqHmy.jpg }}

### It's a Breeze to build images

- Breeze: development and test environment
- Supports building the production image
- Auto-completion of options
- New Breeze video showing how to build production images:
  - https://s.apache.org/airflow-breeze
- `./breeze build-image --help`

See [[BREEZE.rst]] in the Airflow repo.

{{ https://i.imgur.com/f7dnoey.jpg }}

### How to deploy the images?

- Docker and Docker Compose: not recommended for production
- Managed container services
  - Amazon [[ECS]], Google Containers on VMs, Azure Container Instances
- Kubernetes on-premises:
  - [[Helm Chart]]
  - Airflow Operator (not recommended yet)
- Managed Kubernetes: Amazon [[EKS]], Google [[GKE]], Azure [[AKS]]
- [[OpenShift]] (also Kubernetes)
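In practice, the managed container services and Kubernetes options listed above pull the image from a container registry (Docker Hub, GCR, ECR, ACR, ...), so a custom-built image usually has to be pushed to one first. A minimal sketch, assuming a hypothetical registry host `registry.example.com`, a placeholder build tag `yourcompany/airflow:1.10.11-BUILD_ID`, and that you have already authenticated with `docker login registry.example.com`:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names (hypothetical): use your own build tag, registry host and repository.
LOCAL_IMAGE="yourcompany/airflow:1.10.11-BUILD_ID"
REMOTE_IMAGE="registry.example.com/yourcompany/airflow:1.10.11-BUILD_ID"

# Add a second name for the locally built image that points at the target registry.
docker tag "${LOCAL_IMAGE}" "${REMOTE_IMAGE}"

# Upload the image so the cluster nodes can pull it.
docker push "${REMOTE_IMAGE}"
```

The pushed reference (`REMOTE_IMAGE`) is what then goes into the ECS task definition, the Kubernetes pod spec or the Helm chart values. Using a unique tag per build, rather than reusing `latest`, makes it clear which image a deployment is actually running.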