# Production Docker Image for Apache Airflow
- [Production Docker Image for Apache Airflow](https://www.youtube.com/watch?v=wDr3Y7q2XoI)
{{youtube>wDr3Y7q2XoI}}
## Intro
### What is this talk NOT about?
- Basic container image knowledge
  - https://docker-curriculum.com
- Details of the CI container image of [[Airflow]]
  - https://github.com/apache/airflow/blob/master/IMAGES.rst
- Details of how Kubernetes and Airflow integrate
  - "Airflow on Kubernetes" by Michael Hewitt
  - https://www.crowdcast.io/e/airflowsummit/6
- Details on deploying Airflow with the image
### Who is the talk for?
- You want to deploy Airflow using container images
- You want to contribute to Airflow in the DevOps area
- You want to learn about best practices for using Airflow containers
- You are a curious person who wants to learn something new
### What is a container?
- Standard unit of software
- OCI: https://opencontainers.org
- Packages code and its dependencies
- Lightweight execution package of software
- Container images - binary packages
### Container != Docker
- [[Docker]] is a command line tool
  - Building, running, sharing containers
- [[Docker Engine]] runs containers
  - Alternatives: [[rkt]], [[containerd]], [[runc]], [[podman]], [[lxc]], ...
- [[DockerHub.com]] is a popular container registry
  - Alternatives: [[GitHub]], [[GCR]], [[ECR]], [[ACR]]
### Context: What is a Container file?
    FROM ubuntu:18.04
    COPY . /app
    RUN make /app && make install
    WORKDIR /bin/project
    ENTRYPOINT ["/bin/project"]
    CMD ["--help"]

- Specify base image
- Run commands
- Copy files
- Set working directory
- Define entrypoint
- Define default command
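
How `ENTRYPOINT` and `CMD` combine at run time can be sketched with a few `docker run` invocations. This is a minimal sketch, assuming an image has already been built from a Container file shaped like the one above. The tag `project-demo:latest` and the `--version` flag are hypothetical names used only for illustration. The `run` helper just prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical image tag (not from the talk).
IMAGE="project-demo:latest"

# Print each command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# No arguments: Docker appends the default CMD to the ENTRYPOINT,
# so the container executes: /bin/project --help
run docker run --rm "$IMAGE"

# Arguments after the image name replace CMD but keep the ENTRYPOINT,
# so the container executes: /bin/project --version (--version is hypothetical)
run docker run --rm "$IMAGE" --version

# --entrypoint replaces the ENTRYPOINT itself, e.g. to open a shell for debugging.
run docker run --rm -it --entrypoint /bin/sh "$IMAGE"
```

Keeping the executable in `ENTRYPOINT` and only the default arguments in `CMD` is what lets users pass their own arguments without repeating the binary name.
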
### Why are containers important?
- Predictable, consistent development & test environment
- Predictable, consistent execution environment
- Lightweight but isolated: sandboxed view of the OS isolated from others
- Build once: run anywhere
- Kubernetes runs containers natively
- Bridge: "Development -> Operations"
## Internals
### Features of the production image file
- Builds an optimised image
- Highly customizable (ARGs)
- Multi-segmented (build + main)
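
The "multi-segmented (build + main)" idea corresponds to Docker's multi-stage builds: a build segment installs dependencies, and the main segment copies only the installed result, so build-time caches and tools stay out of the final image. The sketch below writes a deliberately tiny Dockerfile to a temporary directory. It is not the Airflow Dockerfile, and the `ARG` names, default values and the `multi-stage-demo` tag are illustrative assumptions.

```shell
#!/bin/sh
# Scratch directory for the illustration (hypothetical, not part of Airflow).
DEMO_DIR="$(mktemp -d)"

# The quoted 'EOF' keeps ${...} literal so Docker, not the shell, expands it.
cat > "$DEMO_DIR/Dockerfile" <<'EOF'
# Build arguments make the image customizable without editing this file.
ARG PYTHON_BASE_IMAGE="python:3.7-slim-buster"

# Segment 1 ("build"): install Python packages into /root/.local.
FROM ${PYTHON_BASE_IMAGE} AS build
ARG EXTRA_PIP_PACKAGES="numpy"
RUN pip install --user --no-cache-dir ${EXTRA_PIP_PACKAGES}

# Segment 2 ("main"): start from a clean base and copy only the installed packages.
FROM ${PYTHON_BASE_IMAGE} AS main
COPY --from=build /root/.local /root/.local
ENV PATH="/root/.local/bin:${PATH}"
EOF

# Print the build command instead of running it; an ARG is overridden with --build-arg.
echo "docker build --build-arg EXTRA_PIP_PACKAGES=pandas -t multi-stage-demo $DEMO_DIR"
```
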
## Usage
### Extending Airflow image - use released image
Build command:

    docker build . -t yourcompany/airflow:1.10.11-BUILD_ID

Dockerfile:

    FROM apache/airflow:1.10.11
    # Change to the root user temporarily
    USER root
    RUN apt-get update \
      && apt-get install -y --no-install-recommends \
           emacs \
      && apt-get autoremove -yqq --purge \
      && apt-get clean \
      && rm -rf /var/lib/apt/lists/*
    # Change back to the airflow user
    USER airflow
    # Add extra dependencies
    RUN pip install --user numpy
    # Embed DAGs (optional): DAGs can be baked in, but they can also
    # be git-synced or mounted from a shared volume
    COPY --chown=airflow:root dags-folder ${AIRFLOW_HOME}/dags/

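
In a CI pipeline, the build-ID placeholder in the image tag is usually filled from an environment variable. This is a minimal sketch, assuming the CI system exports `BUILD_ID` (a hypothetical variable name; many CI systems use a different one) and falling back to `local` otherwise. The commands are printed rather than executed.

```shell
#!/bin/sh
AIRFLOW_VERSION="1.10.11"
# BUILD_ID is assumed to come from the CI environment; "local" is a fallback.
BUILD_ID="${BUILD_ID:-local}"
IMAGE="yourcompany/airflow:${AIRFLOW_VERSION}-${BUILD_ID}"

# Build the extended image from the Dockerfile in the current directory.
echo "docker build . -t ${IMAGE}"

# Check that the extra dependency is importable inside the image.
# --entrypoint bypasses the image's own entrypoint script.
echo "docker run --rm --entrypoint python ${IMAGE} -c 'import numpy; print(numpy.__version__)'"
```
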
### Extending image - Pros & Cons
#### Pros
- Uses released images
- Simple build command
- Own Dockerfile
- No need for Airflow sources
#### Cons
- Potentially bigger size
- Predefined extras only
- Installs a limited set of Python dependencies
### Customizing Airflow image - default docker build
    git clone git@github.com:apache/airflow.git
    cd airflow
    git checkout v1-10-stable
    docker build .

### Customizing Airflow image - use build args
- Installs Airflow from PyPI (version == 1.10.11)
- Additional Airflow extras, dev and runtime deps, ...
- Does not use local sources (can be run from master including entrypoint)
{{ https://i.imgur.com/7NxqHmy.jpg }}
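
A build driven by build args might look like the following sketch. The `--build-arg` names are assumptions based on the 1.10-era `IMAGES.rst` and should be checked against the `ARG` lines in the Dockerfile of your checkout. The chosen extras, the extra Python package and the image tag are illustrative only. The `run` helper just prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
AIRFLOW_VERSION="1.10.11"
# Hypothetical tag for the customized image.
IMAGE="yourcompany/airflow:${AIRFLOW_VERSION}-custom"

# Print each command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Run from the root of the Airflow checkout, where the Dockerfile lives.
# Build-arg names are assumptions; verify them in the Dockerfile.
run docker build . \
    --build-arg AIRFLOW_INSTALL_SOURCES="apache-airflow" \
    --build-arg AIRFLOW_INSTALL_VERSION="==${AIRFLOW_VERSION}" \
    --build-arg ADDITIONAL_AIRFLOW_EXTRAS="mssql,hdfs" \
    --build-arg ADDITIONAL_PYTHON_DEPS="sshtunnel" \
    -t "$IMAGE"
```
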
### It's a Breeze to build images
- Breeze - development and test environment
- Supports building production image
- Auto-complete of options
- New Breeze video showing building production images:
- https://s.apache.org/airflow-breeze
- `./breeze build-image --help`
See [[BREEZE.rst]] in the Airflow repo
{{ https://i.imgur.com/f7dnoey.jpg }}
### How to deploy the images?
- Docker and Docker-Compose - not recommended for production
- Managed Container Services
- Managed: Amazon [[ECS]], Google Container on VMs, Azure Container Instances
- Kubernetes on-Prem:
- [[Helm Chart]]
- Airflow Operator (not recommended yet)
- Managed Kubernetes: Amazon [[EKS]], Google [[GKE]], Azure [[AKS]]
- [[OpenShift]] (also Kubernetes)