Production Docker Image for Apache Airflow
Intro
What this talk is NOT about
- Basic container image knowledge
- Details of the Airflow CI container image
- Details of how Airflow integrates with Kubernetes
  - See “Airflow on Kubernetes” by Michael Hewitt
- Details on deploying Airflow with the image
Who is this talk for?
- You want to deploy Airflow using container images
- You want to contribute to Airflow in the DevOps area
- You want to learn about best practices for using Airflow containers
- You are a curious person who wants to learn something new
What is a container?
- Standard unit of software
- Packages code and its dependencies
- Lightweight, standalone, executable package of software
- Container images - binary packages
Container != Docker
- Docker is a command line tool
- Building, Running, Sharing containers
- Docker Engine runs containers
- Docker Hub is a popular container registry
Context: What is a container file (Dockerfile)?
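```dockerfile
FROM ubuntu:18.04
# The plain ubuntu image has no make or compiler, so install them first
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
COPY . /app
WORKDIR /app
# Assumes the project's Makefile installs the binary as /usr/local/bin/project
RUN make && make install
ENTRYPOINT ["/usr/local/bin/project"]
CMD ["--help"]
```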
- Specify base image
- Run commands
- Copy files
- Set working directory
- Define entrypoint
- Define default command
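The container file above can be exercised with the commands below. This is a minimal sketch that assumes the file is saved as `Dockerfile` in the current directory; the image name `example/project:latest` and the `--version` flag of the project are hypothetical. It shows how ENTRYPOINT and CMD combine: with no arguments Docker appends CMD (`--help`) to the entrypoint, while arguments given after the image name replace CMD. By default the script only prints the commands; set `DRY_RUN=0` to run them against a local Docker Engine.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical image name, used only for this example
IMAGE="example/project:latest"

# Print a command; execute it only when DRY_RUN=0 (default: dry run)
run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  fi
}

# Build the image from the Dockerfile in the current directory
run docker build . -t "${IMAGE}"

# No arguments: Docker executes the ENTRYPOINT followed by CMD ("--help")
run docker run --rm "${IMAGE}"

# Arguments after the image name replace CMD; "--version" is an assumed flag
run docker run --rm "${IMAGE}" --version
```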
Why are containers important?
- Predictable, consistent development & test environment
- Predictable, consistent execution environment
- Lightweight but isolated: each container gets a sandboxed view of the OS
- Build once, run anywhere
- Kubernetes runs containers natively
- Bridge: “Development → Operations”
Internals
Features of the production image file
- Builds an optimised image
- Highly customizable (build ARGs)
- Multi-stage (build + main)
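The build + main split can be illustrated with a deliberately small two-stage Dockerfile. This is a simplified sketch of the pattern, not the Airflow Dockerfile itself: the stage names `build` and `main`, the Python base image and the `numpy` package are example choices. The build stage carries compilers and installs packages with `pip install --user`; the main stage starts from a clean base and copies only `/root/.local`, so the compilers never reach the final image. The script writes the file to a temporary directory and prints the build command.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a simplified two-stage Dockerfile into a scratch directory
workdir="$(mktemp -d)"
cat > "${workdir}/Dockerfile" <<'EOF'
# Build stage: has compilers, installs Python packages into /root/.local
FROM python:3.7-slim-buster AS build
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc g++ \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --user --no-cache-dir numpy

# Main stage: clean base image, copies only the installed packages
FROM python:3.7-slim-buster AS main
COPY --from=build /root/.local /root/.local
ENV PATH=/root/.local/bin:${PATH}
CMD ["python", "-c", "import numpy; print(numpy.__version__)"]
EOF

echo "Wrote ${workdir}/Dockerfile"
echo "Build it with: docker build -t example/two-stage ${workdir}"
```

Only the final stage ends up in the tagged image; the build stage is discarded after its output has been copied.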
## Usage
Extending Airflow image - use released image
```bash
docker build . -t yourcompany/airflow:1.10.11-BUILD_Id
```

```dockerfile
FROM apache/airflow:1.10.11
# Change to the root user temporarily to install OS packages
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
           emacs \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Change back to the airflow user
USER airflow
# Add extra Python dependencies
RUN pip install --user numpy
# Embed DAGs (optional): DAGs can be baked into the image, but they can
# also be git-synced or mounted from a shared volume
COPY --chown=airflow:root dags-folder ${AIRFLOW_HOME}/dags/
```
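In CI, the `BUILD_Id` part of the tag can be filled in automatically so that tags are never reused. The sketch below assumes the CI system exposes a unique identifier in a `BUILD_ID` environment variable (Jenkins does; other CI systems use different names) and otherwise falls back to the short git commit hash, or `local` outside a repository. It then checks that the extra `numpy` dependency is importable in the built image, bypassing the image's entrypoint. It prints the commands by default; set `DRY_RUN=0` to run them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a command; execute it only when DRY_RUN=0 (default: dry run)
run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  fi
}

# Tag = Airflow version + unique build id. BUILD_ID is assumed to be set by
# the CI system; outside CI fall back to the short git commit, else "local".
image_tag() {
  local airflow_version="$1"
  local build_id="${BUILD_ID:-$(git rev-parse --short HEAD 2>/dev/null || echo local)}"
  echo "yourcompany/airflow:${airflow_version}-${build_id}"
}

TAG="$(image_tag 1.10.11)"

# Build the extended image from the Dockerfile in the current directory
run docker build . -t "${TAG}"

# Bypass the image's entrypoint and check that the extra dependency imports
run docker run --rm --entrypoint python "${TAG}" -c "import numpy"
```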
Extending image - Pros & Cons
Pros
- Uses released images
- Simple build command
- Own Dockerfile
- No need for Airflow sources
Cons
- Potentially bigger size
- Predefined extras only
- Installs a limited set of Python dependencies
Customizing Airflow image - default docker build
Customizing Airflow image - use build args
- Installs Airflow from PyPI (==1.10.11)
- Adds additional Airflow extras and dev/runtime dependencies …
- Does not use local sources (can be run from master, including the entrypoint)
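The build-args approach can be sketched as a single `docker build` call. The `--build-arg` names are assumed to match the 1.10.x-era Airflow production Dockerfile; confirm them against its `ARG` declarations before relying on them. The extras, packages and the `my-airflow:1.10.11-custom` tag are illustrative only. Keeping the arguments in a Bash array preserves values that contain spaces, such as `ADDITIONAL_PYTHON_DEPS`, as single arguments. The command is printed by default; set `DRY_RUN=0` to execute it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build arguments for the production Dockerfile. The ARG names are assumed to
# match the 1.10.x-era Airflow Dockerfile; every value is only an example:
#   AIRFLOW_INSTALL_SOURCES / AIRFLOW_INSTALL_VERSION: install from PyPI, pinned
#   ADDITIONAL_AIRFLOW_EXTRAS: extras on top of the default ones
#   ADDITIONAL_PYTHON_DEPS: extra pip packages
#   ADDITIONAL_DEV_APT_DEPS / ADDITIONAL_RUNTIME_APT_DEPS: apt packages for the
#     build stage and for the main (runtime) stage respectively
BUILD_ARGS=(
  --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster"
  --build-arg AIRFLOW_INSTALL_SOURCES="apache-airflow"
  --build-arg AIRFLOW_INSTALL_VERSION="==1.10.11"
  --build-arg ADDITIONAL_AIRFLOW_EXTRAS="mssql,hdfs"
  --build-arg ADDITIONAL_PYTHON_DEPS="sshtunnel oauth2client"
  --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++"
  --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless"
)

# Run from an Airflow source checkout (it provides the Dockerfile).
# Print the command; execute it only when DRY_RUN=0.
cmd=(docker build . "${BUILD_ARGS[@]}" --tag my-airflow:1.10.11-custom)
echo "+ ${cmd[*]}"
if [[ "${DRY_RUN:-1}" == "0" ]]; then
  "${cmd[@]}"
fi
```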
It's a Breeze to build images
- Breeze - development and test environment
- Supports building production image
- Auto-complete of options
- New Breeze video showing how to build production images:
```bash
./breeze build-image --help
```
See BREEZE.rst in the Airflow repo
How to deploy the images?
- Docker and Docker Compose: not recommended for production
- Managed container services:
  - Amazon ECS, Google Container on VMs, Azure Container Instances
- Kubernetes on-prem:
  - Airflow Operator (not recommended yet)
  - OpenShift (also Kubernetes)