Intro To Apache Airflow
Introduction
Agenda
● What is Airflow
● What is a workflow
● Example of an Airflow workflow
● Background and the world before Airflow
● Purpose
● Terminology
● Core components
● Usage
● Demo
What is Airflow
● Apache Airflow is an open-source platform for programmatically
authoring, scheduling, and monitoring workflows.
What is a workflow
● A workflow is a sequence of tasks
● It is started on a schedule or triggered by an event
● Workflows are frequently used to handle big data processing pipelines
Example of an Airflow workflow
1. Download data
2. Send data to processing
3. Monitor processing
4. Generate report
5. Send email
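The five steps above can be read as tasks with dependencies between them. The sketch below is plain Python (standard library only, Python 3.9+), not Airflow's own API; the task names and the `run_workflow` helper are assumptions made for illustration. It only shows how an execution order can be derived from "this task must finish before that one".

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# The names mirror the five steps above and are illustrative only.
DEPENDENCIES = {
    "download_data": set(),
    "send_data_to_processing": {"download_data"},
    "monitor_processing": {"send_data_to_processing"},
    "generate_report": {"monitor_processing"},
    "send_email": {"generate_report"},
}


def run_workflow(dependencies):
    """Visit every task once, in an order that respects the dependencies."""
    executed = []
    for task_name in TopologicalSorter(dependencies).static_order():
        # A real task would do work here; this sketch only records the order.
        executed.append(task_name)
    return executed


if __name__ == "__main__":
    for name in run_workflow(DEPENDENCIES):
        print("ran", name)
```

Because each step depends only on the previous one, the order is a single chain. A workflow with branches would list more than one upstream task in a set, and any order that respects the dependencies would be valid.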
Background
[Figure: Airflow components: Webserver (Web UI), Scheduler, Workers, Metadata database, Task execution logs]
Airflow usage
● Run and automate ETL pipelines
● Data ingestion pipelines
● Machine learning pipelines
● Predictive data pipelines
● General purpose scheduling
Airflow architecture
● Scheduler: Triggers scheduled workflows and submits their tasks to the executor to run
● Executor: Manages how and where tasks are run
● Worker: Runs the tasks
● Webserver: Serves the user interface
● Metadata database: Stores information about DAGs and tasks
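To make the division of responsibilities concrete, here is a toy model of the components in plain Python. It is a sketch under simplifying assumptions, not Airflow's implementation: every class, method, and state string below is hypothetical, everything runs in one process, and the "metadata database" is just a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataDB:
    """Stands in for the metadata database: task name -> state."""
    task_states: dict = field(default_factory=dict)


class Worker:
    """Runs a single task (here, any zero-argument callable)."""

    def run(self, task):
        return task()


class Executor:
    """Manages a task: hands it to a worker and records its state."""

    def __init__(self, worker, db):
        self.worker = worker
        self.db = db

    def submit(self, name, task):
        self.db.task_states[name] = "running"
        try:
            self.worker.run(task)
            self.db.task_states[name] = "success"
        except Exception:
            self.db.task_states[name] = "failed"
        return self.db.task_states[name]


class Scheduler:
    """Triggers a workflow and submits its tasks to the executor in order."""

    def __init__(self, executor):
        self.executor = executor

    def trigger(self, workflow):
        for name, task in workflow:
            # Stop at the first failure so later tasks never start.
            if self.executor.submit(name, task) == "failed":
                break


def status_page(db):
    """Stands in for the webserver: a read-only view of recorded states."""
    return [f"{name}: {state}" for name, state in db.task_states.items()]


if __name__ == "__main__":
    db = MetadataDB()
    scheduler = Scheduler(Executor(Worker(), db))
    scheduler.trigger([
        ("download_data", lambda: None),
        ("generate_report", lambda: None),
    ])
    print("\n".join(status_page(db)))
```

The point of the sketch is the direction of the calls: the scheduler never runs a task itself, the executor decides where a task runs, and the user-facing view only reads what has been recorded in the database.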