Apache Airflow is a tool for defining, executing, and monitoring workflows as code. It was created at Airbnb and open-sourced on GitHub in 2015, and it joined the Apache Software Foundation incubator program in 2016.
A workflow is a collection of tasks. The workflow definition doesn't care what each task does; it ensures that tasks run at a specific time, in a specific order, and with proper handling of unforeseen exceptions.
A few hypothetical examples of workflows:
- Data Analysis Workflow:
- Task 1 – Fetch a CSV file from a cloud object storage location such as AWS S3.
- Task 2 – Analyze the CSV data using Pandas or another library.
- Task 3 – Send an analysis report email.
- Weather Alert Workflow:
- Task 1 – Continuously monitor the weather for a specific location using a third-party service.
- Task 2 – Send a notification or email if the weather is forecast to change to rain, storm, or snow.
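Before looking at Airflow itself, the data analysis example above boils down to three functions run in a fixed order, where the workflow layer (not the tasks) handles failures. A plain-Python sketch of that idea, with hypothetical task names and inline sample data standing in for a real S3 fetch:

```python
# Plain-Python sketch of the three-task data analysis workflow.
# The "workflow" only fixes the order and catches unforeseen errors;
# each task is an independent unit of work.

def fetch_csv():
    # Task 1 – in a real workflow this would download from AWS S3;
    # here we return inline sample data for illustration
    return "name,score\nada,10\nbob,8"

def analyze(csv_text):
    # Task 2 – a stand-in for a Pandas analysis: sum the score column
    rows = [line.split(",") for line in csv_text.splitlines()[1:]]
    return sum(int(score) for _, score in rows)

def send_report(total):
    # Task 3 – a stand-in for sending an email report
    return f"Analysis report: total score = {total}"

def run_workflow():
    try:
        data = fetch_csv()         # Task 1
        total = analyze(data)      # Task 2
        return send_report(total)  # Task 3
    except Exception as exc:
        # the workflow layer, not the tasks, handles failures
        return f"Workflow failed: {exc}"
```

Airflow generalizes exactly this pattern: it schedules the ordered tasks, retries them, and records their outcomes.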
In Airflow, the definition of a workflow of tasks is called a Directed Acyclic Graph (DAG). An Airflow DAG, or workflow, is defined in a Python script (file).
Each task is a unit of work in the DAG and is an implementation of an Operator. Tasks may or may not depend on one another, and each task represents a node in the DAG graph. Examples of Operator classes:
PythonOperator – runs arbitrary Python code.
GcpTranslateSpeechOperator – a Google Cloud operator that recognizes speech in audio input and translates it.
There is a huge number of Operators for various tools and services such as Hadoop, AWS, Azure, and Kubernetes, which makes Apache Airflow an attractive option for ETL/ELT (Extract, Transform, Load), batch-job, and integration workflows.
Apache Airflow provides a rich web user interface to monitor workflow execution.
List of Apache Airflow Guides
Note: We will continuously add links to new Apache Airflow Guides; keep watching this space.