Echo Kanak

Airflow DAGs 101

Defining Airflow DAGs and task dependencies

Mar 1, 2025


Apache Airflow is one of the most popular tools for orchestrating data workflows. At the heart of Airflow lies the DAG (Directed Acyclic Graph).

DAG

In Airflow, a DAG (Directed Acyclic Graph) is a Python-based definition of a workflow. It represents a collection of tasks with clearly defined relationships.

 
  • Directed: Every dependency has a direction, so each task has a defined place in the execution order (upstream/downstream).
  • Acyclic: Dependencies cannot loop back to an earlier task, which prevents infinite cycles.
  • Graph: Tasks are nodes and dependencies are edges, so the pipeline can be visualized and monitored in the Airflow UI.
 

A DAG can be as simple as one task or as complex as thousands of interconnected tasks.

A simple DAG with three tasks might look like this
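Independent of any Airflow API, the structure itself can be sketched in plain Python. The snippet below is only an illustration of the "directed" and "acyclic" properties using the standard library's graphlib module; the task names extract, transform, and load are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Map each hypothetical task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# Directed: the edges determine an execution order.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)

# Acyclic: making "extract" depend on "load" closes a loop, so no valid order exists.
cyclic = {**dependencies, "extract": {"load"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

Airflow performs a comparable check itself and rejects a DAG whose dependencies form a cycle.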

 

Defining DAGs: Multiple Approaches

We can define a DAG in several ways.

1. The DAG Decorator (Recommended)

The most modern and clean approach uses the @dag decorator:

2. Context Manager Approach

Using the with statement provides clear scoping for your DAG definition:

3. Traditional Operators

For more complex tasks, we can use specific operators:

Managing Task Dependencies

Dependencies define the order in which tasks execute.

Simple Linear Dependencies

For straightforward workflows, use the bitshift operator (>>):

Parallel Task Execution

To run multiple tasks in parallel at the same level, group them in a list:

Complex Dependencies

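In a workflow like the one above, a task that feeds several downstream tasks is easy to duplicate by accident.

Each time we explicitly call a task, it creates a new instance of that task, so calling the same task function in two separate dependency statements produces two separate tasks in the graph.

Avoid creating duplicate task instances by calling each task once and storing the result in a variable: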

Using Chain for Complex Dependencies

The chain function provides a cleaner syntax for the same dependencies:

Key Takeaways and Best Practices

  1. Unique Identifiers: Every DAG must have a unique identifier across your Airflow instance
  2. Start Date: While optional (defaults to None), setting a start date is crucial for scheduling
  3. Schedule Intervals: Define how frequently your DAG should run (@daily, @hourly, cron expressions, etc.)
  4. Documentation: Always include descriptions and tags to make your DAGs discoverable and maintainable
  5. Operator Selection: Before writing custom code, check the Astronomer Registry for existing operators
  6. Task Naming: Each task must have a unique identifier within its DAG
  7. Default Arguments: Use default_args dictionary to set common parameters across all tasks
  8. Dependency Patterns: Use bitshift operators (>>, <<) and lists for simple dependencies, and chain for complex patterns
  9. Avoid Task Duplication: When a task has multiple downstream dependencies, store it in a variable to prevent creating duplicate instances
