Echo Kanak

Airflow XComs 101

Mar 31, 2025

When building pipelines in Apache Airflow, tasks often need to share data with each other.
That’s where XComs (cross-communications) come in.

What are XComs?

XComs let tasks in your Airflow DAGs exchange small amounts of data. The mechanism has four parts:

  1. Push: A task stores data in XCom using a unique identifier
  2. Pull: Another task retrieves that data using the same identifier
  3. Storage: XCom data is stored in Airflow's metadata database by default
  4. Identification: Each XCom is uniquely identified by its key, run ID, task ID, and DAG ID
 

Every task instance in Airflow gets its own context dictionary that contains metadata about the current execution.

  • context["ti"] β†’ refers to the TaskInstance object of the currently running task

XCom Implementation Patterns

 

Method 1: Explicit Context Usage

The most verbose way using the full context dictionary:
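A sketch of the pattern (task and key names are made up); these callables would be wired to `PythonOperator` tasks, with the pushing task upstream of the pulling one:

```python
def produce(**context):
    # Push a value under an explicit key via the full context dictionary
    context["ti"].xcom_push(key="file_path", value="/tmp/extract.csv")

def consume(**context):
    # Pull the value back by key and by the producing task's ID
    path = context["ti"].xcom_pull(task_ids="produce", key="file_path")
    return path
```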

Method 2: Direct TaskInstance Access

Accessing the TaskInstance directly:
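Because Airflow injects context entries as keyword arguments, the callable can declare `ti` as a parameter instead of unpacking `**context` (names again illustrative):

```python
def produce(ti):
    # "ti" is injected by Airflow; no **context needed
    ti.xcom_push(key="row_count", value=42)

def consume(ti):
    return ti.xcom_pull(task_ids="produce", key="row_count")
```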

Method 3: Implicit XCom with Return Values (Recommended)

The most Pythonic approach uses return values and function parameters:

  • Less boilerplate code
  • More readable and intuitive
  • Follows Python conventions
  • Automatic XCom handling behind the scenes

Advanced XCom Patterns

Pulling from Multiple Tasks

When a task needs data from several upstream tasks, set those tasks as dependencies and pass a list of task IDs to xcom_pull:
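When `task_ids` is a list, `xcom_pull` returns the values in the same order. A sketch with made-up task names:

```python
def combine(ti):
    # Values come back in the same order as the task_ids list
    count_a, count_b = ti.xcom_pull(task_ids=["extract_a", "extract_b"])
    return count_a + count_b
```

In the DAG, declare both upstream tasks as dependencies of the combining task so their XComs exist before the pull runs.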

Pushing Multiple Values

Use dictionaries to organize and share multiple related values:
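In the return-value style, returning one dictionary pushes all related values together, and the downstream task indexes into it. A sketch (keys and values are illustrative):

```python
def extract():
    # Returning a dict pushes one XCom holding all related values
    return {"path": "/tmp/extract.csv", "rows": 42, "status": "ok"}

def load(meta):
    # Downstream, index into the pulled dictionary
    return f"{meta['status']}: {meta['rows']} rows from {meta['path']}"
```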

XCom Limitations and Best Practices

  1. Keep XComs small - they're for metadata, not bulk data.

      Use for: file paths, URLs, run IDs, execution metadata, row counts, processing stats, small config dicts, status flags, control signals, DB connection strings.

      Avoid for: raw CSVs, large JSON dumps, entire DataFrames, full datasets, binary files, images, large API responses.

  2. Size constraints: XCom storage limits vary by metadata database:
      • SQLite: up to 2 GB
      • Postgres: up to 1 GB
      • MySQL: up to 64 KB
  3. Use external storage for large datasets and pass references (paths, URIs) via XCom; for heavy processing, trigger a Spark job or similar rather than moving the data itself through XCom.
  4. JSON serialization: by default, XCom data must be JSON serializable (strings, numbers, lists, dictionaries, booleans, null).
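A quick way to check whether a value will survive the default JSON serialization (the helper name is made up):

```python
import json

def is_xcom_safe(value):
    # Only JSON-encodable values work with the default XCom backend
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False
```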

A complete DAG example:

 
 
