Almost three years after joining the Apache Software Foundation’s incubation program, workflow management platform Airflow has reached top-level project status.
Airflow began as an open source Python project at home-sharing service Airbnb in 2014. Its purpose is to offer a way to “programmatically author, schedule and monitor workflows” and to integrate with other platforms. The latter is realised through Airflow’s extensible architecture, which facilitates cooperation with projects and services such as Apache Hadoop HDFS, Hive, AWS S3, container software Docker and container orchestrator Kubernetes.
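To give a rough idea of what "programmatically authoring" a workflow means, here is a minimal sketch in plain Python. It does not use Airflow's API; the task names, the `tasks` mapping and the `run_workflow` helper are made up for illustration. It only shows the underlying idea of declaring tasks and their dependencies in code and running them in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task name maps to (callable, upstream task names).
# A task receives the results of its upstream tasks as positional arguments.
tasks = {
    "extract": (lambda: [1, 2, 3], []),
    "transform": (lambda rows: [r * 10 for r in rows], ["extract"]),
    "load": (lambda rows: sum(rows), ["transform"]),
}


def run_workflow(tasks):
    """Run every task once, after all of its upstream tasks have finished."""
    # TopologicalSorter expects {node: predecessors}.
    graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())

    results = {}
    for name in order:
        func, upstream = tasks[name]
        results[name] = func(*(results[u] for u in upstream))
    return order, results


order, results = run_workflow(tasks)
print(order)
print(results["load"])
```

A real orchestrator adds scheduling, retries, monitoring and distributed execution on top of this basic dependency-graph idea.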
As the list suggests, Airflow’s focus lies on pipelines for Big Data computations. It therefore isn’t surprising to find many data-heavy companies such as Twitter, Google, and PayPal among the workflow orchestrator’s contributors and users.
In the announcement, PayPal’s Chief Data Engineer Sid Anand states that “with over 250 PB of data under management, PayPal relies on workflow schedulers such as Apache Airflow to manage its data movement needs reliably.” According to him, the project also takes care of the company’s system orchestration needs such as self-healing, provisioning, and autoscaling.
Of course, the project isn’t without competitors: Spotify’s Python module Luigi and AWS Glue do similar things. Airflow, however, is said to handle distributed execution better than Luigi and, being an open source project, isn’t restricted to a single platform, which is why some might prefer it to Glue.
Airflow’s step up the Apache ladder is a sign that the project follows the processes and principles laid out by the software foundation. The status might also help with the orchestrator’s visibility and attract more users as well as additional contributors.
The Apache Software Foundation fosters open source projects such as the Apache HTTP Server, Hadoop, Spark, and Cassandra. It is a non-profit funded by donations and corporate sponsors such as Microsoft, Google, Red Hat, Facebook, IBM, Huawei, ARM, and Alibaba Cloud Computing.