Just before going on their end-of-year break, the team behind the Airbnb-bred open source project Apache Airflow has pushed its second major release out of the door. Amongst other things, the platform for programmatically authoring, scheduling, and monitoring workflows now comes with a stabilised REST API and introduces a new paradigm for writing directed acyclic graphs (DAGs).
The latter was proposed in February and adds an API to explicitly declare messages passed between tasks in a DAG. The so-called TaskFlow API is meant to help handle dependencies and promises to make declaring a PythonOperator a tad easier. To improve the experience of grouping tasks in the UI, the Airflow team has come up with the concept of task groups, which can be used instead of SubDAGs but come without drawbacks such as limited parallelism.
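To make the idea behind this paradigm more concrete, here is a minimal toy model in plain Python. It is not Airflow code and not how Airflow implements the feature: the `Flow` class, its `task` decorator, and its `run` method are hypothetical names made up for this sketch. It only illustrates the principle that passing one decorated function's result into another is enough to declare both the data hand-over and the dependency between the two tasks.

```python
class Flow:
    """Hypothetical stand-in for a DAG: records tasks and the edges between them."""

    def __init__(self):
        self.funcs = {}   # task name -> plain Python callable
        self.inputs = {}  # task name -> upstream task names, in argument order

    def task(self, func):
        """Decorator: calling the wrapped function registers a node instead of running it."""
        def declare(*upstream):
            self.funcs[func.__name__] = func
            self.inputs[func.__name__] = list(upstream)
            # Return a handle that downstream tasks can accept as an input.
            return func.__name__
        return declare

    def run(self):
        """Execute every task after the tasks it reads from, passing results along."""
        results = {}

        def resolve(name):
            if name not in results:
                args = [resolve(dep) for dep in self.inputs[name]]
                results[name] = self.funcs[name](*args)
            return results[name]

        for name in self.funcs:
            resolve(name)
        return results


flow = Flow()


@flow.task
def extract():
    return {"a": 1, "b": 2}


@flow.task
def total(values):
    return sum(values.values())


# Nothing executes here: the call only declares the edge extract -> total.
total(extract())

# Execution happens later, in dependency order.
results = flow.run()
```

In Airflow 2.0 itself, the corresponding decorators are `task` and `dag` from `airflow.decorators`, and the values handed from one task to the next travel through Airflow's XCom mechanism rather than an in-memory dictionary as in this sketch.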
In the spirit of speeding things up and making them easier, the KubernetesExecutor has been re-architected so users can easily access the Kubernetes API to create their .yaml pod_template_file without the need to specify parameters in the Airflow configuration. Switching from an executor_config dictionary to the pod_override parameter, “which takes a Kubernetes V1Pod object for a 1:1 setting override”, helped to reduce the codebase of the executor and improved execution speed.
Another focal point on the way to Airflow 2.0 was an initiative to improve the performance of the scheduler, which seems to have reduced the time the component needs to start tasks. Practitioners using the scheduler with Postgres 9.6+ or MySQL 8+ can also profit from the option to run more than one scheduler instance at a time, which adds resiliency to any setup and makes the scheduler suitable for high-availability deployments.
Users who have been familiar with Airflow for a while will probably notice the cleaned-up user interface, which features a slightly refreshed colour scheme and a new icon system, as well as an updated global navigation menu, DAG views, and table presentation.
A lot has changed under the hood as well, since the project has been split into a core and 61 provider packages for things like external services, databases, and protocols. This should help make installations more flexible and offer a structure for better extensibility to those looking to write their own providers and connection types. Additional details can be found in the project’s repository.