Delta Lake finds new home at Linux Foundation


Databricks used the ongoing Spark + AI Summit Europe to announce a change in the governance of Delta Lake.

The storage layer was introduced to the public in April 2019 and is now in the process of moving to the Linux Foundation, which also fosters software projects such as the Linux kernel and Kubernetes.

The new home is meant to drive the adoption of Delta Lake and establish it as a standard for managing big data. Databricks cofounder Ali Ghodsi commented on the move in a canned statement: “To address organizations’ data challenges we want to ensure this project is open source in the truest form. Through the strength of the Linux Foundation community and contributions, we’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”

Over the next couple of months, the foundation will likely help Delta Lake set up an open governance model before going on to get more parties interested in the project. Companies such as Alibaba and Intel have recently joined the community around Delta Lake, helping to get Delta Lake support for other open source projects such as Presto, Apache Hive, and Apache NiFi off the ground.

Initially, Delta Lake was built to work as a layer on top of Apache Spark, which doesn’t come as a huge surprise considering Spark creator Matei Zaharia is also one of the founders of Databricks. During the project’s introductory phase, Ghodsi liked to describe it as “the next step in the evolution of the data journey”, promising users better quality, reliability, scalability, and performance in their data lake-related endeavours.

“Just storing lots and lots and lots of data and dumping it into a data lake, doesn’t mean that you can later actually do something useful with it.”

Delta Lake helps with quality and performance by, for example, checking whether the data funneled through it conforms to a predefined schema and by allowing ACID transactions to be applied to the stream. These transactions are tracked in so-called delta files, which can be used to look into long-gone states of the data and to use those states as a basis for new transactions.
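To make the ideas of schema enforcement, atomic commits and looking back at earlier versions a little more concrete, here is a deliberately simplified, in-memory toy model in Python. It is not Delta Lake’s API: the class `ToyDeltaTable` and its methods are hypothetical names invented for illustration. Real Delta Lake runs on Spark, records file-level actions as JSON commit files in a `_delta_log` directory, and exposes older versions through Spark reader options such as `versionAsOf`. The sketch only assumes that each committed transaction is appended to a log and that any past version can be rebuilt by replaying the log up to that point.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class ToyDeltaTable:
    """Hypothetical toy model: a table whose state is an append-only log of commits."""

    schema: Dict[str, type]
    log: List[List[Row]] = field(default_factory=list)  # one entry per committed transaction

    def _check_schema(self, row: Row) -> None:
        # Schema enforcement: reject rows whose columns or value types differ from the schema.
        if set(row) != set(self.schema):
            raise ValueError(f"columns {sorted(row)} do not match schema {sorted(self.schema)}")
        for col, typ in self.schema.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"column {col!r} expects {typ.__name__}")

    def commit(self, rows: List[Row]) -> int:
        # All-or-nothing: validate the whole batch before touching the log,
        # so a single bad row leaves the table exactly as it was.
        for row in rows:
            self._check_schema(row)
        self.log.append(list(rows))
        return len(self.log) - 1  # version number assigned to this commit

    def read(self, version: Optional[int] = None) -> List[Row]:
        # Rebuild the table state by replaying the log up to and including `version`
        # (the latest version if none is given).
        last = len(self.log) - 1 if version is None else version
        return [row for batch in self.log[: last + 1] for row in batch]


# Usage: two successful commits, one rejected batch, then a read of an older version.
table = ToyDeltaTable(schema={"id": int, "city": str})
v0 = table.commit([{"id": 1, "city": "Amsterdam"}])
v1 = table.commit([{"id": 2, "city": "Berlin"}])
try:
    # The second row violates the schema, so neither row in this batch is committed.
    table.commit([{"id": 3, "city": "Paris"}, {"id": "oops", "city": "Rome"}])
except TypeError as err:
    print("rejected:", err)

latest = table.read()           # rows from versions 0 and 1
first = table.read(version=v0)  # only the row committed in version 0
```

Keeping the log append-only is what makes the look-back cheap in this model: an old version is never overwritten, so reading it is just a matter of stopping the replay earlier.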