Databricks runtime family welcomes newest member, goes 5.4


Databricks has released updates to its runtimes, raising them all to v5.4, and has extended its portfolio to make life easier for Python users.

The standard Databricks Runtime 5.4 ships two new features in public preview: auto optimise for Delta Lake on Databricks, and a Glue metastore. The first compacts files as data is written to cloud storage, to keep performance optimal. The second lets developers use the AWS Glue Data Catalog as a metastore, replacing Hive if needed.

Other additions include an optimised FUSE mount and ways to efficiently read arbitrary files into Spark DataFrames. Databricks Connect and Library Utilities have reached general availability status with the current release.

Machine learning practitioners, however, might be more interested to learn that there is also a new version of Databricks Runtime ML, the company's runtime tailored to machine learning workloads. Its most exciting experimental feature in v5.4 is an implementation of the hyperparameter tuning library Hyperopt, which Databricks advertises as easy to run in a distributed manner.


To help users keep track of their tuning experiments, the runtime now logs them to MLflow, along with the tuned parameters and targeted metrics. This is made possible by an MLflow-MLlib integration, which is also still in preview.

Apart from that, Databricks has also introduced a new runtime based on the open source package and environment management system Conda. This addition to the Databricks family, which is not yet production ready, was inspired by the use of Conda in Databricks Runtime for Machine Learning. That said, the fondness many of the company's Python users seem to have for the tool might have helped the decision, too.

Databricks Runtime Conda 5.4 (Beta) is meant to help with managing Python environments by offering pre-configured environments for Azure and AWS, easy customisation via a requirements file, and environment isolation and reproducibility.

If all goes well, Conda will become the default package manager for users working primarily in Python. In upcoming releases, the team mainly plans to improve the user experience, set up more pre-configured environments, and extend Conda support to Library Utilities in notebooks. More details about the new runtime beta can be found in the official announcement.
