Databricks Runtime 5.5 previews Instance Pools

By Team Devclass

July 17, 2019

Databricks, the company behind the open source project Apache Spark, has given its Runtime a good old polishing, buffing the version number up to 5.5.

The new Databricks Runtime is, amongst other things, able to use AWS Glue instead of Hive as its metastore, and R notebooks now join Python and Scala on the list of notebook types the product’s Secrets API can inject secrets into.

Version 5.5 also comes with a couple of preview features. One of them is Instance Pools, which lets users keep a set of virtual machines on standby so that clusters can be spun up quickly when needed. According to Databricks, idle VMs incur only cloud provider costs, and there are no costs at all if the pool is scaled down to zero instances.
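To make the idea concrete, here is a minimal toy model of a warm pool. It is a conceptual sketch only: the `WarmPool` class and its methods are hypothetical names made up for illustration and are not part of any Databricks API.

```python
from dataclasses import dataclass


@dataclass
class WarmPool:
    """Toy model of a pool of idle VMs; not the Databricks API."""

    idle: int  # VMs currently held ready in the pool

    def acquire(self, wanted: int) -> dict:
        """Take `wanted` VMs, preferring idle ones over fresh launches."""
        from_pool = min(self.idle, wanted)
        self.idle -= from_pool
        # Anything the pool cannot cover has to be provisioned from scratch.
        return {"from_pool": from_pool, "cold_start": wanted - from_pool}

    def release(self, count: int) -> None:
        """Return VMs to the pool when a cluster shuts down."""
        self.idle += count


pool = WarmPool(idle=3)
first = pool.acquire(2)   # fully served by idle VMs
second = pool.acquire(4)  # one idle VM left, so three need a cold start
```

In this sketch the first cluster starts entirely from standby machines, while the second has to wait for three fresh VMs, which is the slow path a pool is meant to avoid.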

Those using the Databricks Runtime on AWS can give querying Delta Lake tables from Presto or Amazon Athena a go, and help improve the final version by leaving feedback. The feature works through manifest files, which the services can read instead of going through the directory listing to find files.
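The difference between the two lookup strategies can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the Delta Lake implementation: the manifest here is assumed to be a text file with one data file path per line, and the file names and helper functions are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def files_from_manifest(manifest: Path) -> List[str]:
    """Return the data file paths listed in a manifest, one per line."""
    lines = manifest.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def files_from_listing(table_dir: Path) -> List[str]:
    """Alternative: walk the directory and pick up every Parquet file."""
    return sorted(str(p) for p in table_dir.rglob("*.parquet"))


# Build a throwaway table directory with one current and one stale file.
table_dir = Path(tempfile.mkdtemp())
current = table_dir / "part-0001.parquet"
stale = table_dir / "part-0000.parquet"
current.touch()
stale.touch()

# Hypothetical manifest that names only the file in the current snapshot.
manifest = table_dir / "manifest"
manifest.write_text(f"{current}\n")

listed = files_from_listing(table_dir)         # sees both files
from_manifest = files_from_manifest(manifest)  # sees only the current one
```

A plain directory listing picks up every file that happens to be present, whereas a reader that trusts the manifest only sees the files the table writer has declared.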

A new version of the Databricks Filesystem FUSE (Filesystem in Userspace) client is available only by contacting support. The reworked client is meant to improve performance on all DBFS locations, mounts included; previous runtime versions had already introduced high-performance FUSE storage at dbfs:/ml.

Along with the regular release, there is also a new version of the Runtime for Machine Learning. Databricks Runtime for ML 5.5 adds an MLflow 1.0 package and upgrades TensorFlow, PyTorch, and scikit-learn. The ML-specific runtime also saw a HorovodRunner update, giving users a way of distributing their training within a single node, which is meant to make the use of multiple GPUs more efficient.

More adventurous Databricks customers can try a preview of a function for recursively loading files from nested input directories, as well as the scalar iterator Pandas UDF type. The latter can speed up some models, since it lets a model be applied to multiple input batches without having to initialise it again and again.
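The saving from the iterator style can be shown without Spark at all. The following is a plain-Python sketch of the pattern, not the PySpark API: `load_model`, `predict_per_batch`, and `predict_iter` are hypothetical stand-ins, and the "model" is just a function that doubles its input.

```python
from typing import Iterator, List

loads = {"count": 0}  # tracks how often the stand-in model is initialised


def load_model():
    """Stand-in for an expensive model load; returns a doubling function."""
    loads["count"] += 1
    return lambda x: 2 * x


def predict_per_batch(batch: List[int]) -> List[int]:
    """Scalar style: the model is loaded again for every batch."""
    model = load_model()
    return [model(x) for x in batch]


def predict_iter(batches: Iterator[List[int]]) -> Iterator[List[int]]:
    """Iterator style: load once, then stream all batches through."""
    model = load_model()
    for batch in batches:
        yield [model(x) for x in batch]


batches = [[1, 2], [3, 4], [5]]

per_batch = [predict_per_batch(b) for b in batches]
loads_per_batch = loads["count"]

loads["count"] = 0
iterated = list(predict_iter(iter(batches)))
loads_iterated = loads["count"]
```

Both variants produce the same predictions, but the iterator version initialises the model once for the whole stream of batches rather than once per batch.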

Looking forward, Databricks is planning to drop Python 2 support with the release of Runtime 6.0, which should happen later in 2019. However, there are plans to offer long-term support for the last 5.x release, so that a maintained version remains available to run Python 2 code a little longer if necessary. The step isn’t that surprising, given that Python 2 reaches its end of life next year.
