MLflow should get Kubernetes and Windows support before it hits v1.0 sometime in the first half of next year, the founders of the open Machine Learning platform said this week.
The MLflow project was launched by Databricks in June, and hit v0.7 this week, with the project gunning for a full 1.0 designation in the first half of 2019.
Project co-founder Matei Zaharia, who is also Databricks CTO and one of the originators of Spark, told Devclass this week that current work was balanced between ensuring stability of what’s in there already, and adding new features.
“One of the main goals is to make sure the current components are working really well,” he said, but, “We designed the current components so we can actually add others later.”
“For example, one thing we’d like to add eventually is that wherever you deploy your model we’d want it to be able to report back with information about how it’s doing. And then you can use that for monitoring and analysis,” he said. “That can be added without changing the core today, it’s just additional calls.”
The aim with MLflow is to be able to do this across different cloud platforms, he continued. “Maybe even on edge devices. For each deployment…[we] want to make sure the stuff that makes it back can all be queried.”
Going further on the issue of deployment Databricks co-founder and VP Andy Konwinski said, “We feel a lot of customers are using Kubernetes as a resource or solution – so you’ll be able to deploy to Kubernetes in the cloud or Kubernetes on premise.”
Data scientists are a resourceful lot, and Zaharia said “You can kind of do that yourself at the moment. You can already deploy a software container and you can use Kubernetes for that. We want to make that easier.”
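The manual route Zaharia describes – wrap the model server in a container yourself and hand it to Kubernetes – might look something like the sketch below. This is purely illustrative, not MLflow functionality: the deployment name, image, labels and port are all hypothetical placeholders for whatever container you have built around your model.

```yaml
# Hypothetical Kubernetes Deployment for a containerised model server.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: model-server
        # Hypothetical image: a container you built that serves the model over HTTP.
        image: registry.example.com/my-model:0.1
        ports:
        - containerPort: 5000        # scoring endpoint inside the container
---
# Expose the model pods inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: model-server
spec:
  selector:
    app: model-server
  ports:
  - port: 80
    targetPort: 5000
```

Hand-rolling and maintaining this wiring for every model is exactly the friction the project says it wants to remove.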
“Seamless,” added Konwinski.
Databricks has also said Windows support is on the agenda – which might not seem like a top to-do item to some. “The Windows support will be needed for people who do development on Windows – who want to work on their laptop,” said Zaharia.
It’s just another aspect of openness, the pair reckon, in a world where anyone experimenting with Machine Learning can wake up one morning to find they are inextricably tied to a single platform provider.
Locking in Machine Learning?
Konwinski said, “The community appreciates an open source alternative in this space, because there’s growing concern about vendor lock-in. Having an open source alternative addresses that. It’s a nice layer on top of the multi-cloud offering.”
Talking of offerings, Databricks’s own hosted version of MLflow has been available for a month and a half as an alpha preview.
Konwinski said it had “ten active accounts” and a funnel of closer to a hundred. “We’re getting daily feedback from customers.”
What sort of feedback has MLflow generated so far?
He said, “People love the ability to track multiple languages and have a central place their data scientists can collaborate on the results of runs. Having composability of their projects and reproducibility has been something they’ve found extremely valuable, in particular multi-stage workflows and the ability to share code and reuse each other’s code.
“Finally,” he said, “We’ve a lot of people who’ve used all of the different deploy modes, both the cloud vendors you can deploy to plus on premise type deployments of the models that they’re building.”
All of which could be, let’s say, comforting for organisations that struggle to get their data scientists and main developer organisations working in sync.
“This is like the software engineering workflow for Machine Learning”, said Zaharia. “It resonates very well with everyone who’s done a little bit of Machine Learning.”