RStudio has thrown its weight behind MLflow, the multi-cloud Machine Learning framework launched by the creators of Apache Spark back in June.
Databricks CTO and MLflow tech lead – and Spark originator – Matei Zaharia said that Databricks and RStudio have partnered to produce an R API in the latest version of MLflow, which was showcased at the Spark+AI Summit Europe today.
The move means the framework now supports R, Python, Scala and Java, as well as other languages via a REST server interface. That was enough for Databricks to declare MLflow “the most comprehensive open source machine learning platform, with support for multiple programming languages, integrations with popular machine learning libraries, and support for multiple clouds.”
MLflow aims to manage the machine learning lifecycle from start to finish. It has three main components: tracking experiments, including recording and comparing parameters and results; packaging ML code so it can be reused, shared and put into production; and managing and deploying models from multiple ML libraries.
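To make the first of those components concrete, here is a toy sketch of what "tracking experiments" amounts to: each run's parameters and results are written down, and runs can then be compared. This is not MLflow's API. The function names (`log_run`, `load_runs`, `best_run`), the one-JSON-file-per-run layout and the example numbers are all assumptions made up for illustration, using only the Python standard library.

```python
import json
import tempfile
import uuid
from pathlib import Path


def log_run(store: Path, params: dict, metrics: dict) -> str:
    """Record one run's parameters and results as a JSON file (hypothetical layout)."""
    run_id = uuid.uuid4().hex
    record = {"run_id": run_id, "params": params, "metrics": metrics}
    (store / f"{run_id}.json").write_text(json.dumps(record))
    return run_id


def load_runs(store: Path) -> list:
    """Read back every recorded run from the store directory."""
    return [json.loads(p.read_text()) for p in sorted(store.glob("*.json"))]


def best_run(store: Path, metric: str) -> dict:
    """Compare recorded runs and return the one with the lowest value of `metric`."""
    return min(load_runs(store), key=lambda run: run["metrics"][metric])


def demo() -> dict:
    """Log three runs with invented numbers, then pick the best by RMSE."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        # Illustrative values only; they do not come from any real experiment.
        log_run(store, {"alpha": 0.1}, {"rmse": 0.82})
        log_run(store, {"alpha": 0.5}, {"rmse": 0.74})
        log_run(store, {"alpha": 1.0}, {"rmse": 0.91})
        return best_run(store, "rmse")


if __name__ == "__main__":
    winner = demo()
    print(winner["params"], winner["metrics"])
```

A real tracking tool layers a shared server, a UI and language bindings on top of this basic idea, but the core discipline is the same: no run goes unrecorded, so results can be compared and reproduced later.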
Or more succinctly, it brings some more traditional software disciplines to the sometimes chaotic world of data science, and should reduce the number of reinventions of the wheel required to implement projects.
MLflow is still in alpha, but Zaharia told devclass it was targeted to hit a full-fat v1.0 in the first half of next year. “Then we’ll guarantee there’s no breaking changes until 2.x”.
In the meantime, he said, there were still significant features the team wanted to integrate ahead of the move to 1.0, but the key aim was to make the platform as stable as possible.
Having RStudio involved substantially increases the number of contributors to the project, said Zaharia. The internal team numbers six, he said, but the total number of contributors now stands at 48. He said that it took “a couple of years” for Spark to get 30 contributors.
JJ Allaire, chief executive officer at RStudio, said in a canned statement, “In many organizations machine learning workflows are far too ad-hoc, with no systematic tracking of experiments, inadequate protocols around reproducibility, and no consistent way to package and deploy models. MLflow helps address these issues in a uniform fashion across languages and frameworks.
“Integration of R with MLflow will significantly broaden the reach of the project by allowing a broader community to use and contribute to MLflow.”