To coincide with PyTorch Developer Day, data analytics biz Databricks has released version 1.12 of machine learning platform MLflow. Besides taking the first steps towards explainability, the new release heavily features improved support for Facebook’s deep learning package.
The most noteworthy addition, however, has to be the new universal tracking API hiding behind the new mlflow.autolog() method. According to Databricks, it automatically logs all relevant model entities from parameters to artifacts for all supported integrations, without the need to call separate logging methods. The results are then presented via the MLflow UI for further inspection.
Thanks to a “joint engineering investment” with Facebook, this should additionally work for metrics from PyTorch Lightning models, and the team has also added tracking for optimizer names, learning rates and the like.
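For a rough idea of what a single autologging call looks like in practice, here is a minimal sketch. It assumes MLflow 1.12 or later and scikit-learn are installed; the choice of a scikit-learn model, the dataset and the function name train_with_autolog are illustrative assumptions, not something taken from the release.

```python
def train_with_autolog():
    """Train a small model with MLflow's universal autologging switched on.

    Imports live inside the function so this snippet can be loaded even
    where mlflow and scikit-learn are not installed (both are assumed here).
    """
    import mlflow
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge

    # One call enables autologging for every supported integration
    # that happens to be installed.
    mlflow.autolog()

    X, y = load_diabetes(return_X_y=True)

    # No explicit log_param / log_metric calls: parameters, metrics and the
    # fitted model are expected to be recorded by the autologging hooks.
    with mlflow.start_run() as run:
        Ridge(alpha=0.5).fit(X, y)

    return run.info.run_id
```

Calling train_with_autolog() and then starting the MLflow UI should show the run together with whatever the scikit-learn integration captured.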
The API is also meant to help with converting PyTorch models to TorchScript, an intermediate representation of a model, which can then be used to access the model’s properties or deploy it to a TorchServe server. To realise the latter, the joint initiative devised a TorchServe MLflow deployment plugin that interested developers can access through the MLflow repository, along with some example code to get started.
The MLflow team also attempted to make the Databricks platform more interesting to R programmers by ensuring it also works with the scalable machine learning platform H2O.
MLflow users looking to build explainability into their process should look into the mlflow.shap module, which fits the platform with an implementation of the SHAP algorithm. SHAP is short for SHapley Additive exPlanations and is commonly used for figuring out how the features used for model training have influenced the final output.
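To make the idea behind Shapley values a little more concrete, the following sketch computes them by brute force for a toy linear model. It deliberately does not use the mlflow.shap module or the SHAP library; the names shapley_values and toy_model, the weights and the all-zero baseline are made up for illustration, and the exhaustive enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input by enumerating all feature subsets.

    Features outside a subset are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to subset s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(features):
    # Hypothetical linear model: weights 3.0, -2.0, 0.5 and an intercept of 1.0.
    return 3.0 * features[0] - 2.0 * features[1] + 0.5 * features[2] + 1.0


x = [2.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, baseline)
print(phis)
```

For a linear model like this one, each attribution works out to the weight multiplied by the feature’s distance from the baseline, and the attributions add up to the difference between the prediction for x and the prediction for the baseline.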
On the more infrastructural side of things, there’s now an “MLFLOW_S3_IGNORE_TLS environment variable to enable skipping TLS verification of S3 endpoint” if needed. Information on further bug fixes and under-the-hood cleanups can be found in the 1.12 release notes.
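A sketch of how the variable might be set from Python before any MLflow code touches the artifact store; the endpoint URL is a placeholder for something like a local S3-compatible server, not a value from the release notes.

```python
import os

# Placeholder endpoint for an S3-compatible store (hypothetical address).
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://minio.example.internal:9000"

# Skip TLS verification for that endpoint. This is best reserved for
# test setups, for example ones using self-signed certificates.
os.environ["MLFLOW_S3_IGNORE_TLS"] = "true"
```

The same effect can be had by exporting both variables in the shell that launches the training script.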
Practitioners interested in learning more about the MLflow update and the project’s next steps can attend the Data + AI Summit Europe that Databricks is running this week. Other topics discussed at the event might include the company’s new SQL Analytics service, a preview of which was introduced last week. The portfolio enhancement is meant to “provide Databricks customers with a first-class experience for performing BI and SQL workloads directly on the data lake”.