Model does what? Cloudera adds monitoring and explainability to machine learning service


Cloudera Machine Learning (CML) has received an update that finally extends the big data company's Shared Data Experience (SDX) to machine learning models. The update also adds performance monitoring to help teams that are struggling to get their setups production-ready.

First among the highlighted features are a few additions that let teams keep an eye on how their models are doing. A new monitoring service has been integrated, which is said to cover both functional and technical aspects.

It is meant to help measure things like model drift, latency and prediction accuracy, which can then be visualized through a dedicated UI. Because setups and monitoring goals differ from company to company, Cloudera also provides a Python SDK for writing custom tracking and analysis code.
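As a rough illustration of the kind of custom drift check such code might perform, here is a minimal, generic sketch in plain Python. It does not use Cloudera's SDK, whose API is not described in the announcement. The function name `population_stability_index` and the choice of the population stability index (PSI) as the drift metric are assumptions for illustration only. The function compares the distribution of a numeric feature at training time with the distribution seen in production.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """Hypothetical drift check: PSI between a baseline and a live sample.

    Higher values indicate a larger shift between the two distributions.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    # Equal-width bins over the combined range; fall back to 1.0 if all values match.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    # Each term (a - e) * ln(a / e) is non-negative, so PSI >= 0.
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]        # values seen at training time
live = [0.5 + i / 200 for i in range(100)]      # shifted values seen in production
print(population_stability_index(baseline, live))
```

A common rule of thumb treats a PSI below 0.1 as little shift and above 0.25 as significant drift. Any thresholds would need to be tuned to the model and the data at hand.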

CML is advertised as the machine learning service of the Cloudera Data Platform (CDP), a product of the Hortonworks-Cloudera merger completed in early 2019, so it was about time the tool could make use of the platform's Shared Data Experience. SDX was created to enforce uniform policies across different environments. Now that it has been extended to CML, security and governance rules set on CDP also apply to model deployment, which bodes well for production environments.

If that doesn't sound interesting enough, the company also said it has extended Apache Atlas, the data governance and metadata framework that is part of the platform, to include model metadata. This could prove useful for debugging and explaining models, since that information sheds light on how a model was built, including which data was used. Features like this are increasingly sought after, because basic explainability is a prerequisite for using machine learning in industries such as finance and medicine.

Other changes in this release of the ML service include internal restructuring meant to improve the availability of models deployed to a Kubernetes cluster, as well as a way to catalog models.

A step-by-step example demonstrating the new features can be found on the Cloudera blog. CML is billed by the hour, starting at $0.68 per hour per instance, and is available on CDP for Azure and AWS.