According to the team behind Kubeflow, the machine-learning-on-Kubernetes project, work on version 1.0 is done, taking the tool at least one step closer to production-readiness.
Kubeflow, much like Kubernetes, which it abstracts, was developed at Google. Taking close inspiration from the company’s machine learning pipelines, it is meant to “provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures”.
The project runs on Kubernetes, which means admins get access to container orchestration capabilities they may already be familiar with. They can, for example, set up individual namespaces for specific teams, providing a level of isolation and security as well as a way to control costs by limiting resources. Meanwhile, devs and data analysts of various flavours get an easy way to provision the infrastructure needed for their ML projects.
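As a rough illustration of that admin-side pattern, the sketch below uses the official Kubernetes Python client to create a per-team namespace with a resource quota. The namespace name and limits are made up for the example, and in a real Kubeflow deployment the profile controller takes care of much of this for you.

```python
# Minimal sketch: per-team isolation via a namespace plus a resource quota,
# using the official Kubernetes Python client (pip install kubernetes).
# Names and limits below are illustrative, not Kubeflow defaults.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

team_ns = "ml-team-a"  # hypothetical team namespace

# Create the namespace that isolates the team's workloads.
v1.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=team_ns))
)

# Cap what the team can request, which is how costs are kept in check.
v1.create_namespaced_resource_quota(
    namespace=team_ns,
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="team-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={
                "requests.cpu": "8",
                "requests.memory": "32Gi",
                "requests.nvidia.com/gpu": "2",
            }
        ),
    ),
)
```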
The project was released into the open in December 2017 and quickly gained a lot of interest for its promise to address one of today’s major pain points in machine learning: integrating the actual model creation process into the product development lifecycle. Most tools focus heavily on model training without offering ways to integrate the results into an application, making them difficult to use in actual software products.
In contrast, Kubeflow advertises that data scientists can “deploy their application just like TensorBoard; the only thing that changes is the Docker image and flags”, alleviating the “need [..] to learn new concepts or platforms to deploy their applications, or to deal with ingress, networking certificates”.
This, however, implies knowledge of Jupyter and TensorFlow (another Google project) to make the most of the tool from a dev’s or data scientist’s perspective. Users integrate their models into notebooks and launch them via a UI by “choosing one of the pre-built docker images for Jupyter or entering the URL of a custom image”. As a next step, computing resources can be attached to the notebook, and if the training process needs to be monitored, there is always TensorBoard.
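To give an idea of what that looks like from inside a notebook, here is a minimal, hypothetical training cell that writes logs TensorBoard can pick up. The model, dataset, and log directory are placeholders chosen for the example rather than anything Kubeflow prescribes.

```python
# Minimal sketch of a notebook cell whose training run can be watched in TensorBoard.
# Model, data, and log path are placeholders chosen for the example.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Point TensorBoard at this directory to monitor the run.
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="/home/jovyan/logs/mnist")
model.fit(x_train, y_train, epochs=3, callbacks=[tb_callback])
```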
Since its introduction, Kubeflow has accumulated contributors from a variety of companies, such as Microsoft, Cisco, IBM, Alibaba Cloud, Uber, AWS, and Bloomberg.
While there’s no downloadable 1.0 release to be found in the repo yet, a group of maintainers took to the project’s blog this week to alert users to the graduation of “a core set of stable applications needed to develop, build, train, and deploy models on Kubernetes efficiently”.
The applications mentioned include the project’s central dashboard, its CLI kfctl, a Jupyter notebook controller and web app, a profile controller and UI for multi-user management, as well as distributed training operators for the machine learning libraries TensorFlow and PyTorch.
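For a sense of how those training operators are driven, the sketch below submits a TFJob custom resource through the Kubernetes Python client. The kubeflow.org/v1 API group matches what the TensorFlow operator serves around the 1.0 release, but the image, namespace, and replica count are placeholders for illustration.

```python
# Minimal sketch: submitting a distributed training job to the TensorFlow operator.
# Image, namespace, and replica count are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "mnist-train", "namespace": "ml-team-a"},
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {
                "replicas": 2,
                "restartPolicy": "OnFailure",
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "tensorflow",  # the operator expects this container name
                            "image": "example.registry/mnist-train:latest",  # placeholder image
                        }]
                    }
                },
            }
        }
    },
}

# Hand the custom resource to the cluster; the operator does the rest.
api.create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="ml-team-a",
    plural="tfjobs", body=tfjob,
)
```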
If this sounds promising to you, be wary: some features of KFServing (the component needed to deploy machine learning models), for instance, still have some way to go before they can be called finished. The component is, however, essential to the four-step workflow of developing, building, training, and deploying that Kubeflow promotes.
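For context, deploying a model through KFServing comes down to creating an InferenceService resource. The sketch below does so with the Kubernetes Python client, assuming the serving.kubeflow.org/v1alpha2 API of that era and a placeholder model location; check the KFServing docs for the version you actually run.

```python
# Minimal sketch: exposing a trained model as a KFServing InferenceService.
# The API version and the storage URI below are assumptions made for the example.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "serving.kubeflow.org/v1alpha2",
    "kind": "InferenceService",
    "metadata": {"name": "mnist-serve", "namespace": "ml-team-a"},
    "spec": {
        "default": {
            "predictor": {
                # Placeholder bucket; point this at wherever the trained model was exported.
                "tensorflow": {"storageUri": "gs://example-bucket/mnist/export"}
            }
        }
    },
}

api.create_namespaced_custom_object(
    group="serving.kubeflow.org", version="v1alpha2", namespace="ml-team-a",
    plural="inferenceservices", body=inference_service,
)
```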
Looking forward, the project team plans to add operators for other frameworks and to bring its pipelines, metadata, and hyperparameter tuning components to 1.0 status. Of the latter three, which are currently all in beta, pipelines looks like the most pressing to complete, since without it, building more realistic, complex workflows can prove quite tricky. More information on the project can be found on the Kubeflow blog.