Cloudera lets data science teams BYoIDEs and ups security in workbench update


Collaborative data science platform Cloudera Data Science Workbench lets users integrate their editor of choice in the now available version 1.6.

The stand-out feature of the latest release got a whole blog entry detailing how users can get their favourite IDEs to work with the offering (with a little admin help). The base image of the workbench comes with Jupyter preconfigured, which can be selected from the start session menu, but local tools such as PyCharm or RStudio can be set up just as well.

One of the ways to realise the cooperation between the workbench and an IDE is a new client called cdswctl, which is available for download via the platform’s web UI. It can be used, for example, to start an SSH endpoint on a local machine and then connect the editor to the platform.

Apart from that, the workbench team has expanded the options for distributed machine learning by giving its customers ways of working with frameworks such as H2O, TensorFlowOnSpark, and XGBoost. Teams regularly dealing with workloads that can’t be scheduled on any other host due to large resource requests can now specify auxiliary nodes so that those workloads don’t fall by the wayside.


Starting in version 1.6, one instance of Cloudera Manager can be associated with multiple Cloudera Data Science Workbench CSD deployments. To learn more about a deployment’s status, the manager’s CDSW service now provides the commands Status and Validate, which are equivalent to the similarly named commands CLI users are already familiar with.

Customers who prefer the CLI and run RPM deployments will have to get used to the new cdsw stop and cdsw start commands, which replace cdsw reset and cdsw init respectively. Running sessions now display a Logs tab in the Workbench. It shows engine logs and, if Spark is used, Spark logs, so users no longer have to log into the Cloudera Data Science Workbench host and the Spark server first.

To improve the security of CDSW, the platform can now work with FreeIPA for identity management, and a new access role, Operator, has been introduced. Operators are able to start and stop jobs and have view-only access to code, data, and results. Meanwhile, site admins can now restrict who is allowed to create projects or teams and control whether consoles can be shared.

A complete list of changes is available in the documentation.
