Ray 1.9 makes Train beta, launches Ray Job Submission server

The team behind the Ray framework for building distributed applications has pushed version 1.9 of its project into the open — a step that also marks the beginning of the beta phase for deep learning module Ray Train.

Ray Train was added to the framework to “simplify distributed deep learning” by offering an abstraction layer over distributed machine learning frameworks like PyTorch Distributed, Horovod, and TensorFlow. It also comes with support for well-known tools in the discipline, such as Jupyter notebooks, MLflow, and TensorBoard.

With the graduation into beta, Train can work together with Ray Client and Ray Datasets. Other changes made to support users in their training endeavours include a simplification of single-worker training workflows, and the addition of the APIs train.torch.prepare_model(...) and train.torch.prepare_data_loader(...), which prepare PyTorch models and DataLoaders for distributed training.
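
As a rough sketch of how the two helpers slot into a training function, the snippet below wraps a toy PyTorch model and DataLoader before running them on two workers. The model, data, and hyperparameters are illustrative, not taken from the release notes:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import ray.train.torch  # makes the train.torch utilities available
from ray import train
from ray.train import Trainer

def train_func():
    # prepare_model() wraps the model for distributed training
    # (e.g. in DistributedDataParallel) and moves it to this worker's device.
    model = train.torch.prepare_model(nn.Linear(4, 1))

    # prepare_data_loader() adds a distributed sampler and device placement.
    dataset = TensorDataset(torch.randn(128, 4), torch.randn(128, 1))
    loader = train.torch.prepare_data_loader(DataLoader(dataset, batch_size=16))

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    for _ in range(2):  # two illustrative epochs
        for X, y in loader:
            loss = loss_fn(model(X), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

trainer = Trainer(backend="torch", num_workers=2)
trainer.start()
trainer.run(train_func)
trainer.shutdown()
```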

For the 1.9 release, the Ray team looked into ways of submitting and monitoring applications without an active connection. The result is Ray Job Submission server, which — along with CLI and SDK clients — is meant to tackle the problem by offering a way of submitting local applications to a remote cluster and managing them as jobs. The new addition is currently in alpha, so not fit for production, though the documentation promises “mostly stable” APIs.
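
For a feel of the SDK side, here is a minimal sketch of submitting a local script as a job. The address, script name, and working directory are placeholders, and the import path shown matches later Ray releases, so the module layout in the 1.9 alpha may differ:

```python
from ray.job_submission import JobSubmissionClient

# Point the client at the job server exposed via a (hypothetical)
# cluster's dashboard address.
client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    entrypoint="python my_script.py",          # command executed on the cluster
    runtime_env={"working_dir": "./project"},  # local files shipped with the job
)
print(client.get_job_status(job_id))
```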

Users who were missing a remote file storage option for runtime environments will find corresponding support in the latest release. It also includes garbage collection and logging improvements for runtime_env, and fixes race condition issues affecting threaded actors, core workers, and named actors.
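
In practice, the remote storage support means a runtime environment’s working_dir can point at a packaged archive in remote storage rather than a local directory. A small sketch, with the bucket URI being a placeholder:

```python
import ray

# Workers download and unpack the (hypothetical) zipped project from
# remote storage instead of having files uploaded from the driver.
ray.init(runtime_env={"working_dir": "s3://my-bucket/my_project.zip"})
```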

Meanwhile, reinforcement learning library RLlib, which comes bundled with the Ray framework, underwent an architecture refactoring. After some modifications, using tf2 together with eager tracing shouldn’t be much slower than working with tf. Since experiments with sparse-reward and reward-at-end environments suggest more stable learning behaviour when the last timestep in V-trace calculations isn’t dropped, this can now be forced by setting vtrace_drop_last_ts in the APPO/IMPALA config to false.
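
Both behaviours are plain config entries. A sketch assuming the agents module layout of the Ray 1.x line, with CartPole as a stand-in environment:

```python
import ray
from ray.rllib.agents.impala import ImpalaTrainer

ray.init()

config = {
    "env": "CartPole-v0",            # illustrative environment
    "framework": "tf2",
    "eager_tracing": True,           # tf2 with tracing, close to tf speed after the refactor
    "vtrace_drop_last_ts": False,    # keep the last timestep in V-trace calculations
}

trainer = ImpalaTrainer(config=config)
result = trainer.train()
print(result["episode_reward_mean"])
```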

RLlib also looks to lose the build_trainer and build_(tf_)?policy utility functions in one of the upcoming releases. While this might still take a while, developers currently working with those functions could do worse than familiarising themselves with a newly added proof of concept that uses sub-classing to build custom trainer classes instead, sketched below. Details on this and the included bug fixes can be found in the release notes and the project documentation.
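
The general shape of that sub-classing approach, as it later stabilised in RLlib, looks roughly like the following. The overridden method names mirror that later pattern and are an assumption about the proof of concept, not a quote from it:

```python
from ray.rllib.agents.trainer import Trainer
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG
from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy

# Instead of build_trainer(...), derive a custom trainer class directly.
class MyCustomTrainer(Trainer):
    @classmethod
    def get_default_config(cls):
        # Start from an existing algorithm's defaults (assumed hook name).
        config = DEFAULT_CONFIG.copy()
        config["lr"] = 1e-4  # hypothetical override
        return config

    def get_default_policy_class(self, config):
        # Hand back the policy this trainer should run (assumed hook name).
        return PPOTorchPolicy
```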

Since the last release, model serving library Ray Serve has learned to save its internal state to external storage and recover upon failure, and it now comes fitted with replica autoscaling and a native pipeline API for model composition. Hyperparameter tuning library Tune was reworked in areas like experiment analysis, testing, and the cloud checkpointing API/SyncConfig, while Ray Datasets started to offer groupby and aggregations.
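
The new Datasets groupby follows the familiar group-then-aggregate pattern. A small sketch, assuming a simple integer dataset and the convenience count() aggregation:

```python
import ray

ray.init()

# Group the integers 0..99 by remainder mod 3, then count items per group.
ds = ray.data.range(100)
print(ds.groupby(lambda x: x % 3).count().take())
```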