Ray 1.5 looks to simplify data exchanges, refactored architecture keeps jobs going

Distributed execution framework Ray 1.5 is available for download, giving devs in the machine learning space a first look at the new Ray Datasets data exchange format, along with fixes meant to get rid of high memory usage issues.

Ray was created at the University of California Berkeley’s Artificial Intelligence Research Lab in 2017. It is intended to “enable practitioners to turn a prototype algorithm that runs on a laptop into a high-performance distributed application that runs efficiently on a cluster (or on a single multi-core machine) with relatively few additional lines of code.”

Ray Datasets are a new addition to the project. Their purpose is to provide a standard way of loading and exchanging data in Ray libraries and apps. They consist of a list of Ray object references to blocks, with each block holding a set of items in the Apache Arrow table format — or a Python list if objects aren’t compatible with Arrow. Structuring the sets as reference lists makes for easier exchange, while working with blocks helps to make data ingestion and transformation processes easier to parallelise. 
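The reference-list structure can be illustrated with a small conceptual sketch. This is plain Python with no Ray dependency, and all names (the dict-backed store, put_block) are stand-ins, not Ray's actual API:

```python
# Conceptual sketch: a dataset is a small list of references to blocks,
# while the (potentially large) blocks themselves live in an object store.
object_store = {}  # stand-in for Ray's distributed object store

def put_block(block):
    """Store a block and return a lightweight reference to it."""
    ref = f"ref-{len(object_store)}"
    object_store[ref] = block
    return ref

# Build a "dataset": three references, each pointing at a block of four items.
dataset = [put_block(list(range(i * 4, i * 4 + 4))) for i in range(3)]

print(dataset)                    # three small references
print(object_store[dataset[0]])  # [0, 1, 2, 3]
```

Exchanging the dataset then only means passing the small reference list around, which is what makes it cheap to hand Datasets between libraries and apps.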

Datasets include the comparatively basic distributed operations map, filter and repartition, which are executed eagerly. For more advanced operations, the Ray team wants users to convert Datasets into dataframe types that provide more features. This is why Datasets are designed to be compatible with various file formats, data sources and distributed frameworks.
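Why map and filter are easy to parallelise while repartition is not can be seen in a plain-Python sketch of the block-wise semantics (the function names here are illustrative, not Ray's API):

```python
from itertools import chain

# map and filter touch each block independently, so each block could be
# processed by a different worker; repartition must shuffle items across
# blocks and therefore needs to see all of them.
blocks = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

def ds_map(blocks, fn):
    return [[fn(x) for x in b] for b in blocks]

def ds_filter(blocks, pred):
    return [[x for x in b if pred(x)] for b in blocks]

def repartition(blocks, n):
    items = list(chain.from_iterable(blocks))
    size = -(-len(items) // n)  # ceiling division
    return [items[i * size:(i + 1) * size] for i in range(n)]

doubled = ds_map(blocks, lambda x: x * 2)
evens = ds_filter(doubled, lambda x: x % 4 == 0)
print(repartition(evens, 2))  # [[0, 4, 8], [12, 16, 20]]
```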

While Datasets are still in the alpha stage of development, the Ray team has decided to promote LightGBM-Ray, its distributed backend for the gradient boosting framework LightGBM, to beta. LightGBM-Ray enables machine learning engineers to implement fault-tolerant multi-node and multi-GPU training, and integrates with the distributed hyperparameter tuning tool Ray Tune.
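The value of fault tolerance in sharded training can be sketched in plain Python. Everything here is hypothetical illustration, not LightGBM-Ray code: the point is that when the worker handling one data shard dies, only that shard's work is retried rather than the whole job restarted:

```python
# Simulate: the worker for shard 1 dies on its first attempt.
failed_once = {1}

def train_shard(shard_id):
    """Stand-in for training one model partition on one worker."""
    if shard_id in failed_once:
        failed_once.discard(shard_id)  # fail only the first attempt
        raise RuntimeError(f"worker for shard {shard_id} died")
    return f"model-part-{shard_id}"

def fault_tolerant_train(num_shards, max_retries=3):
    results = {}
    for shard in range(num_shards):
        for _ in range(max_retries):
            try:
                results[shard] = train_shard(shard)
                break
            except RuntimeError:
                continue  # retry only the failed shard
    return results

print(fault_tolerant_train(3))
```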

Tune itself was recently fitted with new hyperparameter optimisation searchers BlendSearch and CFO. It has also gained the ability to pass separately evaluated configurations to searchers, read trial results from JSON files, and keep random values constant when doing a grid search.
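Keeping random values constant across a grid search means sampling a random hyperparameter once and reusing it for every grid combination, rather than re-sampling per trial. A plain-Python sketch with hypothetical helper names (not Tune's API):

```python
import itertools
import random

random.seed(42)

def expand_config(grid, sampled):
    """Cross-product the grid axes; sample each random param once and reuse."""
    fixed = {k: fn() for k, fn in sampled.items()}  # sampled a single time
    keys, values = zip(*grid.items())
    return [dict(zip(keys, combo), **fixed)
            for combo in itertools.product(*values)]

trials = expand_config(
    grid={"lr": [0.01, 0.1], "batch_size": [32, 64]},
    sampled={"dropout": lambda: random.uniform(0.1, 0.5)},
)
# Four grid combinations, all sharing one sampled dropout value,
# so differences between trials come from the grid axes alone.
```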

Looking at the Ray core, the development team lightly refactored the component to guarantee that Ray jobs keep running as long as there is enough disk space. The Ray Client, meanwhile, has learned to work with the major Ray libraries and now returns a context from Client Connect calls which can be used as a context manager. Client users should see better error messages and warnings once they've updated to version 1.5, and have an easier time building multi-threaded client-side applications.
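The context-manager pattern the Client's connect call is said to support can be sketched in plain Python. The class and function names below are hypothetical stand-ins, not Ray Client code; the point is that the returned context tears the connection down automatically when the `with` block exits:

```python
from contextlib import AbstractContextManager

class ClientContext(AbstractContextManager):
    """Stand-in for the context object a connect call might return."""

    def __init__(self, address):
        self.address = address
        self.connected = True  # stand-in for establishing the connection

    def disconnect(self):
        self.connected = False

    def __exit__(self, *exc_info):
        self.disconnect()  # always clean up, even if the block raised
        return None        # do not swallow exceptions

def connect(address):
    return ClientContext(address)

with connect("ray://head-node:10001") as ctx:
    assert ctx.connected  # connection usable inside the block
# on exit, the connection has been closed automatically
```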

Developers who want to use Ray's Autoscaler with Alibaba Cloud (Aliyun) can now do so, thanks to newly added support for the platform. The Autoscaler team has also exposed various metrics, which can be used in concert with tools like Prometheus to improve the tool's observability, and made it possible to launch clusters in subnets where public IP assignments are turned off by default.

Other enhancements can be found in Ray's reinforcement learning library RLlib, which includes a new API for customising offline datasets and support for adding trainer policies on the fly. RLlib also no longer requires creating an environment on the server side to get an environment's spaces, and allows external-env PolicyServers to listen on multiple different ports.

Details and information on bug fixes in Ray 1.5 can be found in the openly available Ray repository.