NVIDIA has released RAPIDS, a suite of GPU-accelerated data analytics and machine learning libraries, as open source.
Though the project relies on CUDA for acceleration, developers will be pleased to learn that the functions are exposed through Python interfaces, which should make them easier to adopt for anyone without a background in GPU programming.
Since training models can take quite some time, GPU acceleration is gaining traction, though mainly amongst developers working on more complex projects. One reason it hasn't spread further is the difficulty of getting access to the hardware and adapting code so that it can actually run on the GPU. The RAPIDS libraries could help with that, if you're using NVIDIA hardware, that is.
At the moment the suite consists of a Python GPU DataFrame library, a C GPU DataFrame library and an alpha version of the RAPIDS Machine Learning Library. The GitHub repository also includes a subrepository for cuDF, into which work from the GPU Open Analytics Initiative (GoAi) has been merged.
cuSKL, the machine learning library, debuts with implementations of principal component analysis (PCA), density-based spatial clustering of applications with noise (DBSCAN) and truncated singular value decomposition (tSVD). Before the final release of version 0.1, NVIDIA also plans to add k-means clustering and a Kalman filter implementation.
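To give a feel for what the most cryptically named of those algorithms does, here is a minimal pure-Python sketch of textbook DBSCAN. It is not cuSKL's implementation or API: the function names `dbscan` and `region_query` and the parameters `eps` (neighbourhood radius) and `min_samples` (points needed to form a dense region) are illustrative assumptions. It runs a brute-force O(n²) neighbour search on the CPU; that pairwise-distance work is the kind of data-parallel computation GPU implementations typically accelerate.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    Hypothetical illustration of the textbook algorithm, not the cuSKL API.
    A point is a core point if at least min_samples points (itself included)
    lie within eps; clusters grow outward from core points.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep growing
    return [NOISE if label is None else label for label in labels]
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], 1.5, 3)` finds two dense groups and flags the isolated point as noise, returning `[0, 0, 0, 1, 1, 1, -1]`. Unlike k-means, the number of clusters is not fixed in advance.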
Example notebooks for the libraries can be found in the Python folder of the repository. According to the documentation, more algorithms and primitives are currently being added, and multi-GPU versions are expected to follow.
Although the first release focuses on Python, NVIDIA is aware of the importance of Apache Spark for complex data science projects, so it has also announced a collaboration on a RAPIDS integration for the analytics engine. The roadmap runs from Spark Streaming to single-GPU cuDF, through cuML and cuGraph integration and multi-GPU cuDF user-defined functions (UDFs), to native integration in the longer term.
The RAPIDS code is released under the Apache License 2.0 and is available on GitHub.