Red Hat Certifies Linux for NVIDIA AI Boxes

NVIDIA DGX-1

Red Hat cozied up even further with NVIDIA yesterday, certifying its Enterprise Linux platform on the GPU vendor’s DGX-1 machine learning boxes.

The announcement makes it easier for enterprises to manage their machine learning training on their own premises, the Linux vendor said.

Under the deal, existing Red Hat Enterprise Linux subscriptions are eligible for use on DGX-1 systems. It also opens up certified applications developed for Red Hat’s Linux system to DGX-1 users. Red Hat is going beyond certification by optimizing its Linux for DGX-1 with tuned profiles built for the NVIDIA platform. These draw on tuned, the system tuning package that it introduced in Red Hat Enterprise Linux 6. The company has said in the past that tuned profiles can boost performance by double-digit percentages.

Red Hat hopes to become the control layer of choice for companies training their machine learning models on NVIDIA’s workstation and server boxes. This hardware targets companies with enough AI training workload to crunch their models on their own premises rather than on third-party cloud resources.

The relationship also extends to containers. Red Hat has brought Kubernetes capabilities to DGX-1 users in the form of the Red Hat OpenShift Container Platform, which uses the device plugin capability in Kubernetes to support NVIDIA GPUs.
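For readers who have not used device plugins, the following is a minimal sketch of how a workload might ask Kubernetes for a GPU. It assumes NVIDIA's device plugin is installed on the cluster and advertises GPUs under the resource name nvidia.com/gpu. The function name gpu_pod_manifest, the pod name and the image URL are hypothetical and chosen only for illustration; nothing here is taken from the Red Hat announcement.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a minimal Pod manifest that requests NVIDIA GPUs.

    Assumption: the cluster runs NVIDIA's device plugin, which exposes
    GPUs to the scheduler as the extended resource "nvidia.com/gpu".
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image reference
                    # Extended resources such as GPUs are requested
                    # under "limits" in the container spec.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which Kubernetes tooling accepts
    # alongside YAML.
    manifest = gpu_pod_manifest("train-job", "example.com/trainer:latest")
    print(json.dumps(manifest, indent=2))
```

The scheduler would then place the pod only on a node with a free GPU, which is the behaviour the device plugin mechanism is meant to provide.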

The two companies will also collaborate on further open-source initiatives. NVIDIA offers its own container platform called NVIDIA GPU Cloud (NGC). It includes a catalogue of software containers optimized for deep learning workloads on NVIDIA hardware, covering software including TensorFlow, PyTorch, MXNet and TensorRT. These containers include the NVIDIA CUDA Toolkit and NVIDIA's deep learning libraries. The NGC containers are now available on Red Hat OpenShift, the software company said.

The two companies will also continue to work together on heterogeneous memory management (HMM), a feature that lets devices access and mirror the content of a system’s memory into their own. This improves the performance of applications using GPUs, Red Hat said.

The two companies have long been close, working on technologies ranging from video drivers to Kubernetes. They have been involved in the Kubernetes Resource Management Working Group for two years to help tackle performance-sensitive workloads using the container orchestration system.