CUDA be a contender: Release 11.3 of Nvidia’s GPU developer toolkit is out

Nvidia has released CUDA 11.3, the latest release of its developer toolkit for building applications that run on its GPUs. The release focuses on enhancements to the programming model and on boosting the performance of CUDA-based applications.

Announcing the release on its developer blog, Nvidia said it has extended several CUDA APIs to make CUDA graphs easier to use, and has enhanced the stream-ordered memory allocator introduced in CUDA 11.2.

In particular, CUDA 11.3 introduces new features to improve the flexibility and experience of using CUDA Graphs, which let work submission be defined as a graph of operations and the dependencies between them. One of these is stream capture composability. Stream capture builds a graph from existing application code by recording the work launched into CUDA streams, rather than requiring the graph to be constructed from scratch with the graph APIs. CUDA 11.3 makes it possible to combine the two approaches, so that nodes can be added to a graph directly while it is being captured.

User objects are a new feature that helps manage dynamic resources used by graphs by reference-counting them. Such resources can otherwise be hard to manage with stream capture, because the code responsible for the resource (a library, for example) is not the same code that manages the graph (typically the application).

A new graph debug API offers a fast and convenient way to gain a high-level understanding of a graph. It writes out a comprehensive overview of the entire graph as a DOT file, so developers no longer have to call individual APIs to piece together the graph's structure.

CUDA 11.3 also adds new APIs that enhance the stream-ordered memory allocator. A pointer query returns the handle of the memory pool that a pointer allocated by the asynchronous allocator came from. A device query reports whether mempool-based inter-process communication (IPC) is supported for a particular mempool handle type, and mempool usage-statistics queries report how much memory a pool has reserved and allocated.

Other enhancements in CUDA 11.3 include formal support for virtual aliasing, in which an application accesses two different virtual addresses that reference the same physical allocation. There are also new driver and runtime APIs for querying the addresses of driver API functions.

C++ support enhancements in this release comprise a new version of libcu++, 1.4.1, plus CUB 1.11.0 and Thrust 1.11.0, which Nvidia describes as major releases providing bug fixes and performance enhancements. The CUDA 11.3 release of the CUDA C++ compiler toolchain incorporates new features aimed at improving productivity and code performance. These include a standalone demangler tool, cu++filt, which decodes mangled function names to aid source code correlation. Python support is also available as a preview release on GitHub, aligned with the CUDA 11.3 release.

Finally, the Nvidia Nsight toolset adds Nsight VS Code, an extension to Visual Studio Code for CUDA-based applications. Nsight Systems 2021.2 introduces support for GPU metrics sampling, and Nsight Compute 2021.1 adds features that give developers greater visibility into the dynamic behaviour of workloads and how they use hardware and software resources.

The full release notes are available on Nvidia's website.