Better, faster, stronger: CUDA 11.6 puts finishing touches to 128-bit int support

By Team Devclass

January 19, 2022

Nvidia has updated the toolkit for its parallel computing platform CUDA to version 11.6. The release brings performance and programming model enhancements that are meant to support a wider array of HPC and data science applications.

Highlights of the release include the GSP driver architecture becoming the default driver mode for Nvidia’s more recent Turing and Ampere GPUs, as well as a new application programming interface that allows kernel nodes of an instantiated graph to be disabled. A disabled node behaves like an empty node, and the change only affects future launches of the graph. Node parameters are promised to stay the same while the node is disabled.
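To make the node-disabling behaviour more concrete, here is a minimal sketch that assumes the runtime API call in question is cudaGraphNodeSetEnabled(). The add_one kernel, the run_graph_demo() helper and the CHECK macro are hypothetical names made up for illustration. The graph holds a single kernel node and is launched three times: enabled, disabled, and enabled again.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return -1 from the enclosing function on any CUDA error.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            return -1;                                             \
        }                                                          \
    } while (0)

// Hypothetical demo kernel: increments a single counter.
__global__ void add_one(int *counter) { *counter += 1; }

// Launches a one-node graph three times (enabled, disabled, re-enabled)
// and returns the final counter value, or -1 on error.
int run_graph_demo() {
    int *d_counter = nullptr;
    CHECK(cudaMalloc(&d_counter, sizeof(int)));
    CHECK(cudaMemset(d_counter, 0, sizeof(int)));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    cudaGraph_t graph;
    CHECK(cudaGraphCreate(&graph, 0));

    // Describe the kernel node: one block, one thread, one argument.
    void *args[] = {&d_counter};
    cudaKernelNodeParams params = {};
    params.func = (void *)add_one;
    params.gridDim = dim3(1);
    params.blockDim = dim3(1);
    params.sharedMemBytes = 0;
    params.kernelParams = args;
    params.extra = nullptr;

    cudaGraphNode_t node;
    CHECK(cudaGraphAddKernelNode(&node, graph, nullptr, 0, &params));

    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    CHECK(cudaGraphLaunch(exec, stream));           // node runs
    CHECK(cudaGraphNodeSetEnabled(exec, node, 0));  // disable for future launches
    CHECK(cudaGraphLaunch(exec, stream));           // node acts like an empty node
    CHECK(cudaGraphNodeSetEnabled(exec, node, 1));  // re-enable
    CHECK(cudaGraphLaunch(exec, stream));           // node runs again
    CHECK(cudaStreamSynchronize(stream));

    int h_counter = 0;
    CHECK(cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost));

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d_counter);
    return h_counter;
}
```

Called from a main() function and built with nvcc from CUDA 11.6 or later, run_graph_demo() should report that the kernel ran on only two of the three launches. The kernel parameters are set once and never touched while the node is disabled.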

Another interesting enhancement comes in the form of new functions for the cooperative groups namespace. The additions include ways to learn about the dimension and number of threads and blocks within a thread block or grid group respectively, and are meant to “improve consistency in naming, function scope, and unit dimension and size”.
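A short sketch can show what such queries look like in a kernel. It assumes the new members are named num_threads() and dim_threads() on thread_block, and num_blocks(), dim_blocks() and num_threads() on grid_group, which is how we read the updated namespace. The GroupSizes struct, the report_sizes kernel and the query_group_sizes() helper are hypothetical.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical container for the values reported by the kernel.
struct GroupSizes {
    unsigned int threads_per_block;
    unsigned int block_dim_x, block_dim_y;
    unsigned long long blocks_in_grid;
    unsigned int grid_dim_x, grid_dim_y;
    unsigned long long threads_in_grid;
};

__global__ void report_sizes(GroupSizes *out) {
    cg::thread_block block = cg::this_thread_block();
    cg::grid_group grid = cg::this_grid();

    // Let a single thread of the whole grid write the result.
    if (grid.thread_rank() == 0) {
        dim3 bd = block.dim_threads();   // block shape, in threads
        dim3 gd = grid.dim_blocks();     // grid shape, in blocks
        out->threads_per_block = block.num_threads();
        out->block_dim_x = bd.x;
        out->block_dim_y = bd.y;
        out->blocks_in_grid = grid.num_blocks();
        out->grid_dim_x = gd.x;
        out->grid_dim_y = gd.y;
        out->threads_in_grid = grid.num_threads();
    }
}

// Launches a 3x2 grid of 8x4 blocks and copies the reported sizes back.
// Returns false if any CUDA call fails.
bool query_group_sizes(GroupSizes *result) {
    GroupSizes *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(GroupSizes)) != cudaSuccess) return false;

    report_sizes<<<dim3(3, 2), dim3(8, 4)>>>(d_out);

    cudaError_t err =
        cudaMemcpy(result, d_out, sizeof(GroupSizes), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

With the launch configuration chosen here, the block group should report 32 threads in an 8x4 layout, and the grid group 6 blocks in a 3x2 layout with 192 threads in total. The sketch only queries sizes and never calls grid.sync(), so it does not need a cooperative launch.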

While support for 128-bit integers in CUDA C++ was already part of the previous release, v11.6 takes the implementation a step further and extends the ability to use the data type to compilers and developer tools as well.
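As an illustration of the data type in device code, the following sketch multiplies two 64-bit values into a full 128-bit product on the GPU. It assumes a host compiler that itself provides __int128, such as GCC or Clang. The mul_wide kernel and the mul_u64_on_device() helper are hypothetical names.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Computes the full 128-bit product of two 64-bit values on the device
// and splits it into high and low 64-bit halves.
__global__ void mul_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 product = (unsigned __int128)a * b;
    *hi = (uint64_t)(product >> 64);
    *lo = (uint64_t)product;
}

// Host wrapper: runs the kernel with a single thread and copies both
// halves back. Returns false if any CUDA call fails.
bool mul_u64_on_device(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(uint64_t)) != cudaSuccess) return false;

    mul_wide<<<1, 1>>>(a, b, d_out, d_out + 1);

    uint64_t h_out[2] = {0, 0};
    cudaError_t err =
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    *hi = h_out[0];
    *lo = h_out[1];
    return true;
}
```

Multiplying 0xFFFFFFFFFFFFFFFF by itself, for example, yields a high half of 0xFFFFFFFFFFFFFFFE and a low half of 1, a result that a 64-bit multiplication alone would truncate.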

CUDA 11.6 should be able to use the latest Visual Studio 2022 as a host compiler, and will automatically prune unused kernels to improve performance. Starting with this release, the Parallel Thread Execution (PTX) instruction set also includes new instructions for creating bit masks and performing sign extension.

Developers can also configure device linker nvlink to generate PTX, which should be helpful for scenarios that use optimisation at device link time but require forward compatibility across GPU architectures.

As usual, the CUDA platform update includes some library enhancements, though most seem to be about performance this time around. Notable new features outside that realm are a new API for computing absolute Manhattan distance transforms in NPP and options for fusion in DL training in cuBLAS.

Updating installations to version 11.6 should be relatively straightforward. However, users should be aware that support for CentOS Linux 8 has been deprecated, as has the use of cudaDeviceSynchronize() in device code for on-device fork/join parallelism. According to Nvidia, a better-performing replacement programming model for the latter is planned to be added soon.
