ONNX Runtime 1.8 goes big on hardware, small on memory footprint


Microsoft’s ONNX Runtime (ORT), an open source machine learning project meant to speed up inference and training, has been updated to version 1.8.

The release is especially useful for fans of hardware-accelerated training, since it packs a dynamically loadable CUDA execution provider, which should finally allow the same build to work equally well in CPU and GPU setups.

The provider also offers ways to load multiple CUDA versions at once, so teams can take their time updating older code. (Hint: they should also hold off on tool updates when trying the new GPU features, since the release notes state that the “GPU part of source code is not compatible with Visual Studio 2019 16.10.0 and clang 12.”)
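
In Python, making use of a build that runs on both CPU and GPU machines could look roughly like the sketch below. `get_available_providers` and the `providers` argument belong to the onnxruntime Python API; the `pick_providers` and `create_session` helpers and the `model.onnx` path are hypothetical names used for illustration only.

```python
# Sketch: prefer CUDA when it is present, otherwise fall back to the CPU provider.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order."""
    return [name for name in preferred if name in available]


def create_session(model_path="model.onnx"):
    """Create an inference session (requires the onnxruntime package)."""
    # Imported lazily so pick_providers can be used without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On a machine without a usable GPU, `pick_providers` simply returns the CPU provider, so the calling code does not need a separate code path.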

Developers who’ve grown to like distributed training as a sometimes faster and privacy-friendly way to create models should take a look at onnxruntime-training-gpu and onnxruntime-training-rocm. The new packages facilitate using the approach on Nvidia and AMD GPUs, which could help speed up the process even further if the appropriate hardware is available.

PyTorch users, meanwhile, can try their luck with the new torch-ort package, which can be used as an ONNX Runtime backend for the Facebook-driven machine learning library. This addition is especially interesting when taking into account Microsoft’s recent doubling down on PyTorch.

Just last week, the company announced the PyTorch Enterprise Support Program it dreamed up with Facebook. The initiative was described as a participatory program to improve the library for mission-critical applications, so better performance (as torch-ort promises) fits in very well here.
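
For those who want to experiment, wrapping an existing PyTorch model with torch-ort might look like the following minimal sketch. `ORTModule` is the wrapper class shipped in the `torch_ort` package; the `wrap_for_ort` helper and its `enabled` flag are hypothetical and only there to keep plain PyTorch as a fallback.

```python
def wrap_for_ort(model, enabled=True):
    """Wrap a torch.nn.Module in ORTModule when torch-ort is installed.

    Returns the model unchanged if wrapping is disabled or torch-ort is missing.
    """
    if not enabled:
        return model
    try:
        from torch_ort import ORTModule  # provided by the torch-ort package
    except ImportError:
        return model  # fall back to plain PyTorch
    return ORTModule(model)
```

The rest of a training loop would stay as it is, since the wrapped object is meant to be used like the original module.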

Performance is also an oft-cited issue when it comes to ML in the enterprise. The current ORT release tries to tackle this through optimised quantisation capabilities and by letting sessions share pre-packed weights for shared initialisers, which saves memory.
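
To illustrate why quantisation helps with footprint, here is a small sketch of symmetric 8-bit linear quantisation in plain Python. It shows the general technique only and is not ORT's implementation; the function names and sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] plus a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantised integers."""
    return [q * scale for q in quantized]


weights = [1.27, -0.64, 0.0, 0.32]  # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most half the scale. In ONNX Runtime itself, quantisation is exposed through the `onnxruntime.quantization` module, for example via `quantize_dynamic`.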

To make ORT a staple on more platforms, Microsoft put some time into the ONNX Runtime Web package, which should now know how to handle the ORT model format, WebAssembly and WebGL for CPU and GPU, and the Web Worker based multi-threaded WebAssembly backend. There’s also a new JavaScript API available, though this has been mainly introduced to replace ONNX.js.

ONNX Runtime 1.8 also comes with onnxruntime-extensions, an experimental library of custom operators meant to extend the project’s capabilities.

Devs should be aware, however, that the package is currently mainly made up of string operations and tokenizers, though the open source nature of the project gives hope that this might change soon.

Since the last release, the ORT profiler tool has learned to include information on threadpool usage for prep, wait, and run time for multi-threading. Additional details can be found in the release notes.
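
Profiling is switched on through `SessionOptions.enable_profiling`, and `InferenceSession.end_profiling()` returns the path of the resulting JSON file. The sketch below sums up the recorded time per category. It assumes the file holds a list of trace events with `cat` and `dur` fields, as in the Chrome trace format; the helper names are hypothetical, and the exact fields carrying the new thread pool details are not shown here.

```python
import json
from collections import defaultdict


def total_duration_by_category(events):
    """Sum the 'dur' field of trace events, grouped by their 'cat' field."""
    totals = defaultdict(int)
    for event in events:
        if "dur" in event:  # skip metadata entries without a duration
            totals[event.get("cat", "unknown")] += event["dur"]
    return dict(totals)


def load_profile(path):
    """Load a profile file, e.g. the path returned by end_profiling()."""
    with open(path) as handle:
        return json.load(handle)
```

Feeding the result of `load_profile` into `total_duration_by_category` gives a quick overview before opening the file in a trace viewer.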