ONNX Runtime 1.8 goes big on hardware, small on memory footprint

Microsoft’s ONNX Runtime (ORT), an open source machine learning project designed to speed up inference and training, has been updated to version 1.8.

The release is especially useful for fans of hardware-accelerated training, since it packs a dynamically loadable CUDA execution provider, which should finally allow the same build to work equally well in CPU and GPU setups.

The provider also offers ways to load multiple CUDA versions at once, so teams can take their time updating older code. (Hint: they should also hold off on tool updates when trying the new GPU features, since the release notes state that the “GPU part of source code is not compatible with Visual Studio 2019 16.10.0 and clang 12.”)
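For a sense of what that looks like in practice, here’s a minimal Python sketch: the same script requests the CUDA execution provider and falls back to the CPU provider when no GPU (or no CUDA-enabled build) is available. The model path is a placeholder.

```python
import onnxruntime as ort

# Ask for the CUDA execution provider first; ONNX Runtime falls back
# to the CPU provider if CUDA is unavailable in this environment.
# "model.onnx" is a placeholder path for illustration.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Shows which providers were actually loaded for this session.
print(session.get_providers())
```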

Developers who’ve grown to like distributed training as a sometimes faster and more privacy-friendly way to create models should take a look at onnxruntime-training-gpu and onnxruntime-training-rocm. The new packages facilitate the approach on Nvidia and AMD GPUs, which could speed up the process even further if the appropriate hardware is available.

PyTorch users, meanwhile, can try their luck with the new torch-ort package, which can be used as an ONNX Runtime backend for the Facebook-driven machine learning library. This addition is especially interesting in light of Microsoft’s recent doubling down on PyTorch.

Just last week the company announced the PyTorch Enterprise Support Program it dreamed up with Facebook. The initiative was described as a participatory program to improve the library for mission-critical applications, so better inference (as torch-ort promises) fits in very well here.
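For a rough idea of how torch-ort slots into existing code, here’s a minimal sketch, assuming torch-ort and a matching onnxruntime-training package are installed; the toy model, shapes, and hyperparameters are made up for illustration.

```python
import torch
from torch_ort import ORTModule

# A toy PyTorch model for illustration; any nn.Module should work.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Wrapping the model hands its forward and backward passes to ONNX Runtime.
model = ORTModule(model)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
inputs = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))

# One ordinary training step; the rest of the training loop is unchanged.
optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(model(inputs), labels)
loss.backward()
optimizer.step()
```

The appeal of this design is that the wrapper leaves the surrounding training loop untouched, so trying the backend on an existing model is a one-line change.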

Performance is also an oft-cited issue when it comes to ML in the enterprise. The current ORT release tries to tackle this through optimised quantisation capabilities and pre-packing of weights, which, for shared initialisers, can now be reused between sessions to save memory.
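The quantisation side of this is exposed through ONNX Runtime’s Python tooling; a minimal sketch of dynamic quantisation might look as follows (file names are placeholders):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantisation converts weights to 8-bit integers, shrinking
# the model file and often speeding up CPU inference.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```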

To make ORT a staple on more platforms, Microsoft put some time into the ONNX Runtime web package, which should now know how to handle the ORT model format, WebAssembly and WebGL backends for CPU and GPU, and Web Worker-based multi-threaded WebAssembly execution. There’s also a new JavaScript API available, though this has mainly been introduced to replace ONNX.js.

ONNX Runtime 1.8 comes with onnxruntime-extensions, an experimental library of custom operators meant to extend the project’s capabilities.

Devs should be aware, however, that the package is currently mainly made up of string operators and tokenizers, though the open source nature of the project gives hope that this might change soon.
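Hooking the extensions into a session follows ONNX Runtime’s usual custom-op registration path; here’s a minimal sketch, assuming onnxruntime-extensions is installed and the model actually uses one of its operators (the model file name is a placeholder):

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the custom operator library so the session can resolve the
# extra string/tokenizer ops referenced by the model.
opts = ort.SessionOptions()
opts.register_custom_ops_library(get_library_path())

session = ort.InferenceSession("model_with_custom_ops.onnx", sess_options=opts)
```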

Since the last release, the ORT profiler tool has learned to include information on threadpool usage, covering prep, wait, and run time for multi-threaded runs. Additional details can be found in the release notes.
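Profiling is switched on through session options; a minimal sketch (model path and input shape are placeholders) that produces the JSON trace carrying the new threadpool information:

```python
import numpy as np
import onnxruntime as ort

# Profiling must be enabled before the session is created.
opts = ort.SessionOptions()
opts.enable_profiling = True

session = ort.InferenceSession("model.onnx", sess_options=opts)

# Run one inference so there is something to profile; the input shape
# here is a placeholder and must match the actual model.
input_name = session.get_inputs()[0].name
session.run(None, {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)})

# Writes the JSON trace to disk and returns its file path.
profile_path = session.end_profiling()
print(profile_path)
```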