Autobots, roll out: ONNX runtime hits 1.3, previewing Transformer training help


Microsoft has pushed out version 1.3 of its ONNX runtime (ORT), adding initial support for last week’s ONNX release and allowing users to do more machine learning with fewer resources.

Since ORT is advertised mainly as an inference and training accelerator, improvements to performance and resource utilisation were high on the list of priorities for version 1.3. With its release, users are promised lower latencies for ONNX model zoo models, Transformer models on CPUs and GPUs, and scikit-learn models run with large batch sizes.

To help developers make the most of the resources available to them, ORT 1.3 includes ways to let sessions share a global threadpool, and threadpool abstractions that switch to the implementation that best suits the build settings. For better control, Eigen threadpools now take a cost parameter, and thread counts can be configured by using OpenMP’s environment variables, if a project was built with the multi-processing interface.
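As a rough illustration of the OpenMP route, the sketch below sets the standard OpenMP variables `OMP_NUM_THREADS` and `OMP_WAIT_POLICY` for a child process. It assumes the ORT build in use was compiled with OpenMP; the helper names `openmp_env` and `run_with_openmp_threads` are made up for this example and are not part of ORT. The variables are set on a fresh process because OpenMP runtimes typically read them once, when they initialise.

```python
import os
import subprocess
import sys


def openmp_env(num_threads, wait_policy="PASSIVE", base_env=None):
    """Return a copy of the environment with OpenMP thread settings applied.

    Hypothetical helper: OMP_NUM_THREADS and OMP_WAIT_POLICY are standard
    OpenMP variables, but they only matter if the library was built with OpenMP.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if wait_policy not in ("ACTIVE", "PASSIVE"):
        raise ValueError("wait_policy must be 'ACTIVE' or 'PASSIVE'")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["OMP_WAIT_POLICY"] = wait_policy
    return env


def run_with_openmp_threads(command, num_threads, wait_policy="PASSIVE"):
    """Run `command` in a child process that sees the OpenMP settings."""
    return subprocess.run(
        command,
        env=openmp_env(num_threads, wait_policy),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for an inference script: the child just echoes the setting it sees.
    child = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_with_openmp_threads(child, num_threads=4).stdout.strip())
```

In practice the child command would be whatever script loads the model, and the thread count is something to tune per machine rather than a fixed recommendation.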

Those who welcomed the new operations ONNX 1.7 introduced just last week will surely be interested to know that they are now available in the ONNX runtime as well. The updated support also covers Opset 12, which should now be usable without major complications.

The ORT team didn’t just improve on the basics, though, but also added a new preview feature called ONNX Runtime Training to the project. It is meant to accelerate the training of Transformer models, a network architecture built around the self-attention mechanism. The new experimental API can be used to switch training backends for large-scale PyTorch model training, with the ORT team aiming for minimal code changes in order to use their acceleration.

Other than that, Microsoft has been busy refining APIs and packages, which means the Windows Machine Learning APIs and the ONNX runtime with DirectML package are now generally available on NuGet. The tool’s Java API is also out of its preview phase, with a Maven package to follow soon, while an early version of its JavaScript equivalent has just become available to build from the master branch.

Devs using the Python API are also meant to get their execution times down, thanks to a feature that allows them to “setup inputs/outputs on the GPU prior to model execution”. A complete list of changes for ONNX runtime 1.3 can be found in the release notes.
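To give an idea of what binding inputs and outputs ahead of a run can look like, here is a minimal sketch. It uses the `io_binding()`, `bind_cpu_input()`, `bind_output()`, `run_with_iobinding()` and `copy_outputs_to_cpu()` names as they appear in later onnxruntime Python documentation; whether version 1.3 exposes exactly these calls is an assumption. The wrapper `run_with_bound_io` and its parameters are hypothetical, and `session` is expected to be an already created `onnxruntime.InferenceSession`.

```python
def run_with_bound_io(session, input_name, input_array, output_name,
                      output_device="cpu"):
    """Bind one input and one output before execution, then run the model.

    Hypothetical wrapper around an existing onnxruntime.InferenceSession.
    Passing output_device="cuda" asks the runtime to allocate the output on
    the GPU, which assumes a CUDA-enabled build and device are available.
    """
    binding = session.io_binding()

    # The input array lives in host memory; the runtime copies it to the
    # device if the model executes there.
    binding.bind_cpu_input(input_name, input_array)

    # Let the runtime allocate the output buffer on the requested device.
    binding.bind_output(output_name, output_device)

    # Execute with the pre-bound inputs and outputs.
    session.run_with_iobinding(binding)

    # Copy results back to host memory as a list of arrays.
    return binding.copy_outputs_to_cpu()
```

A call might look like `run_with_bound_io(session, "input", batch, "output", output_device="cuda")`, where `"input"` and `"output"` stand in for the model's real tensor names.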