ONNX runtime sneaks into the auditory realm, gets more mobile friendly


By Team Devclass

March 4, 2021


Microsoft’s inference and machine learning accelerator ONNX runtime is now available in version 1.7 and promises reduced binary sizes, while also making a foray into audio.

ONNX runtime uses the computation graph format defined by ONNX, the open standard for machine learning interoperability, and looks to reduce training time for large models, improve inference, and facilitate cross-platform deployments. In the last couple of months, the team seems to have been quite focussed on performance: quantisation is now faster for depthwise conv, QuantizeLinear, and fusion for Conv, and the attention mechanism of the long document transformer (Longformer) needs less memory on CUDA.

There’s now also support for the QuantizeLinear-DequantizeLinear format, and for quantising Pad, Split and MaxPool in channel-last layout. Changes to the Python optimiser integrated in the project allow the use of fusion on BART, the sequence-to-sequence transformer model, to get performance up. And just so you don’t have to take Microsoft’s word for it, ONNX runtime now also includes a CPU profiling tool that gives you a better idea of how different transformer models are doing.
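For readers unfamiliar with the QuantizeLinear-DequantizeLinear pairing mentioned above, the arithmetic behind the two operators can be sketched in a few lines of plain Python. This is an illustration of the formulas in the ONNX operator definitions, not the runtime's implementation; the function names, the 8-bit unsigned range, and the sample scale and zero point are assumptions chosen for the example.

```python
def quantize_linear(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers: q = clamp(round(x / scale) + zero_point)."""
    quantized = []
    for x in values:
        # Python's round() rounds halves to the nearest even integer,
        # which matches the rounding rule in the ONNX operator definition.
        q = round(x / scale) + zero_point
        quantized.append(max(qmin, min(qmax, q)))  # saturate to the integer range
    return quantized


def dequantize_linear(quantized, scale, zero_point):
    """Map integers back to floats: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical parameters: uint8 range, step size 0.25, zero mapped to 128.
scale, zero_point = 0.25, 128
original = [-1.0, 0.0, 0.3, 100.0]

q = quantize_linear(original, scale, zero_point)
restored = dequantize_linear(q, scale, zero_point)

print(q)         # [124, 128, 129, 255]
print(restored)  # [-1.0, 0.0, 0.25, 31.75]
```

The round trip shows both sources of error: 0.3 snaps to the nearest step (0.25), and 100.0 saturates at the top of the integer range.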

Since deploying machine learning models on mobile remains something of a challenge, the ONNX runtime team has added an option to let the operator kernels support only those types that are actually used by a model. This promises a “25-33% reduction in binary size contribution from the kernel implementations”, though the creators point out that how much can be gained depends on the model in question.

Speaking of gains, researchers trying to make use of machine learning in audio-related use cases could soon get more use out of the project, since it now comes with first iterations of some audio operators. These include Fourier transforms (DFT, IDFT, STFT), various windowing functions (Hann, Hamming, Blackman), and a MelWeightMatrix operator. To give them a go, the project has to be built with the ms_experimental build flag enabled.
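To give a sense of what these building blocks compute, here is a minimal sketch of a Hann window, a naive DFT, and a short-time Fourier transform built from the two. It follows the textbook definitions and does not call the experimental ONNX runtime operators; all function names and parameters are assumptions made for the example.

```python
import cmath
import math


def hann_window(size, periodic=True):
    """Textbook Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / denom)."""
    if size == 1:
        return [1.0]
    denom = size if periodic else size - 1
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / denom) for n in range(size)]


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def stft(signal, frame_size, hop):
    """Slice the signal into overlapping frames, window each one, then apply the DFT."""
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        frames.append(dft(frame))
    return frames


# A pure tone that completes exactly 4 cycles per 16-sample frame.
tone = [math.sin(2 * math.pi * 4 * t / 16) for t in range(64)]
spectra = stft(tone, frame_size=16, hop=8)

# For each frame, find the strongest bin among the non-negative frequencies.
peaks = [max(range(9), key=lambda k: abs(frame[k])) for frame in spectra]
print(len(spectra), peaks)  # 7 [4, 4, 4, 4, 4, 4, 4]
```

Every frame peaks at bin 4, matching the tone's 4 cycles per frame. A production implementation would use an FFT rather than the quadratic loop shown here.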

Developers who have been using the ONNX runtime with OpenMP need to check they’re downloading the right version of the project, since it is now built without the API by default. Builds including OpenMP can be identified by a corresponding suffix (onnxruntime-openmp, Microsoft.ML.OnnxRuntime.OpenMP) and are available separately on PyPI and NuGet.
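On the Python side, one way to see which variant is present in an environment is to query the installed distribution metadata. The sketch below uses only the standard library (Python 3.8 or later); the helper name and the list of candidates are assumptions for illustration, with onnxruntime-openmp taken from the suffix described above.

```python
from importlib import metadata

# Distribution names to look for; extend the tuple as needed.
CANDIDATES = ("onnxruntime", "onnxruntime-openmp", "onnxruntime-gpu")


def installed_variants(names=CANDIDATES):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # this variant is not installed in the current environment
    return found


for name, version in installed_variants().items():
    print(f"{name} {version}")
```

An empty result means none of the listed distributions is installed in the active environment.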

The GPU package is currently missing from NuGet due to size restrictions, which Microsoft looks to fix for upcoming releases. Teams interested in ARM32/64 Windows builds can find those in the CPU NuGet and zip packages starting with this release.

In terms of dependencies, it’s important to note that Python 3.5 support has been removed in v1.7 of the runtime, though it has learned to work with versions 3.8 and 3.9. Dependencies on gemmlowp and build configs for MKLML, openblas and jemalloc have been binned as well. Meanwhile, the GPU build is now created using CUDA 11, OpenVINO has been updated to v2021.2, TensorRT to 7.2, and DirectML to 1.4.2, which is meant to help with performance and stability.
