The team behind Intel’s deep learning library oneDNN, previously known as the Deep Neural Network Library (DNNL) and, before that, as Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), has just completed another release as part of the company’s newly formed oneAPI family.
According to its repo, oneDNN “includes basic building blocks for neural networks optimized for Intel Architecture Processors and Intel Processor Graphics” and should be used together with other deep learning frameworks such as TensorFlow or PyTorch.
The release clearly focuses on performance optimisation for Intel hardware, though users will surely appreciate a few newly added capabilities as well.
In the buildup to oneDNN 1.4, the project team has, for example, been busy getting an additional flavour of recurrent neural networks to work properly. The architecture in question is called long short-term memory projection, or LSTMP for short, which augments the regular LSTM cell with an extra projection layer that shrinks the hidden state, reducing the number of recurrent parameters and speeding up computation. Meanwhile, the Softmax and LogSoftmax primitives have been extended so that they are also able to digest input of the bfloat16 data type.
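To see what the projection layer buys you, here is a heavily simplified sketch of the LSTMP idea in plain C++. It is not the oneDNN API, and it omits the LSTM gates entirely; all names and dimensions are illustrative. The point is just that an extra matrix `P` maps the hidden state down to a smaller projected state, which is what the recurrent weights then consume.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>; // row-major: Mat[row][col]

// Plain matrix-vector product.
static Vec matvec(const Mat& m, const Vec& v) {
    Vec out(m.size(), 0.0f);
    for (size_t i = 0; i < m.size(); ++i)
        for (size_t j = 0; j < v.size(); ++j)
            out[i] += m[i][j] * v[j];
    return out;
}

// One step of a toy cell: compute a hidden-sized candidate state, then
// project it down. A real LSTM also has input/forget/output gates and a
// cell state, omitted here to keep the focus on the projection layer.
Vec lstmp_step(const Mat& w, const Mat& p, const Vec& x) {
    Vec hidden = matvec(w, x);               // hidden-sized state
    for (auto& h : hidden) h = std::tanh(h); // nonlinearity
    return matvec(p, hidden);                // projected, smaller state
}
```

Because the projected state is what feeds back into the next time step, the recurrent weight matrices shrink from hidden × hidden to hidden × projection size, which is where the speedup comes from.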
The landscape of deep learning libraries is quite competitive, and Intel has invested some time in getting its Developer Guide up to speed. It now contains examples for all primitives supported by oneDNN, which might prove essential in Chipzilla’s quest to prove the tool’s worth to devs with a focus on the company’s hardware.
To make the library more flexible, oneDNN now also comes with a threadpool CPU runtime. The new addition allows machine learning aficionados to use their own threadpool implementation for computations on multiple threads where needed.
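The general shape of such an arrangement can be sketched in stdlib-only C++. This is not oneDNN’s actual interoperability interface (which has more methods and different signatures); the `ToyThreadpool` type and its `parallel_for` method are purely illustrative. The core idea is the same, though: the library hands a job count and a callable to a pool the application owns, and the pool decides how to schedule the work.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// A toy user-owned threadpool exposing a parallel_for, roughly the shape
// of hook a threadpool runtime asks the application to provide.
struct ToyThreadpool {
    int nthreads;
    explicit ToyThreadpool(int n) : nthreads(n) {}

    // Run fn(i) for i in [0, njobs), striding jobs across threads.
    void parallel_for(int njobs, const std::function<void(int)>& fn) {
        std::vector<std::thread> workers;
        for (int t = 0; t < nthreads; ++t) {
            workers.emplace_back([this, t, njobs, &fn] {
                for (int i = t; i < njobs; i += nthreads) fn(i);
            });
        }
        for (auto& w : workers) w.join();
    }
};
```

A caller could, for instance, hand the pool a lambda that fills one output slot per job index; since each index is touched by exactly one thread, no locking is needed.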
Devs wanting to use LSTMP with Intel Processor Graphics still need to exercise patience since the functionality isn’t implemented for them yet. However, they should be able to notice performance improvements when using f32 convolution forward propagation, as well as f32/f16 pooling and batch normalization forward propagation in cases where they are used with NHWC activations. Apart from that, using f32 and f16 binary primitives should work a bit better after jumping onto the new version.
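Why NHWC activations get their own fast paths comes down to memory layout. The sketch below, with conventional dimension names (N = batch, C = channels, H = height, W = width), shows how the same logical element lands at different flat offsets in the two layouts; it is a generic illustration, not oneDNN code. NHWC keeps all channels of one pixel contiguous in memory, which suits vectorised per-pixel work.

```cpp
#include <cassert>
#include <cstddef>

// Flat offset of element (n, c, h, w) in NCHW layout:
// all pixels of one channel are contiguous.
inline size_t offset_nchw(size_t n, size_t c, size_t h, size_t w,
                          size_t C, size_t H, size_t W) {
    return ((n * C + c) * H + h) * W + w;
}

// Flat offset of the same element in NHWC layout:
// all channels of one pixel are contiguous.
inline size_t offset_nhwc(size_t n, size_t c, size_t h, size_t w,
                          size_t C, size_t H, size_t W) {
    return ((n * H + h) * W + w) * C + c;
}
```

For a fixed pixel (n, h, w), consecutive channel indices in NHWC map to adjacent offsets, whereas in NCHW they sit a whole H × W plane apart.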
On the processor improvements front, eltwise backpropagation should run more smoothly now. The oneDNN team also sped up inner product computations for bfloat16 on architectures with DL Boost support, and gave int8 GEMM, RNN, inner product, matmul and GEMM-based convolution for systems with Intel SSE4.1 and Intel AVX support some love.
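For readers wondering what makes bfloat16 attractive on such hardware: it keeps float32’s full 8-bit exponent but only 7 mantissa bits, so it is literally the top 16 bits of an IEEE-754 float. The truncating conversion below illustrates this (production libraries typically round to nearest even rather than truncate); it is a generic sketch, not oneDNN’s conversion routine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert f32 to bf16 by keeping the top 16 bits of the float's
// bit pattern. Same exponent range as f32, much less precision.
uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<uint16_t>(bits >> 16);
}

// Converting back is just re-widening with zeroed low mantissa bits.
float bf16_to_f32(uint16_t b) {
    uint32_t bits = static_cast<uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values whose mantissa fits in 7 bits, such as small powers of two, round-trip exactly, while fine-grained differences in the low mantissa bits are simply dropped.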