Easy math: PyTorch 1.9 amps up distributed training and scientific computing


Version 1.9 of the neural network and tensor computation framework PyTorch is now available. The more than 3,400 commits in the update are expected to make the project a more viable option for distributed training and mobile scenarios, while improving performance along the way.

With a strong competitor in the form of Google’s TensorFlow, Facebook’s PyTorch team is trying to make a big step towards wider applicability by finally graduating important features such as complex autograd and the linear algebra module torch.linalg to stable. As of v1.9, PyTorch includes implementations of the linear algebra functions found in the popular NumPy library, complete with accelerator support, as well as the capability to calculate complex gradients and optimise real-valued loss functions with complex variables.
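As a rough illustration of the two features, the following minimal sketch solves a small linear system with torch.linalg and then backpropagates through a real-valued loss defined on complex inputs. The matrix, vectors, and loss are made up for the example and are not taken from the release notes.

```python
import torch

# Solve the linear system A x = b with the NumPy-style torch.linalg API.
A = torch.tensor([[3.0, 1.0], [1.0, 2.0]])
b = torch.tensor([9.0, 8.0])
x = torch.linalg.solve(A, b)
residual = torch.linalg.norm(A @ x - b)

# Complex autograd: a real-valued loss over complex variables.
z = torch.tensor([1.0 + 2.0j, -0.5 + 0.5j], requires_grad=True)
target = torch.tensor([0.0 + 1.0j, 1.0 + 0.0j])
loss = (z - target).abs().pow(2).sum()  # squared distance, a real scalar
loss.backward()

# z.grad is a complex tensor; stepping against it reduces the loss.
with torch.no_grad():
    stepped = z - 0.25 * z.grad
    new_loss = (stepped - target).abs().pow(2).sum()

print(x, residual.item(), loss.item(), new_loss.item())
```

The same torch.linalg calls should also run on a GPU if the tensors are created on a CUDA device.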

Researchers looking to share their models should take a look at the new, still experimental packaging format torch.package. The resulting archives can be recognised by the .pt extension and, in their simplest form, include model data such as parameters and buffers along with the model's code. Code dependencies are said to be found automatically as well, turning the package into a self-contained unit designed to make experiments easy to reproduce.
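A minimal sketch of a packaging round trip might look as follows. The package and resource names ("models", "linear.pkl") and the file name are arbitrary choices for the example, and it assumes a PyTorch release in which torch itself and the standard library are treated as external dependencies without extra configuration. Models that rely on your own modules or third-party libraries would typically need additional intern or extern calls on the exporter.

```python
import os
import tempfile

import torch
from torch.package import PackageExporter, PackageImporter

# A tiny stand-in model; any picklable module would do.
model = torch.nn.Linear(4, 2)

# Hypothetical output location for the example archive.
path = os.path.join(tempfile.mkdtemp(), "linear_package.pt")

# Write the model into a self-contained archive.
with PackageExporter(path) as exporter:
    exporter.save_pickle("models", "linear.pkl", model)

# Load it back from the archive.
importer = PackageImporter(path)
restored = importer.load_pickle("models", "linear.pkl")

print(type(restored).__name__, torch.equal(restored.weight, model.weight))
```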

TorchElastic, PyTorch's worker process runner and coordinator, has been moved into the PyTorch core, confirming the importance of distributed training for the project. Other improvements in this area, which are still in beta, include CUDA support in RPC and the option to reduce the size of per-process optimizer states by combining ZeroRedundancyOptimizer with DistributedDataParallel.
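The combination of ZeroRedundancyOptimizer and DistributedDataParallel can be sketched roughly as below. To keep the example runnable on a single machine it assumes a one-process "world" on the CPU-based gloo backend with a file-based rendezvous, so nothing is actually sharded; in a real job each rank would run the same code with its own rank and a larger world size. The function name, model, and hyperparameters are placeholders.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP


def train_one_step():
    """Run one optimizer step and return the bias before and after."""
    # Single-process rendezvous via a temporary file (assumption for the demo).
    init_file = os.path.join(tempfile.mkdtemp(), "rendezvous")
    dist.init_process_group(
        "gloo", init_method=f"file://{init_file}", rank=0, world_size=1
    )
    try:
        model = DDP(torch.nn.Linear(8, 2))
        # Optimizer state is partitioned across ranks instead of replicated.
        optimizer = ZeroRedundancyOptimizer(
            model.parameters(), optimizer_class=torch.optim.SGD, lr=0.1
        )
        before = model.module.bias.detach().clone()
        loss = model(torch.randn(4, 8)).sum()
        loss.backward()
        optimizer.step()
        after = model.module.bias.detach().clone()
        return before, after
    finally:
        dist.destroy_process_group()


bias_before, bias_after = train_one_step()
print(bias_before, bias_after)
```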

Teams with a focus on machine learning on mobile and edge devices might profit from the newly released Mobile Interpreter. The main selling point of this slimmed-down version of the PyTorch runtime is its reduced binary size.
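On the Python side, preparing a model for the Mobile Interpreter roughly comes down to scripting it and saving it in the lite interpreter format. The sketch below assumes the underscore-prefixed helpers used in PyTorch's mobile workflow, which may change between releases; the model and the .ptl file name are placeholders.

```python
import os
import tempfile

import torch
from torch.jit.mobile import _load_for_lite_interpreter

# Stand-in model built from stock modules.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
scripted = torch.jit.script(model)

# Save in the format consumed by the lite (mobile) interpreter.
path = os.path.join(tempfile.mkdtemp(), "model.ptl")
scripted._save_for_lite_interpreter(path)

# Load the file again on the desktop to check that it runs.
lite_module = _load_for_lite_interpreter(path)
example = torch.randn(1, 4)
lite_output = lite_module(example)

print(os.path.getsize(path), lite_output.shape)
```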

The freezing API has reached stable status in PyTorch 1.9, providing a more reliable way to inline module parameters and attribute values as constants into the internal TorchScript representation. The PyTorch team recommends this practice to achieve better performance for model deployment.
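A minimal sketch of freezing, using a throwaway model made of stock modules: the module is scripted in eval mode and then passed to torch.jit.freeze, after which its parameters no longer show up as attributes because they have been folded into the graph.

```python
import torch

# Freezing expects a module in eval mode.
model = torch.nn.Sequential(torch.nn.Linear(3, 3), torch.nn.ReLU()).eval()
scripted = torch.jit.script(model)
frozen = torch.jit.freeze(scripted)

example = torch.randn(2, 3)
outputs_match = torch.allclose(scripted(example), frozen(example))

# Parameters have been inlined as constants, so none remain on the module.
remaining_params = len(list(frozen.named_parameters()))

print(outputs_match, remaining_params)
```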

PyTorch Profiler has been refined further; the updated version allows GPU profiling and comes with TensorBoard visualisation for a better overview. The associated API has also learned to support Windows and Mac builds, and to handle long-running jobs as well as distributed collectives. The inference mode API has reached beta status and promises a “significant speed-up for inference workloads while remaining safe and ensuring no incorrect gradients can ever be computed,” Facebook said.
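The two APIs can be combined in a few lines. The sketch below profiles a forward pass on the CPU only, so that it runs without a GPU, and executes it under inference mode; the model and input sizes are arbitrary.

```python
import torch
from torch.profiler import ProfilerActivity, profile

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
inputs = torch.randn(8, 16)

# Record CPU activity while running the model without autograd tracking.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.inference_mode():
        outputs = model(inputs)

# Tensors created under inference mode never require gradients.
print(outputs.requires_grad, outputs.is_inference())

# Aggregate the recorded operators into a text table.
summary = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(summary)
```

On a machine with a GPU, adding ProfilerActivity.CUDA to the list of activities should include kernel timings as well.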

Along with the framework update, the PyTorch team pushed out new versions of libraries including TorchVision, TorchAudio, and TorchText. Computer vision package TorchVision, for example, has become more mobile-friendly through the introduction of the object detection architecture SSDlite, quantized kernels to reduce memory usage, and preliminary iOS support in the form of pre-compiled binaries for C++ operators.

TorchVision also includes a regular SSD variant, which is meant to provide good and quick detection results for low-resolution pictures in particular, as well as various speed optimisations and the option to decode JPEGs on the GPU. Version 0.9 of TorchAudio adds wav2vec 2.0 model architectures to the audio library, which are meant to make running speech recognition easier, amongst other things. Other improvements centre on automatic differentiation, resampling, and Windows support.

Language processing library TorchText has replaced its Vocab class with a new module, which was designed not only to improve look-up time but also to support TorchScript for model creation.