PyTorch team sips Caffe2, serves up production ready machine learning library

More than half a year after the PyTorch team announced the v1.0 roadmap for their tensor and neural network library, the project is finally production ready.

This is mainly thanks to the addition of torch.jit, a just-in-time compiler that rewrites PyTorch models at runtime so that they can be optimised efficiently and run without the help of a Python interpreter – two properties that production deployments often require.

The main reason behind the JIT's design was to avoid adding complexity for users and to keep existing projects intact. Old code can be made compatible by using either the compiler's tracing mode or its script mode: the former records native PyTorch operations along with their data dependencies as a model runs, while the latter compiles an annotated subset of Python into an intermediate representation that no longer depends on the Python interpreter. Both modes are sketched below.
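
For illustration, a minimal sketch of the two modes; the add_relu and repeated_relu functions are invented for the example:

```python
import torch

# Tracing mode: run the function once with example inputs and
# record the native PyTorch operations that get executed.
def add_relu(x, y):
    return torch.relu(x + y)

traced = torch.jit.trace(add_relu, (torch.randn(3), torch.randn(3)))

# Script mode: compile an annotated subset of Python directly,
# so data-dependent control flow survives compilation.
@torch.jit.script
def repeated_relu(x):
    for _ in range(3):
        x = torch.relu(x)
    return x

print(traced(torch.randn(3), torch.randn(3)))
print(repeated_relu(torch.randn(3)))
```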

Once that is done, the code can be optimised and serialised for subsequent steps. The JIT, for example, offers an option for exporting models so that they can run in a C++-only runtime, which is based on the Caffe2 deep learning framework – something PyTorch's largest stakeholder Facebook uses for its production purposes.
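
A rough sketch of that export path, tracing a torchvision model and serialising it to disk; the model choice and file name are arbitrary:

```python
import torch
import torchvision

# Trace a model with an example input and serialise the result.
model = torchvision.models.resnet18()
model.eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("resnet18_traced.pt")

# The file can later be loaded from C++ via
# torch::jit::load("resnet18_traced.pt"), with no Python involved.
```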

To make v1.0 even faster, the PyTorch team also redesigned the library for distributed computing: the torch.distributed package now performs collective operations asynchronously across its Gloo, NCCL, and MPI backends, and distributed data parallel performance has been boosted for hosts with slow network connections. The NCCL backend additionally gains a barrier operation as well as new_group support.
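
A minimal sketch of those torch.distributed features; it assumes a launcher has set the usual MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE environment variables and that at least two processes take part:

```python
import torch
import torch.distributed as dist

# Gloo is the CPU-friendly backend; NCCL and MPI work the same way.
dist.init_process_group(backend="gloo")

tensor = torch.ones(4)

# Collectives can run asynchronously: async_op=True returns a
# work handle instead of blocking the calling process.
work = dist.all_reduce(tensor, op=dist.ReduceOp.SUM, async_op=True)
work.wait()  # complete the pending all-reduce

# Block until every participating process reaches this point.
dist.barrier()

# Collectives can also be scoped to a subset of ranks.
subgroup = dist.new_group(ranks=[0, 1])
```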

Following in the footsteps of other machine learning projects, although in a slightly different form, there is now a hub for pre-trained PyTorch models, so that researchers, for example, can share them with each other and make results easier to reproduce. Models are published by including a hubconf.py file in a repository and loaded via the torch.hub.load API.
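
By way of example, both sides of the hub could look like the sketch below; the pytorch/vision repository and resnet18 entry point are used purely for illustration:

```python
import torch

# Consumer side: fetch a model from a GitHub repository whose root
# contains a hubconf.py describing the available entry points.
model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
model.eval()

# Publisher side, roughly what hubconf.py could contain:
#
#   dependencies = ['torch']
#
#   def resnet18(pretrained=False, **kwargs):
#       """Entry point picked up by torch.hub.load."""
#       from torchvision.models import resnet18 as _resnet18
#       return _resnet18(pretrained=pretrained, **kwargs)
```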

Speaking of research: although its API is still unstable, PyTorch 1.0 ships a C++ frontend to complement the Python one. It is mainly meant for experiments in low-latency, high-performance applications and is already in use at Facebook, as the documentation suggests.

On top of that, PyTorch now comes with a multivariate log-gamma function, a Weibull distribution and a negative binomial one, as well as a variety of new operators to help with things like sparse tensors, chained matrix multiplications, and querying type information. A complete list of all the changes included in v1.0 can be found in the release notes on GitHub.
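
A quick, non-exhaustive tour of those additions; the tensor values are arbitrary:

```python
import torch
from torch.distributions import NegativeBinomial, Weibull

# New distributions in v1.0.
weibull = Weibull(scale=torch.tensor(1.0), concentration=torch.tensor(1.5))
neg_binom = NegativeBinomial(total_count=torch.tensor(10.0),
                             probs=torch.tensor(0.3))
print(weibull.sample((5,)), neg_binom.sample((5,)))

# Element-wise multivariate log-gamma with dimension p.
print(torch.mvlgamma(torch.tensor([2.0, 3.0]), p=2))

# Chained matrix multiplication: multiplies the whole chain in an
# efficient order instead of strictly left to right.
a, b, c = torch.randn(3, 4), torch.randn(4, 5), torch.randn(5, 6)
print(torch.chain_matmul(a, b, c))
```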

PyTorch is an open source library for Python, which offers ways to quickly compute tensors, a form of multidimensional array, by making use of GPUs. It also provides building blocks for deep neural networks and includes a backend for distributed training. The starting point of its development at Facebook was another open source machine learning library, Torch, which as of this year is no longer actively developed. PyTorch's initial release was in 2016, and it is now used by companies like Salesforce in areas such as natural language processing and computer vision.