PyTorch hits 1.5 with refined C++ support, flings out new lib to ease serving pains

Facebook-led deep learning Python package PyTorch is now available in version 1.5, though it’s a completely new tool that gets all the attention with this release: model serving library TorchServe.

The TorchServe repository introduces the Facebook/AWS collaboration as a “flexible and easy to use tool for serving PyTorch models”, something very much needed as companies struggle to get their machine learning models into production. It isn’t the first project in this space, however; Cortex and Triton, for example, have been around for a while already.

Joe Spisak, PyTorch product manager at Facebook, attributed the decision to start something altogether new to a “central need by the community for a canonical PyTorch serving solution. Even as we pointed people to these projects, people kept asking us if there was something that was ‘PyTorch’ branded.”

“This frequent ask comes from the need for commitment to support and de-risking around things such as model deployment, which are often production needs (and lack of support would mean serious business risk),” he continued. 

According to a PyTorch feature request, the new project is meant to tackle community pain points such as the large amount of knowledge needed to build a web serving component for hosting models or to customise a model server, and the lack of an easy way to add custom pre- and post-processing to models currently in service.

It comes with a low-latency prediction API and default handlers for tasks like object detection and text classification, and offers features useful for production scenarios such as RESTful endpoints for application integration, multi-model serving, and versioning for A/B testing.
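Calling a served model boils down to an HTTP request against the inference endpoint. Below is a minimal client sketch; it assumes TorchServe is running locally on its default inference port and that a model has been registered under the illustrative name densenet161:

```python
# Minimal client sketch for TorchServe's RESTful inference API.
# Assumes the server is running locally on the default inference port
# (8080) and that a model was registered under the hypothetical name
# "densenet161"; "kitten.jpg" stands in for any sample input image.
import requests

with open("kitten.jpg", "rb") as f:
    image_bytes = f.read()

resp = requests.post(
    "http://127.0.0.1:8080/predictions/densenet161",
    data=image_bytes,
)
resp.raise_for_status()
print(resp.json())  # e.g. top predicted classes with confidence scores
```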

Asked why PyTorch worked with AWS on this, Spisak wrote: “PyTorch has always been community-driven with code ownership represented by independent developers, large cloud providers and many others. If you look across the AI community, a large number of researchers and enterprise-users are building their models on AWS and looking to productionize their AI on the leading cloud.” 

It also isn’t the first time the two have cooperated: they previously worked together on ONNX, the open machine learning model representation. However, “the expectation [is] that, much like PyTorch, contributors and code ownership will diversify over time as a community forms around it,” Spisak was quick to add.

Those who want to take TorchServe for a quick spin can do so via the official Docker image; pip and Conda packages are also available.

As for the PyTorch 1.5 release itself, C++ devs will be pleased to hear that the C++ frontend API has finally become stable, meaning it behaves like its Python counterpart. The release also includes an easier way to use multi-dimensional indexing on tensors via the tensor.index({Slice(), 0, "...", mask}) function, which the PyTorch team hopes will be less error-prone than the original workaround.
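For orientation, here is a quick Python sketch of the indexing expression that C++ call mirrors; in the C++ frontend, Slice() stands in for Python's : and the "..." string for the Ellipsis:

```python
import torch

t = torch.randn(2, 3, 4, 5)
mask = torch.tensor([True, False, True, False, True])  # keeps 3 of 5 entries

# Python: t[:, 0, ..., mask]
# C++ frontend equivalent: t.index({Slice(), 0, "...", mask})
result = t[:, 0, ..., mask]
print(result.shape)  # torch.Size([2, 4, 3])
```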

Users who have always wanted to expose custom C++ classes and their methods to the TorchScript type system and the Python runtime, so that C++ objects can be manipulated from either language, can now do so through a still experimental CustomClassHolder API. “Prior to this release one could do this,” Spisak clarifies, “but it required a complex and non-linear path to get there. In this release, we’ve essentially replicated pybind11’s functionality for binding custom types in a way that creates a TorchScript type that can be called from TorchScript code.”
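On the Python side, a bound class becomes reachable under the torch.classes namespace. A hypothetical sketch, assuming a C++ stack class has already been compiled into a shared library and registered under the illustrative namespace my_classes:

```python
# Hypothetical usage sketch: the library path, namespace and class name
# are illustrative only, and presuppose a C++ class bound roughly via
# torch::class_<MyStack>("my_classes", "MyStack") on the C++ side.
import torch

# Loading the shared library runs its registration code.
torch.classes.load_library("build/libmy_stack.so")

# The bound C++ object can then be created and driven from Python;
# the same type is also usable from TorchScript code.
stack = torch.classes.my_classes.MyStack(["first", "second"])
stack.push("third")
print(stack.pop())  # -> "third"
```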

Meanwhile, a preview of a new “channels last” memory format is meant to let computer vision aficionados use speedier algorithms and hardware, and offers ways to switch between memory layouts. Spisak explains the feature as “an alternative way of ordering NCHW (N = batch, C = channels, H = height, W = Width) tensors in memory to put C, or the channel component, last.”

This is helpful because it “allows PyTorch tensors to be laid out in memory such that they align with backend libraries like QNNPACK and with hardware like Nvidia’s Tensor Cores. We started with mainly support for common CNNs like ResNets but will expand coverage in subsequent releases to make this a more general feature.” Whether this is the beginning of more CV extras making their way into the library remains to be seen, but it’s an interesting development for sure.
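Switching layouts is a one-liner in the Python API; a minimal sketch:

```python
import torch
import torch.nn as nn

# Shapes stay NCHW from the user's point of view; only the underlying
# striding changes, so existing code keeps working.
x = torch.randn(8, 3, 224, 224).to(memory_format=torch.channels_last)
conv = nn.Conv2d(3, 16, kernel_size=3).to(memory_format=torch.channels_last)

out = conv(x)
print(out.shape)                                             # torch.Size([8, 16, 222, 222])
print(out.is_contiguous(memory_format=torch.channels_last))  # True: the layout propagates
```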

Other new additions include high-level autograd APIs for easy computation of Jacobians, Hessians and the like, as well as a new package called torch_xla, which uses the XLA deep learning compiler to speed up computations on Google Cloud TPUs and Cloud TPU Pods. Support for ONNX has also been improved, allowing the export of models larger than 2GB and the use of ten additional operators.
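The new functions live under torch.autograd.functional; a small example:

```python
import torch
from torch.autograd.functional import jacobian, hessian

def f(x):
    # Simple scalar-valued test function: f(x) = sum(x^3)
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0])
print(jacobian(f, x))  # tensor([ 3., 12.])              -- gradient 3 * x**2
print(hessian(f, x))   # tensor([[ 6.,  0.], [ 0., 12.]]) -- diagonal 6 * x
```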

More details about the release and a long list of known issues can be found in the project’s repository.