The team behind PyTorch has opened up a repository of pre-trained machine learning models, in an effort to bring more reproducibility to the discipline – as well as a little user up/down voting.
While organisations fall over themselves to tout their data science credentials, “reproducibility”, one of the cornerstones of science, is sometimes an afterthought. Some would go so far as to argue that the whole data science world is rather muddled, compared to traditional science in general, or even computer science in particular.
As the intro to PyTorch Hub puts it, “…many machine learning publications are either not reproducible or are difficult to reproduce. With the continued growth in the number of research publications, including tens of thousands of papers now hosted on arXiv and submissions to conferences at an all time high, research reproducibility is more important than ever.”
They go on to add that whilst many publications on models include “code as well as trained models”, users are often left to work out the steps for themselves.
The aim with PyTorch Hub, then, is to offer “a simple API and workflow that provides the basic building blocks for improving machine learning research reproducibility.” It consists of a pre-trained model repository designed with built-in support for the Colab notebook environment, and integration with Papers With Code.
Users can explore available models, load them with a single command, and get a grasp of what methods are available for a model. The FAQ says that one of the next steps is to “implement an upvote/downvote system to surface the best models”.
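For a sense of what that explore, load and inspect workflow looks like in practice, here is a minimal sketch built on the `torch.hub.list`, `torch.hub.help` and `torch.hub.load` calls. The helper names `explore_hub` and `public_names` are hypothetical, invented for illustration, and the repository and entrypoint are whatever the caller passes in. It assumes PyTorch is installed and that the machine can reach GitHub.

```python
def public_names(obj):
    """Return the sorted public attribute and method names of any object."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


def explore_hub(repo, entrypoint, **entrypoint_kwargs):
    """Hypothetical helper: list, document, load and inspect one hub model.

    repo is a GitHub "owner/name" string and entrypoint is a function name
    exposed by that repository's hubconf.py. Extra keyword arguments are
    passed through to the entrypoint unchanged.
    """
    # Imported lazily so the helper above stays usable without PyTorch.
    import torch

    available = torch.hub.list(repo)                 # entrypoint names in the repo
    docstring = torch.hub.help(repo, entrypoint)     # the entrypoint's docstring
    model = torch.hub.load(repo, entrypoint, **entrypoint_kwargs)  # the single load command
    return available, docstring, model, public_names(model)
```

A call such as `explore_hub("pytorch/vision", "resnet18")` would be one way to try it. Which keyword arguments request pre-trained weights depends on the repository and its version, so check the docstring returned by `torch.hub.help` before passing any.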
If you want to post a model, you are responsible for hosting the model weights, using your “favourite cloud storage” or GitHub, if it fits the limits. Models trained on private data are not welcome.
Posting should be straightforward, requiring just the inclusion of a simple hubconf.py file, which will enumerate the models to be supported, and the dependencies they require. So far, there are 18 models in the hub.
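To make the publishing side more concrete, the following is a rough sketch of what a `hubconf.py` might contain, assuming the usual layout of a `dependencies` list plus one function per model. The model, the `tiny_classifier` entrypoint name and the `WEIGHTS_URL` placeholder are all hypothetical and do not correspond to anything in the hub.

```python
# hubconf.py, placed at the root of the GitHub repository being published.

# Packages that must be installed before the hub will load any entrypoint.
dependencies = ["torch"]

# Placeholder only: publishers host their own weights and put the real URL here.
WEIGHTS_URL = "https://example.com/tiny_classifier.pth"


def tiny_classifier(pretrained=False, num_classes=10):
    """Hypothetical entrypoint: a small fully connected classifier.

    pretrained (bool): if True, download weights from WEIGHTS_URL.
    num_classes (int): size of the output layer.
    """
    # Imported inside the function so this file can be read without PyTorch.
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(784, 128),
        torch.nn.ReLU(),
        torch.nn.Linear(128, num_classes),
    )
    if pretrained:
        # Fetches and caches the checkpoint, then loads it into the model.
        state_dict = torch.hub.load_state_dict_from_url(WEIGHTS_URL, progress=True)
        model.load_state_dict(state_dict)
    return model
```

Each public function in the file acts as one loadable model, and its docstring is what users see when they ask the hub for help on that entrypoint.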