Facebook opens up on deep learning recommendation model for sparse data

Facebook, by solomon7 via Shutterstock

Facebook AI researchers have open-sourced a deep learning recommendation model, which they reckon will help data scientists struggling with sparse data sets.

As Facebook’s Maxim Naumov and Dheevatsa Mudigere point out, “neural network-based personalization and recommendation models have emerged as an important tool for building recommendation systems in production environments, including here at Facebook.”

The problem is, they continue, “these models differ significantly from other deep learning models because they must be able to work with categorical data, which is used to describe higher-level attributes.” Categorical data refers to variables with a limited, usually fixed, number of possible values, such as eye colour.
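To make the sparsity concrete, here is a minimal Python sketch. The eye-colour vocabulary, the table values and the helper names are all illustrative assumptions, not taken from Facebook's code. It shows that a one-hot encoding of a categorical value is almost entirely zeros, and that multiplying it by a table of dense vectors gives the same result as simply reading one row of that table.

```python
# Hypothetical vocabulary for one categorical feature (illustrative only).
EYE_COLOURS = ["blue", "brown", "green", "grey", "hazel"]

# Hypothetical embedding table: one small dense vector per category.
# In a trained model these numbers would be learned; here they are made up.
EMBEDDING_TABLE = [
    [0.1, 0.9],
    [0.8, 0.2],
    [0.4, 0.4],
    [0.3, 0.7],
    [0.6, 0.5],
]


def one_hot(value, vocabulary):
    """Return a vector with a single 1.0 at the position of `value`."""
    return [1.0 if item == value else 0.0 for item in vocabulary]


def embed_by_matmul(one_hot_vector, table):
    """Dense route: multiply the one-hot vector by the whole table."""
    width = len(table[0])
    return [
        sum(one_hot_vector[row] * table[row][col] for row in range(len(table)))
        for col in range(width)
    ]


def embed_by_lookup(value, vocabulary, table):
    """Sparse route: read only the row that belongs to this category."""
    return table[vocabulary.index(value)]


encoded = one_hot("green", EYE_COLOURS)
print(encoded)                                   # mostly zeros
print(embed_by_matmul(encoded, EMBEDDING_TABLE))
print(embed_by_lookup("green", EYE_COLOURS, EMBEDDING_TABLE))
```

With five categories the wasted work is trivial, but with millions of categories the one-hot route touches every row of the table while the lookup touches one, which is the efficiency problem the researchers describe.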

So, the team continues, “It can be challenging for a neural network to work efficiently with this kind of sparse data, and the lack of publicly available details of representative models and data sets has slowed the research community’s progress.”

By open-sourcing the model, they are hoping to drive the technology forward, and develop “new and better methods to use deep learning for recommendation and personalization tools (and to improve model efficiency and performance) [which] will lead to new ways to connect people to the content that is most relevant to them.” Which may or may not be a good thing, depending on your point of view.

As the paper by Naumov, Mudigere et al. explains, Facebook’s researchers sought to unite the underlying principles of recommendation systems – typically using some form of filtering – and predictive analytics, which relies on statistical methods to predict the probability of events based on given data.

So, in Facebook’s deep learning recommendation model (DLRM), the sparse categorical “features are processed using embeddings, while continuous features are processed with a bottom multilayer perceptron (MLP).”

Second-order interactions of different features are “then computed explicitly. Finally, the results are processed with a top MLP and fed into a sigmoid function in order to give a probability of a click.”
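The pipeline described in those two quotes can be sketched in a few dozen lines of plain Python. This is a toy illustration under stated assumptions, not the released benchmark code: the layer sizes, the random weights, the absence of bias terms and every function name are hypothetical. The quotes say only that second-order interactions are "computed explicitly"; the pairwise dot product used here is one common way to do that and is an assumption of this sketch.

```python
import math
import random

random.seed(0)  # fixed seed so the made-up weights are reproducible


def random_matrix(rows, cols):
    """Made-up weights; a real model would learn these during training."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(vector, weights):
    """Multiply a vector of length `rows` by a rows x cols weight matrix."""
    cols = len(weights[0])
    return [sum(vector[r] * weights[r][c] for r in range(len(weights)))
            for c in range(cols)]


def relu(vector):
    return [max(0.0, x) for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


EMBED_DIM = 4          # assumed width shared by embeddings and bottom-MLP output
NUM_DENSE = 3          # assumed number of continuous features
TABLE_SIZES = [10, 7]  # assumed number of categories per sparse feature

embedding_tables = [random_matrix(size, EMBED_DIM) for size in TABLE_SIZES]
bottom_w1 = random_matrix(NUM_DENSE, 8)
bottom_w2 = random_matrix(8, EMBED_DIM)

num_vectors = 1 + len(TABLE_SIZES)               # dense vector + one per table
num_pairs = num_vectors * (num_vectors - 1) // 2
top_w1 = random_matrix(EMBED_DIM + num_pairs, 8)
top_w2 = random_matrix(8, 1)


def dlrm_forward(dense_features, sparse_indices):
    """Return a click probability for one example."""
    # 1. Continuous features pass through the bottom MLP.
    dense_vec = relu(linear(relu(linear(dense_features, bottom_w1)), bottom_w2))
    # 2. Each categorical feature selects one row of its embedding table.
    embedded = [table[i] for table, i in zip(embedding_tables, sparse_indices)]
    # 3. Second-order interactions: dot product of every distinct pair.
    vectors = [dense_vec] + embedded
    interactions = [dot(vectors[i], vectors[j])
                    for i in range(len(vectors)) for j in range(i)]
    # 4. Top MLP over the dense vector plus interactions, then a sigmoid.
    hidden = relu(linear(dense_vec + interactions, top_w1))
    logit = linear(hidden, top_w2)[0]
    return 1.0 / (1.0 + math.exp(-logit))


probability = dlrm_forward([0.5, 1.0, -0.2], [3, 5])
print(probability)  # a value strictly between 0 and 1
```

Because the weights are random, the printed probability is meaningless; the point is the shape of the computation, in which sparse inputs cost one row lookup each and only the small dense vectors flow through the MLPs.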

According to the team, combining these techniques in one model “enables it to work efficiently with production-scale data and provide state of the art results.”

The team have provided two versions of the DLRM benchmark code, one using PyTorch and another using Caffe2 operators, along with a variation using Glow C++ operators. They said the code is self-contained and can interface with public data sets, including the Kaggle Display Advertising Challenge Dataset.