Facebook’s research team has just released PyTorch-BigGraph (PBG), giving anyone wondering how to quickly process graph-structured data for machine learning a leg-up – and promoting PyTorch against its TensorFlow rival in the process.
PBG is an optimised system for graph embeddings: it creates vector representations of graph-structured data, which are generally easier to work with than the graphs themselves. Such embeddings have been shown to be useful for tasks like recommendation and link prediction.
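To see why such vectors are convenient, consider that once entities are embedded, recommendation reduces to nearest-neighbour search in vector space. The snippet below is a minimal sketch of that idea using hypothetical toy vectors (the names and values are invented for illustration and are not PBG output):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def recommend(query, embeddings, k=2):
    """Return the k entities whose embeddings lie closest to the query's."""
    ranked = sorted(embeddings, key=lambda name: cosine(embeddings[query], embeddings[name]), reverse=True)
    return [name for name in ranked if name != query][:k]

# Hypothetical 2-d embeddings of a user and three films:
emb = {"user": [1.0, 0.0], "film_a": [0.9, 0.1], "film_b": [0.0, 1.0], "film_c": [0.8, 0.3]}
print(recommend("user", emb))
```

Real embeddings have hundreds of dimensions and millions of entities, so production systems use approximate nearest-neighbour indexes rather than a full sort, but the principle is the same.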
PBG is meant for exceptionally large and complex graphs spanning billions of nodes and trillions of edges, as are common in Facebook’s own graph data but also found in other web services with networking features, such as YouTube or Twitter.
A common problem in this area is scaling – especially providing the memory needed to make the most of such amounts of data. PBG is able to partition graphs so that large embeddings can be trained on a single machine or in a distributed environment without loading everything into memory.
It also makes use of multi-threading and batched negative sampling, which improve memory efficiency further and help with speed, a second bottleneck often encountered.
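The intuition behind batched negative sampling can be sketched in a few lines: instead of drawing fresh "corrupted" edges for every positive edge, one shared pool of sampled entities is reused as negatives across the whole batch, so sampling cost grows with the pool size rather than with batch size times negatives-per-edge. The function below is an illustrative sketch under that assumption; the names and sizes are invented and do not reflect PBG's actual API:

```python
import random

def batched_negatives(batch, entities, num_negs=3, seed=0):
    """Pair every positive (source, target) edge in the batch with every
    entity in one shared pool of sampled negatives, instead of sampling
    num_negs fresh negatives per edge.  Illustrative only."""
    rng = random.Random(seed)
    pool = rng.sample(entities, num_negs)  # one shared pool for the whole batch
    # Corrupt each positive edge by swapping its target for each pooled entity.
    return [(source, neg) for (source, _target) in batch for neg in pool]
```

In a real system these corrupted edges would be scored in a single batched matrix operation, which is where the memory and speed savings come from.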
According to the team, the quality of the generated embeddings should be comparable to that of other embedding systems, while requiring less training time. Training consists of ingesting a list of edges, each described by a source, a target, and – where available – a relationship. The output is a feature vector for every entity, with adjacent entities placed as close together as possible and unconnected ones pushed apart, which leads to a clustering of sorts.
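The pull-together/push-apart idea can be demonstrated with a deliberately tiny sketch. The code below is a toy gradient scheme on an edge list, not PBG's actual algorithm (which runs on PyTorch with partitioning and batched negatives at a vastly larger scale); all function names and hyperparameters are invented for illustration:

```python
import random

def train_embeddings(edges, num_nodes, dim=8, epochs=50, lr=0.05, margin=1.0):
    """Toy illustration of graph embedding training: endpoints of each edge
    are pulled together, and an unconnected node is pushed away from each
    source while it sits inside the margin.  Not PBG's actual algorithm."""
    rng = random.Random(0)
    emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_nodes)]
    neighbors = {u: set() for u in range(num_nodes)}
    for s, t in edges:
        neighbors[s].add(t)
        neighbors[t].add(s)
    for _ in range(epochs):
        for s, t in edges:
            # Pull the endpoints of each edge together.
            for d in range(dim):
                delta = emb[s][d] - emb[t][d]
                emb[s][d] -= lr * delta
                emb[t][d] += lr * delta
            # Push one unconnected node away from s (a "negative sample").
            negs = [n for n in range(num_nodes) if n != s and n not in neighbors[s]]
            if negs:
                n = rng.choice(negs)
                dist2 = sum((emb[s][d] - emb[n][d]) ** 2 for d in range(dim))
                if dist2 < margin:  # only push while inside the margin
                    for d in range(dim):
                        delta = emb[s][d] - emb[n][d]
                        emb[s][d] += lr * delta
                        emb[n][d] -= lr * delta
    return emb
```

Run on a graph made of two disconnected triangles, the two triangles end up as two well-separated clusters in the embedding space – the "clustering of sorts" described above.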
PyTorch-BigGraph relies on Python and the Facebook-maintained PyTorch, as well as a few other libraries, and is BSD-licensed. To get started, the GitHub repository contains example scripts and pretrained embeddings. Facebook’s AI team hopes that open-sourcing the system will encourage other companies to release large graph datasets and thereby facilitate research in the area.
More information on the model and the math behind the approach can be found in the associated paper (PDF), which was just presented at SysML, the Conference on Systems and Machine Learning, in Stanford, California.