Google AI plants SEED for better scalable reinforcement learning


By Team Devclass

March 24, 2020


Google AI researchers have looked into ways of making reinforcement learning scale better and use computational resources more efficiently. The result is called SEED RL and can now be explored via GitHub.

SEED stands for scalable, efficient, deep reinforcement learning and describes a “modern RL agent that scales well, is flexible and efficiently utilises available resources”. In their research paper on the project, Lasse Espeholt and his colleagues name the ability to train agents at millions of frames per second and the lower cost of experiments as the approach’s key benefits, potentially opening RL up to a wider audience.

Reinforcement learning is a very use-case specific approach in which agents learn about their environment through exploration and optimise their actions to collect as much reward as possible.
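As a rough illustration of that exploration-and-reward loop, the sketch below runs tabular Q-learning on a hypothetical five-cell corridor in which only reaching the last cell pays a reward. The environment, the `step`/`greedy`/`train` helpers and the hyperparameters are assumptions made up for illustration; SEED RL itself trains deep neural networks on far richer environments.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells. The agent starts in
# cell 0; stepping into the last cell yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Best-looking action for this state, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative only)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise exploit current estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should move right in every non-terminal cell, because the discounted reward propagates back from the goal one step at a time. Random tie-breaking matters here: with all estimates starting at zero, always picking the first action would keep the agent pinned against the left wall for a long time before it ever sees a reward.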

However, since the method needs a lot of data to produce good results, distributed learning combined with accelerators such as GPUs can be a way to gather and process that data in a reasonable amount of time.

Architectures following a similar approach include the distributed agent IMPALA, which, according to the SEED RL researchers, has a number of drawbacks compared to their system. For example, it keeps sending parameters and intermediate model states between actors and learners, which can quickly turn into a bottleneck. It also sticks to CPUs when applying model knowledge to a problem (inference), which isn’t the most performant option for complex models, and, according to Espeholt et al., doesn’t make optimal use of machine resources.

SEED RL aims to solve this by having the learner perform neural network inference centrally on GPUs or TPUs, the number of which can be scaled up or down as needed. The system also includes a batching layer that collects requests from multiple actors into batches for added efficiency. Since the model parameters and state are kept local to the learner, data transfer is less of an issue, while observations are sent through a low-latency network layer based on gRPC to keep things running smoothly.
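The central-inference idea can be sketched without any ML framework. In the toy simulation below, hypothetical actors submit observations to a `CentralInference` object, which queues them and answers them with one batched call to a stand-in "model"; per-actor recurrent state lives on the server side rather than on the actors. The class, the toy policy and the batch size are illustrative assumptions, not SEED RL's actual gRPC or TensorFlow code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical batched "network": takes a batch of observations plus per-actor
# recurrent states and returns one action and one updated state per request.
def toy_policy(observations: List[int], states: List[int]) -> Tuple[List[int], List[int]]:
    actions = [(obs + s) % 2 for obs, s in zip(observations, states)]
    new_states = [s + 1 for s in states]  # stand-in for e.g. an LSTM state update
    return actions, new_states

@dataclass
class CentralInference:
    """Toy stand-in for learner-side inference: actors only send observations and
    receive actions; the model and per-actor state never leave this object."""
    policy: Callable[[List[int], List[int]], Tuple[List[int], List[int]]]
    batch_size: int
    pending: List[Tuple[int, int]] = field(default_factory=list)
    agent_state: Dict[int, int] = field(default_factory=dict)
    model_calls: int = 0

    def submit(self, actor_id: int, observation: int) -> Dict[int, int]:
        """Queue one request; run the model once the batch is full."""
        self.pending.append((actor_id, observation))
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return {}

    def flush(self) -> Dict[int, int]:
        """Answer all queued requests with a single batched model call."""
        if not self.pending:
            return {}
        ids, observations = zip(*self.pending)
        self.pending.clear()
        states = [self.agent_state.get(i, 0) for i in ids]
        actions, new_states = self.policy(list(observations), states)
        self.model_calls += 1
        self.agent_state.update(zip(ids, new_states))
        return dict(zip(ids, actions))

NUM_ACTORS, NUM_STEPS = 8, 10
server = CentralInference(policy=toy_policy, batch_size=NUM_ACTORS)
observations = {i: i for i in range(NUM_ACTORS)}  # hypothetical initial observations
for _ in range(NUM_STEPS):
    replies: Dict[int, int] = {}
    for actor_id, obs in observations.items():
        replies.update(server.submit(actor_id, obs))
    # Each actor "steps" its environment with the action it received.
    observations = {i: observations[i] + replies[i] for i in observations}

print(server.model_calls)  # 10 batched calls instead of 8 * 10 = 80 individual ones
```

With eight actors and ten steps, the model is invoked once per step rather than once per actor per step, which is the kind of saving batching on an accelerator is meant to exploit. A real system would also need a timeout that calls something like `flush()` on partially filled batches, since actors do not arrive in lockstep.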

The SEED RL implementation is based on the TensorFlow 2 API and can be found on GitHub. It comes with two algorithms: the policy-gradient-based V-trace, which predicts an action distribution from which actions are sampled, and the Q-learning method R2D2, which selects an action based on its predicted values.
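V-trace was introduced with IMPALA to correct for the lag between the policy that generated a trajectory (the behaviour policy mu) and the current policy pi being trained. Below is a minimal pure-Python sketch of its value targets, using the backward recursion v_s = V(x_s) + delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})), with delta_s = rho_s * (r_s + gamma * V(x_{s+1}) - V(x_s)), rho_s = min(rho_bar, pi/mu) and c_s = min(c_bar, pi/mu). The function name and argument layout are assumptions, the extra lambda factor from the IMPALA paper is left out (equivalent to lambda = 1), and SEED RL's own version operates on batched TensorFlow tensors.

```python
import math

def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets and policy-gradient advantages for one
    trajectory of length T (hypothetical helper, not SEED RL's API).

    rewards[t]      : reward r_t
    values[t]       : critic estimate V(x_t)
    log_rhos[t]     : log pi(a_t|x_t) - log mu(a_t|x_t)
    bootstrap_value : V(x_T), the estimate for the state after the last step
    """
    T = len(rewards)
    ratios = [math.exp(lr) for lr in log_rhos]
    rhos = [min(rho_bar, r) for r in ratios]  # truncated weights for the TD error
    cs = [min(c_bar, r) for r in ratios]      # truncated "trace" coefficients
    next_values = list(values[1:]) + [bootstrap_value]

    vs = [0.0] * T
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}); zero beyond the end of the trajectory
    for t in reversed(range(T)):
        delta = rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * cs[t] * acc
        vs[t] = values[t] + acc

    # Advantages for the policy-gradient update use the V-trace target of the next state.
    next_vs = vs[1:] + [bootstrap_value]
    advantages = [rhos[t] * (rewards[t] + gamma * next_vs[t] - values[t])
                  for t in range(T)]
    return vs, advantages

vs, adv = vtrace_targets([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], 4.0,
                         [0.0, 0.0, 0.0], gamma=0.5)
print(vs, adv)  # [2.25, 2.5, 3.0] [0.25, 0.5, 1.0]
```

When the two policies agree (all log-ratios zero) and the clipping thresholds are 1, the targets reduce to ordinary n-step returns, e.g. 1 + 0.5 + 0.25 + 0.125 * 4 = 2.25 for the first step above. When the current policy would rarely take the logged actions, the ratios shrink and the targets stay close to the existing value estimates.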

Though the results have to be taken with a grain of salt, as is advisable for all research, first benchmarks promise a significant increase in the number of frames processed per second compared to IMPALA in cases where accelerators are an option. Costs are also meant to drop in certain scenarios, since inference is said to be cheaper with SEED than with IMPALA’s CPU-heavy approach. More details are available on the Google AI blog.
