Reinforcement Learning framework Dopamine opens up to new environments

Dopamine, a framework for experimenting with reinforcement learning (RL), has reached version 2.0 just half a year after its initial launch and now allows the use of custom environments.

The project is built on the popular numerical computation library TensorFlow and stems from a team of researchers at Google, though it isn't an official product of the company. It is meant for speculative research and focuses on providing a small number of heavily tested RL algorithms in an easy-to-use way.

That is why the first iteration of the framework only included a single-GPU agent with implementations of n-step Bellman updates, prioritized experience replay, distributional reinforcement learning, and the Deep Q-Networks algorithm. According to a paper by members of the DeepMind team, which is also part of the Alphabet family, these approaches are among the most important components of state-of-the-art reinforcement learning systems.

If you're new to this sort of thing: reinforcement learning is a subfield of machine learning used in areas such as robotics or autonomous driving. So-called agents learn by interacting with an (often simulated) environment and evaluating the negative as well as positive feedback they receive; if a robot is learning a route, for example, crashing into something counts as negative feedback, which it learns to avoid. Other approaches often start with a predefined, labeled dataset to make a system learn, whereas in RL the agent's strategy is simply to collect as much positive reward as possible.
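To make that loop a little more concrete, here is a minimal, purely illustrative sketch using the classic OpenAI Gym API: a random stand-in for an agent acts in CartPole and tallies up the reward signal that a real agent would learn to maximise. The environment name and the random policy are placeholders for illustration, not Dopamine code.

```python
# Illustrative only: a random "agent" interacting with OpenAI Gym's CartPole.
# The reward returned by step() is the positive/negative feedback an RL agent
# would learn to maximise; a real agent replaces the random choice with a
# learned policy.
import gym

env = gym.make("CartPole-v0")
observation = env.reset()
total_reward = 0.0
done = False

while not done:
    action = env.action_space.sample()                      # stand-in for a learned policy
    observation, reward, done, info = env.step(action)      # environment feedback
    total_reward += reward                                   # reward the agent tries to maximise

print("Episode return:", total_reward)
env.close()
```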

While the first Dopamine version only supported the Arcade Learning Environment, an interface to Atari 2600 games for evaluating and comparing learning approaches, the newly available v2.0 also works with discrete-domain Gym environments. This change seems to have been heavily requested and was brought about by generalising the framework's interface to the environment while keeping the core intact.

Thanks to the reworked API, researchers can now use simpler environments, like the ones in the widely used OpenAI Gym, to test their algorithms, which among other things reduces the training time needed to try out ideas. If the results go in the right direction, they can still decide to move on to one of the more complex Atari games.
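To give an idea of how little needs to change when moving between environments of different complexity, the sketch below uses the plain Gym API to switch from a cheap CartPole run to a full Atari game simply by swapping the environment ID. This is a generic Gym example rather than Dopamine-specific code, and the Atari variant assumes Gym's Atari extras are installed.

```python
import gym

def make_env(quick_iteration=True):
    """Return a cheap environment for fast idea testing, or a full Atari game.

    Only the environment ID changes; the surrounding agent code can stay the same.
    """
    if quick_iteration:
        return gym.make("CartPole-v0")          # trains in minutes on a CPU
    return gym.make("PongNoFrameskip-v4")       # needs Gym's Atari extras installed
```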

Two default configurations for two classic environments (CartPole and Acrobot) are already included in the repository. A Jupyter notebook with instructions on training agents on both is also available, and a GymPreprocessing class should help anyone new to Dopamine learn how the framework can be used with other custom environments.
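The GymPreprocessing class itself lives in the Dopamine repository; the sketch below is only a rough illustration of what such an adapter typically does (expose the observation and action spaces and forward reset and step calls), so that an agent written against one interface can drive a plain Gym environment. The class and attribute names here are assumptions for illustration, not Dopamine's real API.

```python
import gym

class SimpleGymWrapper(object):
    """Illustrative adapter around a Gym environment (not Dopamine's GymPreprocessing)."""

    def __init__(self, environment):
        self.environment = environment
        self.game_over = False  # hypothetical flag an agent loop might check

    @property
    def observation_space(self):
        return self.environment.observation_space

    @property
    def action_space(self):
        return self.environment.action_space

    def reset(self):
        self.game_over = False
        return self.environment.reset()

    def step(self, action):
        observation, reward, done, info = self.environment.step(action)
        self.game_over = done
        return observation, reward, done, info


# Usage sketch: wrap a classic-control task and run a single step.
wrapped = SimpleGymWrapper(gym.make("Acrobot-v1"))
state = wrapped.reset()
state, reward, done, info = wrapped.step(wrapped.action_space.sample())
```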

As already mentioned, the API has changed, so users in need of backwards compatibility will have to keep using v1.0, which can be found as a neatly packaged release on GitHub. Those interested in the new version aren't treated to that luxury, since the release page only lists v1.0; they'll have to make do with cloning the repository or downloading it as a .zip file, so it may not be for those with a longing for stability.