Huskarl aims at becoming the TensorFlow for Reinforcement Learning


TensorFlow users interested in Reinforcement Learning (better known as the thing that made AlphaGo win at Go) might want to take a look at #PoweredByTF 2.0 Challenge winner Huskarl. The framework was recently introduced on the popular library’s Medium blog and is meant for easy prototyping with deep-RL algorithms.

According to its creator, software engineer Daniel Salvadori, Huskarl “abstracts away the agent-environment interaction” in a similar way “to how TensorFlow abstracts away the management of computational graphs”. Under the hood it makes use of TensorFlow 2.0, naturally, and the tf.keras API. It is also implemented in a way that makes it easy to parallelise the computation of environment dynamics across CPU cores, which helps algorithms that benefit from multiple concurrent sources of experience.

Although the project is still in its early stages, it already includes implementations of Deep Q-Learning Network (DQN), Multi-step DQN, Double DQN, Dueling Architecture DQN, Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Prioritized Experience Replay.

It also comes with three tunable agents – DQN, A2C, and DDPG. While the first operates on problems with a discrete action space, DDPG is used for those with continuous action spaces. Both, however, use prioritised experience replay by default, a technique that lets learning agents store past experiences and replay the most informative ones more often.
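To make the idea of prioritised experience replay more concrete, here is a minimal sketch in plain Python. It is not Huskarl's implementation: the class name, the `alpha` and `eps` parameters, and the list-based storage are assumptions chosen for illustration, and real implementations usually add importance-sampling weights and a more efficient data structure.

```python
import random


class PrioritizedReplayBuffer:
    """Toy proportional prioritised replay (illustrative, not Huskarl's code)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully proportional to priority
        self.transitions = []
        self.priorities = []
        self.next_index = 0  # slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority,
        # so they are likely to be replayed soon after being stored.
        priority = max(self.priorities, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(priority)
        else:
            self.transitions[self.next_index] = transition
            self.priorities[self.next_index] = priority
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Sample indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = rng.choices(range(len(self.transitions)), weights=weights, k=batch_size)
        return indices, [self.transitions[i] for i in indices]

    def update_priorities(self, indices, td_errors, eps=1e-3):
        # A larger temporal-difference error means a more "surprising" transition.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps
```

In a training loop, an agent would call `add` after every environment step, `sample` to draw a training batch, and `update_priorities` with the errors observed on that batch.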

A2C is a so-called on-policy algorithm, and Huskarl lets it sample experience from multiple environments at once. This should “decorrelate the data into a more stationary process which aids learning.” Two additional algorithms, Proximal Policy Optimization and Curiosity-Driven Exploration, are planned for a later release.
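The sketch below illustrates what sampling from several environments looks like for the on-policy agent described above. Everything in it is hypothetical: `ToyEnv` is a made-up random walk rather than an OpenAI Gym environment, `collect_batch` is not a Huskarl function, and the environments are stepped one after another here, whereas Huskarl can spread them across CPU cores.

```python
import random


class ToyEnv:
    """Hypothetical 1-D random walk; an episode ends once |state| reaches 3."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 or +1; with probability 0.2 the move is flipped.
        move = action if self.rng.random() < 0.8 else -action
        self.state += move
        done = abs(self.state) >= 3
        reward = 1.0 if self.state >= 3 else 0.0
        return self.state, reward, done


def collect_batch(envs, states, policy):
    """Step every environment once and return one transition per environment."""
    batch = []
    for i, env in enumerate(envs):
        action = policy(states[i])
        next_state, reward, done = env.step(action)
        batch.append((states[i], action, reward, next_state, done))
        # Restart finished episodes so every environment keeps producing data.
        states[i] = env.reset() if done else next_state
    return batch


envs = [ToyEnv(seed) for seed in range(4)]
states = [env.reset() for env in envs]
policy = lambda state: 1  # placeholder policy: always try to move right
batches = [collect_batch(envs, states, policy) for _ in range(5)]
```

Each batch mixes transitions from independent episodes instead of consecutive steps of a single one, which is the decorrelation effect the quote refers to.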

Huskarl works with OpenAI Gym, a toolkit for developing and comparing RL algorithms, and can be installed from source via the project’s GitHub repository. A packaged version is available on PyPI. Huskarl is released under the MIT license.