A collection of trained RL agents, with tuned hyperparameters, using Stable Baselines.
We are looking for contributors to complete the collection!
If a trained agent exists for the environment, you can see it in action using:
python enjoy.py --algo algo_name --env env_id
For example, enjoy A2C on Breakout for 5000 timesteps:
python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder trained_agents/ -n 5000
The hyperparameters for each environment are defined in hyperparameters/algo_name.yml.
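For instance, an entry might look like this (an illustrative sketch: the keys mirror the Stable Baselines model arguments, and the values shown are examples, not the actual tuned hyperparameters):

```yaml
# Illustrative entry for hyperparameters/ppo2.yml -- example values,
# not the tuned hyperparameters shipped with the zoo
CartPole-v1:
  n_envs: 8                  # number of parallel environments
  n_timesteps: !!float 1e5   # total number of training timesteps
  policy: 'MlpPolicy'        # Stable Baselines policy class
  ent_coef: 0.0              # entropy coefficient passed to the model
```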
If the environment is listed in this file, you can train an agent using:
python train.py --algo algo_name --env env_id
For example (with TensorBoard support):
python train.py --algo ppo2 --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/
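You can then visualize the logged runs with the standard TensorBoard command:
tensorboard --logdir /tmp/stable-baselines/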
Train on multiple environments (with one call) and with TensorBoard logging:
python train.py --algo a2c --env MountainCar-v0 CartPole-v1 --tensorboard-log /tmp/stable-baselines/
Continue training (here, load a pretrained agent for Breakout and continue training for 5000 steps):
python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i trained_agents/a2c/BreakoutNoFrameskip-v4.pkl -n 5000
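Under the hood, continuing training amounts to loading the saved model with an environment and calling learn() again. Here is a minimal sketch using the Stable Baselines API directly (assuming make_atari_env is available in stable_baselines.common.cmd_util in your version; the environment settings are illustrative, not the zoo's tuned values):

```python
from stable_baselines import A2C
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.vec_env import VecFrameStack

# Recreate a vectorized, frame-stacked Atari environment
# (num_env and n_stack are illustrative, not the tuned values)
env = VecFrameStack(make_atari_env('BreakoutNoFrameskip-v4', num_env=4, seed=0),
                    n_stack=4)

# Load the pretrained agent, attach the environment and continue training
model = A2C.load('trained_agents/a2c/BreakoutNoFrameskip-v4.pkl', env=env)
model.learn(total_timesteps=5000)
model.save('a2c_BreakoutNoFrameskip-v4_continued')
```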
Atari games: 7 Atari games from the OpenAI benchmark (NoFrameskip-v4 versions). In the tables below, ✔️ means a trained agent with tuned hyperparameters is available, "missing" (or an empty cell) means it has not been trained yet, and N/A means the algorithm does not support that environment's action space.
RL Algo | BeamRider | Breakout | Enduro | Pong | Qbert | Seaquest | SpaceInvaders |
---|---|---|---|---|---|---|---|
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
ACER | ✔️ | ✔️ | ✔️ | ✔️ | | | |
ACKTR | | | | | | | |
PPO2 | ✔️ | | | | | | |
DQN | ✔️ | ✔️ | ✔️ | | | | |
Classic control environments:

RL Algo | CartPole-v1 | MountainCar-v0 | Acrobot-v1 | Pendulum-v0 | MountainCarContinuous-v0 |
---|---|---|---|---|---|
A2C | ✔️ | ✔️ | ✔️ | missing | missing |
ACER | ✔️ | ✔️ | ✔️ | N/A | N/A |
ACKTR | ✔️ | ✔️ | ✔️ | N/A | N/A |
PPO2 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
DQN | ✔️ | ✔️ | missing | N/A | N/A |
DDPG | N/A | N/A | N/A | ✔️ | ✔️ |
Box2D environments:

RL Algo | BipedalWalker-v2 | LunarLander-v2 | LunarLanderContinuous-v2 | BipedalWalkerHardcore-v2 | CarRacing-v0 |
---|---|---|---|---|---|
A2C | missing | ✔️ | missing | missing | missing |
ACER | N/A | ✔️ | N/A | N/A | N/A |
ACKTR | N/A | ✔️ | N/A | N/A | N/A |
PPO2 | ✔️ | ✔️ | ✔️ | missing | missing |
DQN | N/A | ✔️ | N/A | N/A | N/A |
DDPG | missing | N/A | ✔️ | missing | missing |
You can train agents online using the Colab notebook.
Install the system dependencies and Python packages:
apt-get install swig cmake libopenmpi-dev zlib1g-dev
pip install stable-baselines==2.1.2 box2d box2d-kengz pyyaml
Please see the Stable Baselines README for alternative installation instructions.
Build docker image (CPU):
docker build . -f docker/Dockerfile.cpu -t rl-baselines-zoo-cpu
GPU:
docker build . -f docker/Dockerfile.gpu -t rl-baselines-zoo
Pull the pre-built docker image (CPU):
docker pull araffin/rl-baselines-zoo-cpu
GPU image:
docker pull araffin/rl-baselines-zoo
Run a script inside the docker image:
./run_docker_cpu.sh python train.py --algo ppo2 --env CartPole-v1
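Assuming the repository also provides a run_docker_gpu.sh helper (an assumption, mirroring the CPU script above), the GPU equivalent would be:
./run_docker_gpu.sh python train.py --algo ppo2 --env CartPole-v1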
If you have trained an agent that is not present in the RL Zoo, please submit a Pull Request (including the tuned hyperparameters and the obtained score).