Collection of Python code that solves the Gymnasium Reinforcement Learning environments, along with YouTube tutorials.

johnnycode8/gym_solutions

Gymnasium (Deep) Reinforcement Learning Tutorials

This repository contains a collection of Python code that solves/trains Reinforcement Learning environments from the Gymnasium Library, formerly OpenAI’s Gym library. Each solution is accompanied by a video tutorial on my YouTube channel, @johnnycode, containing explanations and code walkthroughs. If you find the code and tutorials helpful, please consider supporting my work:

Buy Me A Coffee


Train Atari Games

If you want to jump straight into training AI agents to play Atari games, this tutorial requires no coding and no reinforcement learning experience! We use RL Baselines3 Zoo, a powerful training framework that lets you train and test AI models easily through a command line interface.

Full Guide: Easiest Way to Train AI to Play Atari Games with Deep Reinforcement Learning


If you want to learn Reinforcement Learning:


Installation

The Gymnasium Library is supported on Linux and macOS, but not officially on Windows. On Windows, installing the Box2D package (Bipedal Walker, Car Racing, Lunar Lander) is problematic; you may see errors such as:

  • ERROR: Failed building wheels for box2d-py
  • ERROR: Command swig.exe failed
  • ERROR: Microsoft Visual C++ 14.0 or greater is required.

My Gymnasium on Windows installation guide shows how to resolve these errors and successfully install the complete set of Gymnasium Reinforcement Learning environments:

How to Install Gymnasium on Windows

However, due to the constantly evolving nature of software versions, you might still encounter issues with the above guide. As an alternative, you can install Gymnasium on Linux within Windows, using Windows Subsystem for Linux (WSL). In this guide, I show how to install the Gymnasium Box2D environments (Bipedal Walker, Car Racing, Lunar Lander) onto WSL:

Install Gymnasium Box2D on Windows Subsystem for Linux


Beginner Reinforcement Learning Tutorials

1. Q-Learning on Gymnasium FrozenLake-v1 (8x8 Tiles)

This is the recommended starting point for beginners. This Q-Learning tutorial provides a step-by-step walkthrough of the code to solve the FrozenLake-v1 8x8 map. The Frozen Lake environment is simple and straightforward, allowing us to concentrate on understanding how Q-Learning works. The Epsilon-Greedy algorithm is used for both exploration (choosing random actions) and exploitation (choosing the best actions). Please note that this tutorial does not delve into the theory or math behind Q-Learning; it is purely focused on practical application.
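As a minimal sketch of the two ideas named above, the snippet below shows an Epsilon-Greedy action choice and a single Q-Learning update on a plain Python Q-Table. It is not the tutorial's code: the function names, the `alpha` and `gamma` values, and the hand-made transition are assumptions chosen for illustration, and no Gymnasium environment is involved.

```python
import random

def choose_action(q_row, epsilon, rng):
    """Epsilon-Greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                    # explore: random action
    return max(range(len(q_row)), key=q_row.__getitem__)   # exploit: best known action

def q_update(q_table, state, action, reward, next_state, alpha=0.9, gamma=0.9):
    """Apply one tabular Q-Learning update in place (alpha and gamma are assumed values)."""
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])

# 64 states (8x8 map) x 4 actions (0=left, 1=down, 2=right, 3=up), all starting at zero.
q_table = [[0.0] * 4 for _ in range(64)]
rng = random.Random(0)

# Hand-made transition: from state 62 the agent moves right into the goal (state 63)
# and receives a reward of 1.
q_update(q_table, state=62, action=2, reward=1.0, next_state=63)

# With epsilon=0 the choice is purely greedy, so it picks the action just rewarded.
greedy_action = choose_action(q_table[62], epsilon=0.0, rng=rng)
```

In a full training loop, epsilon typically starts near 1 (mostly random actions) and decays toward 0 so the agent gradually shifts from exploration to exploitation.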

How to Train Gymnasium FrozenLake-v1 with Q-Learning

Code Reference:

Watch Q-Learning Values Change During Training on Gymnasium FrozenLake-v1

This is the FrozenLake-v1 environment "enhanced" to help you better understand Q-Learning. The Q-values are overlaid on top of each cell of the map, allowing you to visually see the Q-values update in real-time during training. The map is enlarged to fill the entire screen, making the overlaid Q-values easier to read. Additionally, shortcut keys are available to speed up or slow down the animation.

See Q-Learning in Realtime on FrozenLake-v1

Code Reference:
  • frozen_lake_enhanced.py This is the FrozenLake-v1 environment overlaid with Q-values. You do not need to understand this code, but feel free to check how I modified the environment.
  • frozen_lake_qe.py This file is almost identical to the frozen_lake_q.py file above, except this uses the frozen_lake_enhanced.py environment.

2. Q-Learning on Gymnasium Taxi-v3 (Multiple Objectives)

In the Taxi-v3 environment, the agent (Taxi) learns to pick up passengers and deliver them to their destination. It is very similar to the Frozen Lake environment, except that the observation space is more complicated.

How to Train Gymnasium Taxi-v3 Q-Learning

Code Reference:

3. Q-Learning on Gymnasium MountainCar-v0 (Continuous Observation Space)

This Q-Learning tutorial solves the MountainCar-v0 environment. It builds upon the code from the Frozen Lake environment. What is interesting about this environment is that the observation space is continuous, whereas the Frozen Lake environment's observation space is discrete. "Discrete" means that the agent, the elf in Frozen Lake, steps from one cell on the grid to the next, so there is a clear distinction that the agent is going from one state to another. "Continuous" means that the agent, the car in Mountain Car, traverses the mountain on a continuous road, with no clear distinction of states.
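One common way to use a Q-Table with a continuous observation space is to divide each dimension into bins, so that every observation maps to a pair of integer indices. The sketch below assumes 20 bin edges per dimension and hypothetical helper names; the position and velocity ranges are MountainCar-v0's documented bounds. `bisect_right` is used here as a stdlib stand-in for what `numpy.digitize` does by default.

```python
from bisect import bisect_right

def make_bins(low, high, count):
    """Return `count` evenly spaced bin edges from low to high, inclusive."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]

# MountainCar-v0 bounds: position in [-1.2, 0.6], velocity in [-0.07, 0.07].
POSITION_BINS = make_bins(-1.2, 0.6, 20)    # 20 edges is an assumed resolution
VELOCITY_BINS = make_bins(-0.07, 0.07, 20)

def discretize(observation):
    """Map a continuous (position, velocity) pair to a pair of bin indices."""
    position, velocity = observation
    return (bisect_right(POSITION_BINS, position),
            bisect_right(VELOCITY_BINS, velocity))

state = discretize((-0.5, 0.0))
```

Because an index can be as large as the number of edges, a Q-Table indexed this way needs 21 slots per dimension when there are 20 edges.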

How to Train Gymnasium MountainCar-v0 with Q-Learning

Code Reference:

4. Q-Learning on Gymnasium CartPole-v1 (Multiple Continuous Observation Spaces)

This Q-Learning tutorial solves the CartPole-v1 environment. It builds upon the code from the Frozen Lake environment. Like Mountain Car, the Cart Pole environment's observation space is also continuous. However, it has a more complicated continuous observation space: the cart's position and velocity and the pole's angle and angular velocity.

How to Train Gymnasium CartPole-v1 with Q-Learning

Code Reference:

5. Q-Learning on Gymnasium Acrobot-v1 (High Dimension Q-Table)

We'll use a 7-dimension Q-Table to solve the Acrobot-v1 environment.
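To see where seven dimensions can come from: Acrobot-v1 reports six observation values and offers three discrete actions, so a table with one axis per observation value plus one axis for the action has seven axes. The sketch below builds such a table from nested lists. The helper names and the 4 bins per observation are assumptions kept small for illustration, and the tutorial's own layout may differ.

```python
def make_table(shape):
    """Build a nested list of zeros with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [make_table(shape[1:]) for _ in range(shape[0])]

def action_values(table, state):
    """Follow the six state indices down to the list of per-action Q-values."""
    for index in state:
        table = table[index]
    return table

BINS_PER_DIM = 4   # assumed resolution per observation value
NUM_ACTIONS = 3    # Acrobot-v1 has three discrete actions

shape = (BINS_PER_DIM,) * 6 + (NUM_ACTIONS,)   # six observation axes + one action axis
q_table = make_table(shape)

state = (0, 3, 1, 2, 0, 3)                     # one bin index per observation value
action_values(q_table, state)[1] = 0.5         # pretend action 1 has been rewarded
best_action = max(range(NUM_ACTIONS), key=action_values(q_table, state).__getitem__)
```

The table size grows as bins to the sixth power times three, which is why the bin count has to stay modest.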

How to Train Gymnasium Acrobot-v1 with Q-Learning

Code Reference:

6. Q-Learning on Gymnasium Pendulum-v1 (Continuous Action and Observation Spaces)

We'll use Q-Learning to solve the Pendulum-v1 environment.
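A Q-Table needs a finite set of actions, while Pendulum-v1 accepts any torque between -2 and 2. One way to bridge the gap, sketched below, is to pick a few evenly spaced torque values and let the table choose among their indices. The choice of five values and the helper names are assumptions, not necessarily what the tutorial uses.

```python
def make_action_values(low=-2.0, high=2.0, count=5):
    """Return `count` evenly spaced torque values covering [low, high]."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]

# Pendulum-v1 torque range is [-2, 2]; five discrete choices is an assumed resolution.
TORQUES = make_action_values()

def index_to_action(action_index):
    """Convert a Q-Table action index into the one-element action the env expects."""
    return [TORQUES[action_index]]

strongest_push = index_to_action(len(TORQUES) - 1)
```

The observation values would be binned in the same spirit, so both the state and the action become table indices.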

How to Train Gymnasium Pendulum-v1 with Q-Learning

Code Reference:

7. Q-Learning on Gymnasium MountainCarContinuous-v0 (Stuck in Local Optima)

We'll use Q-Learning to solve the MountainCarContinuous-v0 environment.

How to Train Gymnasium MountainCarContinuous-v0 with Q-Learning

Code Reference:



Deep Reinforcement Learning Tutorials

Getting Started with Neural Networks

Before diving into Deep Reinforcement Learning, it would be helpful to have a basic understanding of Neural Networks. This hands-on, end-to-end example shows how to calculate the loss and apply gradient descent on the smallest possible network.

Work Thru the Most Basic Neural Network with Simplified Math and Python

Code Reference:

Deep Q-Learning a.k.a Deep Q-Network (DQN) Explained

This Deep Reinforcement Learning tutorial explains how the Deep Q-Learning (DQL) algorithm uses two neural networks: a Policy Deep Q-Network (DQN) and a Target DQN, to train the FrozenLake-v1 4x4 environment. The Frozen Lake environment is very simple and straightforward, allowing us to focus on how DQL works. The Epsilon-Greedy algorithm and the Experience Replay technique are also used as part of DQL to help train the learning agent. The code referenced here is also walked through in the YouTube tutorial. PyTorch is used to build the DQNs.
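Experience Replay can be sketched without any neural network code: transitions are stored in a fixed-size buffer, and training draws random mini-batches from it instead of learning from consecutive steps. The class below is a minimal illustration with assumed names and an assumed transition layout of (state, action, new_state, reward, terminated); it is not necessarily how the tutorial's code is organized.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of transitions; the oldest entries are dropped when full."""

    def __init__(self, maxlen, seed=None):
        self.memory = deque(maxlen=maxlen)
        self.rng = random.Random(seed)

    def append(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

memory = ReplayMemory(maxlen=3, seed=0)
for step in range(5):
    # Assumed layout: (state, action, new_state, reward, terminated)
    memory.append((step, 2, step + 1, 0.0, False))

batch = memory.sample(2)
```

In Deep Q-Learning, each sampled batch is used to update the policy network, while the target network supplies the Q-value targets and is refreshed from the policy network at intervals.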

YouTube Tutorial Content:

  • Quick overview of the Frozen Lake environment.
  • Why use Reinforcement Learning on Frozen Lake, if a simple search algorithm works.
  • Overview of the Epsilon-Greedy algorithm.
  • Compare Q-Learning's Q-Table vs Deep Q-Learning's DQN
  • How the Q-Table learns.
  • How the DQN learns.
  • Overview of Experience Replay.
  • Putting it all together - walkthru of the Deep Q-Learning algorithm.
  • Walkthru of the Deep Q-Learning code for Frozen Lake.
  • Run and demo the training code.

Deep Q-Learning DQL/DQN Explained + Code Walkthru + Demo

Code Reference:
Dependencies:

Implement DQN with PyTorch and Train Flappy Bird

To gain an in-depth understanding of the DQN algorithm, try my series on implementing DQN from scratch: DQN PyTorch Beginner Tutorials (https://github.com/johnnycode8/dqn_pytorch)


Apply DQN to Gymnasium Mountain Car

We've already solved MountainCar-v0 with Q-Learning (above). For learning purposes, we'll do it again with Deep Q-Learning. Hopefully, it'll give you a better understanding of how to adapt the code for your needs.

How to Train Gymnasium MountainCar-v0 with Deep Q-Learning

Code Reference:
Dependencies:

Get Started with Convolutional Neural Network (CNN)

In part 1 (above), the Deep Q-Networks (DQN) used were straightforward neural networks with a hidden layer and an output layer. This network architecture works for simple environments. However, for complex environments, such as Atari Pong, where the agent learns from the environment visually, we need to modify our DQNs with convolutional layers. We'll continue the explanation on the very simple FrozenLake-v1 4x4 environment, but we'll modify the inputs so that they are treated as images.
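To illustrate what treating the inputs as images can mean, the sketch below turns a FrozenLake state number (0 to 15 on the 4x4 map) into a 4x4 grid with a single bright pixel at the agent's cell. The function name and the single-channel encoding are assumptions; the tutorial's actual encoding may use a different layout or more channels.

```python
def state_to_image(state, rows=4, cols=4):
    """Encode a FrozenLake state index as a rows x cols grid with one lit pixel."""
    image = [[0.0] * cols for _ in range(rows)]
    row, col = divmod(state, cols)   # FrozenLake numbers cells row by row
    image[row][col] = 1.0
    return image

image = state_to_image(5)            # state 5 is row 1, column 1

# Convolutional layers usually expect a leading channel axis: [channels][rows][cols].
single_channel_input = [image]
```

A grid like this would be converted to a tensor before being passed to the convolutional layers.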

Deep Q-Learning with Convolutional Neural Networks

Code Reference:
Dependencies:



Stable Baselines3 Tutorials

Stable Baselines3: Get Started Guide | Train Gymnasium MuJoCo Humanoid-v4

Get started with the Stable Baselines3 Reinforcement Learning library by training the Gymnasium MuJoCo Humanoid-v4 environment with the Soft Actor-Critic (SAC) algorithm. The focus is on the usage of the Stable Baselines3 (SB3) library and the use of TensorBoard to monitor training progress. Other algorithms used in the demo include Twin Delayed Deep Deterministic Policy Gradient (TD3) and Advantage Actor Critic (A2C).

How to Train Gymnasium Humanoid-v4 with Stable Baselines3

Code Reference:
Dependency:

Stable Baselines3 - Beginner's Guide to Choosing RL Algorithms for Training

SB3 offers many ready-to-use RL algorithms out of the box, but as beginners, how do we know which algorithms to use? We'll discuss this topic in the video:

Beginners Guide on Choosing Stable Baselines3 Algorithms for Training


Stable Baselines3: Dynamically Load RL Algorithm for Training | Train Gymnasium Pendulum

In part 1, for simplicity, the algorithms (SAC, TD3, A2C) were hardcoded in the code. In part 2, we'll make loading and creating instances of the algorithms dynamic. To test the changes, we'll train Pendulum-v1 using SAC and TD3 simultaneously and monitor the progress through TensorBoard.
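One way to load a class by name at runtime is to import its module and look the class up with `getattr`. The helper below is a hypothetical sketch of that idea, demonstrated on a standard-library class so it runs without extra packages. The tutorial's own mechanism may differ.

```python
import importlib

def load_class(module_name, class_name):
    """Import a module by name and return the class with the given name."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Standard-library demonstration: look up collections.Counter from two strings.
Counter = load_class("collections", "Counter")
counts = Counter("sac td3 sac".split())

# With Stable Baselines3 installed, the same helper could be used like this
# (not executed here; `env` would be a Gymnasium environment):
#   algo_class = load_class("stable_baselines3", "SAC")
#   model = algo_class("MlpPolicy", env)
```

Taking the algorithm name from a command-line argument then lets one script train with SAC, TD3, or A2C without code changes.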

How to Train Gymnasium Pendulum-v1 with Stable Baselines3

Code Reference:

Automatically Stop Training When Best Model is Found in Stable Baselines3

This tutorial walks through the code that automatically stops training when the best model is found. We'll demonstrate by training the Gymnasium BipedalWalker-v3 using Soft Actor-Critic.
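The stopping rule itself can be sketched independently of any library: stop once the evaluation reward reaches a target, or once it has failed to improve for a set number of evaluations. Stable Baselines3 ships callbacks for both cases (`StopTrainingOnRewardThreshold` and `StopTrainingOnNoModelImprovement`, used together with `EvalCallback`). The class below is only a hypothetical plain-Python illustration of the logic, with made-up reward numbers and an assumed threshold and patience.

```python
class EarlyStopper:
    """Track the best evaluation reward and decide when training should stop."""

    def __init__(self, reward_threshold, patience):
        self.reward_threshold = reward_threshold
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0                      # evaluations since the last improvement

    def should_stop(self, mean_reward):
        if mean_reward > self.best:
            self.best = mean_reward         # new best model: reset the counter
            self.stale = 0
        else:
            self.stale += 1
        return self.best >= self.reward_threshold or self.stale >= self.patience

# Illustrative values only: a target of 300 and a patience of 3 evaluations.
stopper = EarlyStopper(reward_threshold=300.0, patience=3)
rewards = [-90.0, 10.0, 150.0, 140.0, 145.0, 120.0, 310.0]   # made-up evaluation results

stopped_at = None
for evaluation, mean_reward in enumerate(rewards):
    if stopper.should_stop(mean_reward):
        stopped_at = evaluation
        break
```

Here training stops after three evaluations without improvement, before the last reward in the list is ever seen, which is the trade-off a patience setting makes.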

How to Train Gymnasium BipedalWalker-v3 with Stable Baselines3

Code Reference:
