A simple framework for experimenting with Reinforcement Learning in Python 2.7.
There are loads of other great libraries out there for RL. The aim of this one is twofold:
- Simplicity.
- Reproducibility of results.
A brief tutorial is available here.
Just requires numpy and matplotlib.
Now includes support for hooking into any of the OpenAI Gym environments.
The easiest way to install is with pip. Just run:
pip install simple_rl
Alternatively, you can download simple_rl here.
To run a simple experiment, import the run_agents_on_mdp(agent_list, mdp) function from simple_rl.run_experiments and call it with a list of agents and an MDP. For example:
# Imports
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GridWorldMDP
from simple_rl.agents import QLearnerAgent
# Run Experiment
mdp = GridWorldMDP(10, 10, (1, 1), [(10, 10)])
agent = QLearnerAgent(mdp.actions)
run_agents_on_mdp([agent], mdp)
- (agents): Code for some basic agents (a random actor, Q-learner, [R-Max], Q-learner with a Linear Approximator, etc.).
- (experiments): Code for an Experiment class to reproduce results.
- (mdp): Code for a basic MDP and MDPState class. Also contains OO-MDP implementation [Diuk et al. 2008].
- (tasks): Implementations for a few standard MDPs (grid world, n-chain, and Taxi [Dietterich 2000]). Recently added support for the OpenAI Gym.
- (utils): Code for charting utilities.
Make an MDP subclass, which needs:
- A static variable, ACTIONS, which is a list of strings denoting each action.
- Implement a reward function and a transition function and pass them to the MDP constructor (along with ACTIONS).
- I also suggest overriding the "__str__" method of the class, and adding an "__init__.py" file to the directory.
- Create a State subclass for your MDP. I suggest overriding "__hash__", "__eq__", and "__str__" so the class plays well with the agents.
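As a rough illustration of the steps above, here is a sketch of a State subclass and an MDP subclass for a made-up one-dimensional corridor. To stay self-contained it defines stand-in State and MDP base classes instead of importing the ones from simple_rl. The stand-ins, their constructor argument order, and the names CorridorState and CorridorMDP are assumptions for illustration only, so check the real base classes in the mdp directory before adapting this.

```python
# Stand-in base classes (assumptions, NOT the actual simple_rl classes).
class State(object):
    pass


class MDP(object):
    def __init__(self, actions, transition_func, reward_func, init_state):
        self.actions = actions
        self.transition_func = transition_func
        self.reward_func = reward_func
        self.init_state = init_state


class CorridorState(State):
    """Hypothetical state: a position along a 1-D corridor."""

    def __init__(self, x):
        self.x = x

    def __hash__(self):
        # Lets agents use states as dictionary keys (e.g. in a Q-table).
        return hash(self.x)

    def __eq__(self, other):
        return isinstance(other, CorridorState) and self.x == other.x

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        return not self == other

    def __str__(self):
        return "s(" + str(self.x) + ")"


class CorridorMDP(MDP):
    """Hypothetical MDP: step left or right; reward 1 for reaching the right end."""

    ACTIONS = ["left", "right"]

    def __init__(self, length=5):
        self.length = length
        MDP.__init__(self, CorridorMDP.ACTIONS, self._transition_func,
                     self._reward_func, CorridorState(0))

    def _transition_func(self, state, action):
        # Move one cell and clamp the position to the corridor bounds.
        step = 1 if action == "right" else -1
        new_x = min(max(state.x + step, 0), self.length - 1)
        return CorridorState(new_x)

    def _reward_func(self, state, action):
        # Reward 1 only on the step that enters the rightmost cell.
        goal_x = self.length - 1
        next_state = self._transition_func(state, action)
        return 1 if (next_state.x == goal_x and state.x != goal_x) else 0

    def __str__(self):
        return "corridor_len-" + str(self.length)
```

Defining "__hash__" and "__eq__" together matters here because two separately constructed states at the same position should land on the same dictionary entry.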
Make an Agent subclass, which requires:
- A method, act(self, state, reward), that returns an action.
- A method, reset(), that puts the agent back to its tabula rasa state.
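The two requirements above might look like the following sketch. The Agent base class here is a stand-in written for this example (its name and actions constructor arguments are an assumption, not necessarily what simple_rl uses), and StickyRandomAgent is a hypothetical agent invented for illustration.

```python
import random


# Stand-in base class (an assumption, NOT the actual simple_rl Agent).
class Agent(object):
    def __init__(self, name, actions):
        self.name = name
        self.actions = actions


class StickyRandomAgent(Agent):
    """Hypothetical agent: repeats its last action after a positive reward,
    otherwise picks a new action uniformly at random."""

    def __init__(self, actions, seed=None):
        Agent.__init__(self, "sticky-random", actions)
        self.rng = random.Random(seed)
        self.reset()

    def act(self, state, reward):
        # 'reward' is the reward received for the previous action.
        if self.prev_action is None or reward <= 0:
            self.prev_action = self.rng.choice(self.actions)
        return self.prev_action

    def reset(self):
        # Forget everything learned so far (tabula rasa).
        self.prev_action = None
```

Calling reset() from the constructor keeps the initial state and the post-reset state defined in one place, so the two cannot drift apart.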
Let me know if you have any questions or suggestions.
Cheers,
-Dave