Stable-Baselines3 v1.0 (DLR-RM#354)
* Bump version and update doc

* Fix name

* Apply suggestions from code review

Co-authored-by: Adam Gleave <[email protected]>

* Update docs/index.rst

Co-authored-by: Adam Gleave <[email protected]>

* Update wording for RL zoo

Co-authored-by: Adam Gleave <[email protected]>
araffin and AdamGleave authored Mar 17, 2021
1 parent 237223f commit e3875b5
Showing 11 changed files with 75 additions and 17 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -36,7 +36,7 @@ you can take a look at the issues [#48](https://github.com/DLR-RM/stable-baselin
| Type hints | :heavy_check_mark: |


### Planned features (v1.1+)
### Planned features

Please take a look at the [Roadmap](https://github.com/DLR-RM/stable-baselines3/issues/1) and [Milestones](https://github.com/DLR-RM/stable-baselines3/milestones).

@@ -48,11 +48,13 @@ A migration guide from SB2 to SB3 can be found in the [documentation](https://st

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

## RL Baselines3 Zoo: A Collection of Trained RL Agents
## RL Baselines3 Zoo: A Training Framework for Stable Baselines3 Reinforcement Learning Agents

[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo). is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.
[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL).

It also provides basic scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.

Goals of this repository:

@@ -110,9 +112,9 @@ import gym

from stable_baselines3 import PPO

env = gym.make('CartPole-v1')
env = gym.make("CartPole-v1")

model = PPO('MlpPolicy', env, verbose=1)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)

obs = env.reset()
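
A minimal sketch of the rollout loop that typically follows in the quickstart (assuming the classic Gym API of this era, where `step()` returns `obs, reward, done, info`):

```python
# Illustrative continuation: run the trained policy and render the env.
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()
```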
Binary file added docs/_static/img/net_arch.png
Binary file added docs/_static/img/sb3_loop.png
Binary file added docs/_static/img/sb3_policy.png
44 changes: 42 additions & 2 deletions docs/guide/custom_policy.rst
@@ -13,9 +13,49 @@ and other types of input features (MlpPolicies).
which handles bounds more correctly.


SB3 Policy
^^^^^^^^^^

Custom Policy Architecture
^^^^^^^^^^^^^^^^^^^^^^^^^^
SB3 networks are separated into two main parts (see figure below):

- A features extractor (usually shared between actor and critic when applicable, to save computation),
  whose role is to extract features (i.e. to convert observations to a feature vector) from high-dimensional observations; for instance, a CNN that extracts features from images.
  This is set by the ``features_extractor_class`` parameter. You can change the default parameters of that features extractor
  by passing ``features_extractor_kwargs`` (a short sketch follows this list).

- A (fully-connected) network that maps the features to actions/values. Its architecture is controlled by the ``net_arch`` parameter.
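
A minimal sketch of tweaking the default extractor, assuming an image environment so that the default features extractor is the ``NatureCNN`` (whose constructor accepts a ``features_dim`` argument; the env id is illustrative):

.. code-block:: python

    from stable_baselines3 import PPO

    # Keep the default CNN extractor but change its output size to 128
    # features; features_extractor_kwargs is forwarded to the extractor's
    # constructor (any image-based env works here).
    model = PPO(
        "CnnPolicy",
        "BreakoutNoFrameskip-v4",
        policy_kwargs=dict(features_extractor_kwargs=dict(features_dim=128)),
        verbose=1,
    )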


.. note::

    All observations are first pre-processed (e.g. images are normalized, discrete observations are converted to one-hot vectors, ...) before being fed to the features extractor.
    In the case of vector observations, the features extractor is just a ``Flatten`` layer.


.. image:: ../_static/img/net_arch.png


SB3 policies are usually composed of several networks (actor/critic networks + target networks when applicable) together
with the associated optimizers.

Each of these networks has a features extractor followed by a fully-connected network.

.. note::

    When we refer to a "policy" in Stable-Baselines3, this is usually an abuse of language compared to standard RL terminology.
    In SB3, "policy" refers to the class that handles all the networks used for training,
    not only the network used to predict actions (the "learned controller").



.. image:: ../_static/img/sb3_policy.png


.. .. figure:: https://cdn-images-1.medium.com/max/960/1*h4WTQNVIsvMXJTCpXm_TAw.gif
Custom Network Architecture
^^^^^^^^^^^^^^^^^^^^^^^^^^^

One way of customising the policy network architecture is to pass arguments when creating the model,
using the ``policy_kwargs`` parameter:
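
A minimal sketch of this pattern (the layer sizes and activation function below are illustrative, not defaults):

.. code-block:: python

    import torch as th

    from stable_baselines3 import PPO

    # Illustrative: separate 2x32 fully-connected networks for the policy
    # (pi) and the value function (vf), with ReLU activations.
    policy_kwargs = dict(
        activation_fn=th.nn.ReLU,
        net_arch=[dict(pi=[32, 32], vf=[32, 32])],
    )
    model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
    model.learn(total_timesteps=20_000)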
3 changes: 3 additions & 0 deletions docs/guide/developer.rst
@@ -31,6 +31,9 @@ Each algorithm has two main methods:
- ``.train()`` which updates the parameters using samples from the buffer


.. image:: ../_static/img/sb3_loop.png
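
A pseudocode sketch of how these two methods interact inside ``learn()`` (illustrative only, not SB3's actual source):

.. code-block:: python

    # Illustrative pseudocode of the high-level training loop; the real
    # implementations live in OnPolicyAlgorithm / OffPolicyAlgorithm.
    def learn(self, total_timesteps):
        while self.num_timesteps < total_timesteps:
            # 1. interact with the environment and store transitions
            self.collect_rollouts()
            # 2. gradient updates using samples from the buffer
            self.train()
        return self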


Where to start?
===============

2 changes: 1 addition & 1 deletion docs/guide/migration.rst
Expand Up @@ -98,7 +98,7 @@ Base-class (all algorithms)
Policies
^^^^^^^^

- ``cnn_extractor`` -> ``feature_extractor``, as ``feature_extractor`` is now used with ``MlpPolicy`` too
- ``cnn_extractor`` -> ``features_extractor``, as ``features_extractor`` is now used with ``MlpPolicy`` too

A2C
^^^
8 changes: 5 additions & 3 deletions docs/guide/rl_zoo.rst
@@ -4,9 +4,11 @@
RL Baselines3 Zoo
==================

`RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_. is a collection of pre-trained Reinforcement Learning agents using
Stable-Baselines3.
It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.
`RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_ is a training framework for Reinforcement Learning (RL).

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.
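
For instance, training an agent with the zoo typically boils down to a single command (run from a checkout of the rl-baselines3-zoo repository):

.. code-block:: bash

    # Train PPO on CartPole-v1 with the zoo's tuned hyperparameters
    python train.py --algo ppo --env CartPole-v1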

Goals of this repository:

4 changes: 2 additions & 2 deletions docs/index.rst
@@ -12,9 +12,9 @@ It is the next major version of `Stable Baselines <https://github.com/hill-a/sta

Github repository: https://github.com/DLR-RM/stable-baselines3

RL Baselines3 Zoo (collection of pre-trained agents): https://github.com/DLR-RM/rl-baselines3-zoo
RL Baselines3 Zoo (training framework for SB3): https://github.com/DLR-RM/rl-baselines3-zoo

RL Baselines3 Zoo also offers a simple interface to train, evaluate agents and do hyperparameter tuning.
RL Baselines3 Zoo provides a collection of pre-trained agents, scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

SB3 Contrib (experimental RL code, latest algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

15 changes: 13 additions & 2 deletions docs/misc/changelog.rst
@@ -3,13 +3,22 @@
Changelog
==========

Release 1.0rc2 (WIP)
Release 1.0 (2021-03-15)
-------------------------------

**First Major Version**

Breaking Changes:
^^^^^^^^^^^^^^^^^
- Removed ``stable_baselines3.common.cmd_util`` (already deprecated), please use ``env_util`` instead

.. warning::

    A refactoring of the ``HER`` algorithm is planned together with support for dictionary observations
    (see `PR #243 <https://github.com/DLR-RM/stable-baselines3/pull/243>`_ and `#351 <https://github.com/DLR-RM/stable-baselines3/pull/351>`_).
    This will be a backward-incompatible change (models trained with previous versions of ``HER`` won't work with the new version).


New Features:
^^^^^^^^^^^^^
- Added support for ``custom_objects`` when loading models
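
A hedged illustration of the new option (the path and value are illustrative; ``custom_objects`` overrides attributes stored in the saved model when loading):

.. code-block:: python

    from stable_baselines3 import PPO

    # Illustrative: replace an attribute stored in the zip archive at load
    # time, e.g. to swap out an unpicklable learning-rate schedule.
    model = PPO.load("ppo_cartpole.zip", custom_objects={"learning_rate": 1e-4})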
@@ -24,7 +33,9 @@ Documentation:
- Added new project using SB3: rl_reach (@PierreExeter)
- Added note about slow-down when switching to PyTorch
- Add a note on continual learning and resetting environment

- Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents
- Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)
- Updated the custom policy section

Pre-Release 0.11.1 (2021-02-27)
-------------------------------
2 changes: 1 addition & 1 deletion stable_baselines3/version.txt
@@ -1 +1 @@
1.0rc2
1.0
