Skip to content

Commit

Permalink
Update SB3 Contrib doc (ARS) and W&B integration (DLR-RM#726)
Browse files Browse the repository at this point in the history
* Add ARS to SB3 contrib

* Add integration page
  • Loading branch information
araffin authored Jan 18, 2022
1 parent e9a8979 commit cd6e047
Show file tree
Hide file tree
Showing 6 changed files with 60 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,7 @@ All the following examples can be executed online using Google colab notebooks:

| **Name** | **Recurrent** | `Box` | `Discrete` | `MultiDiscrete` | `MultiBinary` | **Multi Processing** |
| ------------------- | ------------------ | ------------------ | ------------------ | ------------------- | ------------------ | --------------------------------- |
| ARS<sup>[1](#f1)</sup> | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
| A2C | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| DDPG | :x: | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| DQN | :x: | :x: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
Expand Down
1 change: 1 addition & 0 deletions docs/guide/algos.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ along with some useful characteristics: support for discrete/continuous actions,
=================== =========== ============ ================= =============== ================
Name ``Box`` ``Discrete`` ``MultiDiscrete`` ``MultiBinary`` Multi Processing
=================== =========== ============ ================= =============== ================
ARS [#f1]_ ✔️ ✔️ ❌ ❌ ✔️
A2C ✔️ ✔️ ✔️ ✔️ ✔️
DDPG ✔️ ❌ ❌ ❌ ✔️
DQN ❌ ✔️ ❌ ❌ ✔️
Expand Down
49 changes: 49 additions & 0 deletions docs/guide/integrations.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
.. _integrations:

============
Integrations
============

Weights & Biases
================

Weights & Biases provides a callback for experiment tracking that allows to visualize and share results.

The full documentation is available here: https://docs.wandb.ai/guides/integrations/other/stable-baselines-3

.. code-block:: python
import gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import PPO
config = {
"policy_type": "MlpPolicy",
"total_timesteps": 25000,
"env_name": "CartPole-v1",
}
run = wandb.init(
project="sb3",
config=config,
sync_tensorboard=True, # auto-upload sb3's tensorboard metrics
# monitor_gym=True, # auto-upload the videos of agents playing the game
# save_code=True, # optional
)
model = PPO(config["policy_type"], config["env_name"], verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(
total_timesteps=config["total_timesteps"],
callback=WandbCallback(
model_save_path=f"models/{run.id}",
verbose=2,
),
)
run.finish()
Hugging Face
============

To be added.
3 changes: 2 additions & 1 deletion docs/guide/sb3_contrib.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ We implement experimental features in a separate contrib repository:
`SB3-Contrib`_

This allows Stable-Baselines3 (SB3) to maintain a stable and compact core, while still
providing the latest features, like Truncated Quantile Critics (TQC), Trust Region Policy Optimization (TRPO) or
providing the latest features, like Truncated Quantile Critics (TQC), Augmented Random Search (ARS), Trust Region Policy Optimization (TRPO) or
Quantile Regression DQN (QR-DQN).

Why create this repository?
Expand Down Expand Up @@ -36,6 +36,7 @@ See documentation for the full list of included features.

**RL Algorithms**:

- `Augmented Random Search (ARS) <https://arxiv.org/abs/1803.07055>`_
- `Quantile Regression DQN (QR-DQN)`_
- `Truncated Quantile Critics (TQC)`_
- `Trust Region Policy Optimization (TRPO) <https://arxiv.org/abs/1502.05477>`_
Expand Down
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ Main Features
guide/custom_policy
guide/callbacks
guide/tensorboard
guide/integrations
guide/rl_zoo
guide/sb3_contrib
guide/imitation
Expand Down
6 changes: 6 additions & 0 deletions docs/misc/changelog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,11 @@ New Features:
- Added ``skip`` option to ``VecTransposeImage`` to skip transforming the channel order when the heuristic is wrong
- Added ``copy()`` and ``combine()`` methods to ``RunningMeanStd``

SB3-Contrib
^^^^^^^^^^^
- Added Trust Region Policy Optimization (TRPO) (@cyprienc)
- Added Augmented Random Search (ARS) (@sgillen)

Bug Fixes:
^^^^^^^^^^
- Fixed a bug where ``set_env()`` with ``VecNormalize`` would result in an error with off-policy algorithms (thanks @cleversonahum)
Expand Down Expand Up @@ -57,6 +62,7 @@ Documentation:
- Updated SB3 Contrib doc
- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)
- Fixed custom policy documentation (@IperGiove)
- Added doc on Weights & Biases integration

Release 1.3.0 (2021-10-23)
---------------------------
Expand Down

0 comments on commit cd6e047

Please sign in to comment.