Update custom_policy.rst (DLR-RM#711)
* Update custom_policy.rst

Added the methods forward_actor and forward_critic to the CustomNetwork class.

* Update doc

Co-authored-by: Antonin Raffin <[email protected]>
IperGiove and araffin authored Jan 3, 2022
1 parent c895c1d commit d9e198e
Showing 5 changed files with 12 additions and 4 deletions.
docs/guide/custom_policy.rst (6 additions, 0 deletions)
@@ -333,6 +333,12 @@ If your task requires even more granular control over the policy/value architecture
            If all layers are shared, then ``latent_policy == latent_value``
        """
        return self.policy_net(features), self.value_net(features)

    def forward_actor(self, features: th.Tensor) -> th.Tensor:
        return self.policy_net(features)

    def forward_critic(self, features: th.Tensor) -> th.Tensor:
        return self.value_net(features)


class CustomActorCriticPolicy(ActorCriticPolicy):
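For context, here is a condensed sketch of the guide's CustomNetwork example once these two methods are added. The __init__ body is abridged and the layer sizes are only illustrative; see docs/guide/custom_policy.rst for the full example. The point of the new methods is that the policy can query only the actor branch or only the critic branch when it does not need both outputs, instead of always going through forward.

from typing import Tuple

import torch as th
from torch import nn


class CustomNetwork(nn.Module):
    """
    Custom network for the policy and value function.
    It receives as input the features extracted by a features extractor.
    """

    def __init__(self, feature_dim: int, last_layer_dim_pi: int = 64, last_layer_dim_vf: int = 64):
        super().__init__()

        # IMPORTANT: save the output dimensions, they are used to create the action distribution
        self.latent_dim_pi = last_layer_dim_pi
        self.latent_dim_vf = last_layer_dim_vf

        # Policy (actor) branch
        self.policy_net = nn.Sequential(nn.Linear(feature_dim, last_layer_dim_pi), nn.ReLU())
        # Value (critic) branch
        self.value_net = nn.Sequential(nn.Linear(feature_dim, last_layer_dim_vf), nn.ReLU())

    def forward(self, features: th.Tensor) -> Tuple[th.Tensor, th.Tensor]:
        # Return both latent codes; if all layers were shared, the two would be identical
        return self.policy_net(features), self.value_net(features)

    def forward_actor(self, features: th.Tensor) -> th.Tensor:
        # Actor branch only
        return self.policy_net(features)

    def forward_critic(self, features: th.Tensor) -> th.Tensor:
        # Critic branch only
        return self.value_net(features)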
docs/guide/migration.rst (2 additions, 1 deletion)
@@ -44,10 +44,11 @@ Breaking Changes
================


- SB3 requires python 3.6+ (instead of python 3.5+ for SB2)
- SB3 requires python 3.7+ (instead of python 3.5+ for SB2)
- Dropped MPI support
- Dropped layer normalized policies (``MlpLnLstmPolicy``, ``CnnLnLstmPolicy``)
- LSTM policies (```MlpLstmPolicy```, ```CnnLstmPolicy```) are not supported for the time being
  (see `PR #53 <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/53>`_ for a recurrent PPO implementation)
- Dropped parameter noise for DDPG and DQN
- PPO is now closer to the original implementation (no clipping of the value function by default), cf PPO section below
- Orthogonal initialization is only used by A2C/PPO
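The PR linked above was not merged at the time of this commit, so the exact API may differ, but a minimal sketch of how a recurrent PPO from sb3_contrib would presumably be used (the class name RecurrentPPO and the "MlpLstmPolicy" alias are assumptions based on that PR):

# Sketch only: RecurrentPPO / "MlpLstmPolicy" are assumed from contrib PR #53;
# check sb3_contrib for the actual names once the PR is released.
from sb3_contrib import RecurrentPPO

model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)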
docs/guide/rl_tips.rst (1 addition, 1 deletion)
@@ -147,7 +147,7 @@ Please use the hyperparameters in the `RL zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_
Continuous Actions - Multiprocessed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Take a look at ``PPO`` or ``A2C``. Again, don't forget to take the hyperparameters from the `RL zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_
Take a look at ``PPO``, ``TRPO`` (available in our :ref:`contrib repo <sb3_contrib>`) or ``A2C``. Again, don't forget to take the hyperparameters from the `RL zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_
for continuous actions problems (cf *Bullet* envs).

.. note::
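To illustrate the added recommendation, a minimal sketch of trying TRPO from sb3_contrib on a continuous-action task. Pendulum-v1 is only a stand-in here; for real experiments, take the tuned hyperparameters (and the Bullet envs) from the RL zoo instead of the defaults below.

from sb3_contrib import TRPO

# Default hyperparameters are for illustration only;
# use the tuned ones from the RL zoo for serious runs.
model = TRPO("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=100_000)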
docs/misc/changelog.rst (2 additions, 1 deletion)
@@ -55,6 +55,7 @@ Documentation:
- Added a note on ``load`` behavior in the examples (@Demetrio92)
- Updated SB3 Contrib doc
- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)
- Fixed custom policy documentation (@IperGiove)

Release 1.3.0 (2021-10-23)
---------------------------
@@ -859,4 +860,4 @@ And all the contributors:
@ShangqunYu @PierreExeter @JacopoPan @ltbd78 @tom-doerr @Atlis @liusida @09tangriro @amy12xx @juancroldan
@benblack769 @bstee615 @c-rizz @skandermoalla @MihaiAnca13 @davidblom603 @ayeright @cyprienc
@wkirgsn @AechPro @CUN-bjy @batu @IljaAvadiev @timokau @kachayev @cleversonahum
@eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92 @thomasgubler
@eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92 @thomasgubler @IperGiove
setup.py (1 addition, 1 deletion)
@@ -116,7 +116,7 @@
        # For render
        "opencv-python",
        # For atari games,
        "atari_py~=0.2.0",
        "atari_py==0.2.6",
        "pillow",
        # Tensorboard support
        "tensorboard>=2.2.0",
