Update custom_policy.rst (DLR-RM#711)
* Update custom_policy.rst

Added the methods forward_actor and forward_critic to the CustomNetwork class.

* Update doc

Co-authored-by: Antonin Raffin <[email protected]>
IperGiove and araffin authored Jan 3, 2022
1 parent c895c1d commit d9e198e
Showing 5 changed files with 12 additions and 4 deletions.
docs/guide/custom_policy.rst (6 additions, 0 deletions)
@@ -333,6 +333,12 @@ If your task requires even more granular control over the policy/value architecture
            If all layers are shared, then ``latent_policy == latent_value``
        """
        return self.policy_net(features), self.value_net(features)

    def forward_actor(self, features: th.Tensor) -> th.Tensor:
        return self.policy_net(features)

    def forward_critic(self, features: th.Tensor) -> th.Tensor:
        return self.value_net(features)


class CustomActorCriticPolicy(ActorCriticPolicy):
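For context, here is a condensed sketch of the guide's CustomNetwork example once these two methods are added. The __init__ body is abridged and the layer sizes are only illustrative; see docs/guide/custom_policy.rst for the full example. The point of the new methods is that the policy can query only the actor branch or only the critic branch when it does not need both outputs, instead of always going through forward.

from typing import Tuple

import torch as th
from torch import nn


class CustomNetwork(nn.Module):
    """
    Custom network for the policy and value function.
    It receives as input the features extracted by a features extractor.
    """

    def __init__(self, feature_dim: int, last_layer_dim_pi: int = 64, last_layer_dim_vf: int = 64):
        super().__init__()

        # IMPORTANT: save the output dimensions, they are used to create the action distribution
        self.latent_dim_pi = last_layer_dim_pi
        self.latent_dim_vf = last_layer_dim_vf

        # Policy (actor) branch
        self.policy_net = nn.Sequential(nn.Linear(feature_dim, last_layer_dim_pi), nn.ReLU())
        # Value (critic) branch
        self.value_net = nn.Sequential(nn.Linear(feature_dim, last_layer_dim_vf), nn.ReLU())

    def forward(self, features: th.Tensor) -> Tuple[th.Tensor, th.Tensor]:
        # Return both latent codes; if all layers were shared, the two would be identical
        return self.policy_net(features), self.value_net(features)

    def forward_actor(self, features: th.Tensor) -> th.Tensor:
        # Actor branch only
        return self.policy_net(features)

    def forward_critic(self, features: th.Tensor) -> th.Tensor:
        # Critic branch only
        return self.value_net(features)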
docs/guide/migration.rst (2 additions, 1 deletion)
@@ -44,10 +44,11 @@ Breaking Changes
================


- SB3 requires python 3.6+ (instead of python 3.5+ for SB2)
- SB3 requires python 3.7+ (instead of python 3.5+ for SB2)
- Dropped MPI support
- Dropped layer normalized policies (``MlpLnLstmPolicy``, ``CnnLnLstmPolicy``)
- LSTM policies (```MlpLstmPolicy```, ```CnnLstmPolicy```) are not supported for the time being
  (see `PR #53 <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/53>`_ for a recurrent PPO implementation)
- Dropped parameter noise for DDPG and DQN
- PPO is now closer to the original implementation (no clipping of the value function by default), cf PPO section below
- Orthogonal initialization is only used by A2C/PPO
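The PR linked above was not merged at the time of this commit, so the exact API may differ, but a minimal sketch of how a recurrent PPO from sb3_contrib would presumably be used (the class name RecurrentPPO and the "MlpLstmPolicy" alias are assumptions based on that PR):

# Sketch only: RecurrentPPO / "MlpLstmPolicy" are assumed from contrib PR #53;
# check sb3_contrib for the actual names once the PR is released.
from sb3_contrib import RecurrentPPO

model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)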
docs/guide/rl_tips.rst (1 addition, 1 deletion)
@@ -147,7 +147,7 @@ Please use the hyperparameters in the `RL zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_
Continuous Actions - Multiprocessed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Take a look at ``PPO`` or ``A2C``. Again, don't forget to take the hyperparameters from the `RL zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_
Take a look at ``PPO``, ``TRPO`` (available in our :ref:`contrib repo <sb3_contrib>`) or ``A2C``. Again, don't forget to take the hyperparameters from the `RL zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_
for continuous actions problems (cf *Bullet* envs).

.. note::
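To illustrate the added recommendation, a minimal sketch of trying TRPO from sb3_contrib on a continuous-action task. Pendulum-v1 is only a stand-in here; for real experiments, take the tuned hyperparameters (and the Bullet envs) from the RL zoo instead of the defaults below.

from sb3_contrib import TRPO

# Default hyperparameters are for illustration only;
# use the tuned ones from the RL zoo for serious runs.
model = TRPO("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=100_000)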
docs/misc/changelog.rst (2 additions, 1 deletion)
@@ -55,6 +55,7 @@ Documentation:
- Added a note on ``load`` behavior in the examples (@Demetrio92)
- Updated SB3 Contrib doc
- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)
- Fixed custom policy documentation (@IperGiove)

Release 1.3.0 (2021-10-23)
---------------------------
@@ -859,4 +860,4 @@ And all the contributors:
@ShangqunYu @PierreExeter @JacopoPan @ltbd78 @tom-doerr @Atlis @liusida @09tangriro @amy12xx @juancroldan
@benblack769 @bstee615 @c-rizz @skandermoalla @MihaiAnca13 @davidblom603 @ayeright @cyprienc
@wkirgsn @AechPro @CUN-bjy @batu @IljaAvadiev @timokau @kachayev @cleversonahum
@eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92 @thomasgubler
@eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92 @thomasgubler @IperGiove
setup.py (1 addition, 1 deletion)
@@ -116,7 +116,7 @@
        # For render
        "opencv-python",
        # For atari games,
        "atari_py~=0.2.0",
        "atari_py==0.2.6",
        "pillow",
        # Tensorboard support
        "tensorboard>=2.2.0",
