Add todos

samuelfneumann · Feb 8, 2022 · 06e9a92
1 parent afa369d
commit 06e9a92
Showing 1 changed file with 4 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -726,3 +726,7 @@ sequential runs of hyperparameter setting `m` of the `Agent` in the
 	* pre/post-timestep hook which takes in a timestep
 	* Then, the agent and environment interfaces should be extended such that they have an `Info` method, which will return info tracked by the agent or environment. The hooks can then use these.
 	* Add hook wrappers that will call the hook only every N episodes or N steps
+* [ ] The discount factor should be returned by the environment on each timestep,
+  and adapted by the task such that timeouts are not treated as episode termination.
+  That is, if an episode ends because the goal was reached, a discount factor of 0
+  is returned. Otherwise, including when the episode is cut off by a timeout, a
+  discount factor of γ is returned.
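
The rule in this todo can be sketched as follows. This is only an illustration under stated assumptions, not the repository's API: `TimeStep`, `make_timestep`, `td_target`, `max_steps`, and the default `gamma=0.99` are hypothetical names and values, and Python is used purely for brevity. The idea is that the discount is 0 only when the goal is reached, so a learner can still bootstrap from the next state when an episode is cut off by a timeout.

```python
from dataclasses import dataclass


@dataclass
class TimeStep:
    """Hypothetical container for what the environment returns each step."""
    reward: float
    discount: float  # 0 on true termination, gamma otherwise
    last: bool       # True when the episode stops for any reason


def make_timestep(reward, goal_reached, step, max_steps, gamma=0.99):
    """Build a timestep whose discount separates goal termination from timeouts."""
    timed_out = step >= max_steps
    # Only reaching the goal zeroes the discount; a timeout keeps gamma.
    discount = 0.0 if goal_reached else gamma
    return TimeStep(reward=reward, discount=discount, last=goal_reached or timed_out)


def td_target(ts, next_value):
    """One-step TD target; the returned discount controls bootstrapping."""
    return ts.reward + ts.discount * next_value
```

With this arrangement, a timestep produced by a timeout has `last` set but still carries a discount of γ, so `td_target` keeps the bootstrapped value of the next state, whereas a goal timestep drops it.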