This repository has been archived by the owner on Nov 10, 2023. It is now read-only.

Commit

Add todos
samuelfneumann committed Feb 8, 2022
1 parent afa369d commit 06e9a92
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -726,3 +726,7 @@ sequential runs of hyperparameter setting `m` of the `Agent` in the
* pre/post-timestep hook which takes in a timestep
* Then, the agent and environment interfaces should be extended such that they have an `Info` method, which will return info tracked by the agent or environment. The hooks can then use these.
* Add hook wrappers that will call the hook only every N episodes or N steps
* [ ] The discount factor should be returned by the environment on each timestep
and adapted by the task so that timeouts do not count as episode termination.
That is, if an episode ends because the goal is reached, a discount factor of 0
is returned. Otherwise, a discount factor of γ is returned.
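
The discount todo above could look roughly like the following in Go. This is only a sketch of the idea, not the repository's API: `StepKind`, `Discount`, and `TDTarget` are hypothetical names introduced here for illustration, and the bootstrapped target is the standard one-step form `r + discount * V(s')`.

```go
package main

// StepKind is a hypothetical label for how a timestep ended.
// It is not a type from this repository.
type StepKind int

const (
	Mid     StepKind = iota // ordinary transition inside an episode
	Goal                    // episode ended because the goal was reached
	Timeout                 // episode was cut off by a step limit
)

// Discount returns the per-timestep discount factor described in the todo:
// 0 when the goal is reached, gamma otherwise. A timeout is deliberately
// treated like an ordinary step so that it is not seen as true termination.
func Discount(kind StepKind, gamma float64) float64 {
	if kind == Goal {
		return 0.0
	}
	return gamma
}

// TDTarget computes a one-step bootstrapped target, r + discount * V(s').
// With a discount of 0 the value of the next state is ignored.
func TDTarget(reward, discount, nextValue float64) float64 {
	return reward + discount*nextValue
}
```

Under these assumptions, an agent that bootstraps with `TDTarget(r, Discount(kind, gamma), v)` keeps bootstrapping through a timeout but stops at the goal, without needing a separate terminal flag.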
