## Materials (based on [`practical_rl` course](https://github.com/yandexdataschool/Practical_RL))

* [Slides](https://disk.yandex.ru/i/64ao19xI77rsNw)
* Video lecture by D. Silver - https://www.youtube.com/watch?v=KHZVXao4qXs
* Our [lecture](https://yadi.sk/i/I3M09HKQ3GKBiP), [seminar](https://yadi.sk/i/8f9NX_E73GKBkT)
* Alternative lecture by J. Schulman, part 1 - https://www.youtube.com/watch?v=BB-BhTn6DCM
* Alternative lecture by J. Schulman, part 2 - https://www.youtube.com/watch?v=Wnl-Qh2UHGg

## Practice

__Part 1__ - intro to the gym(nasium) interface - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_DL/blob/fall23/week10_rl/intro.ipynb)
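The interaction loop you will write in Part 1 follows the gymnasium `reset`/`step` contract. Here is a minimal sketch of that contract using a toy stand-in environment (the `ToyEnv` class and its dynamics are invented for illustration, so the loop runs without installing `gymnasium`; the notebook uses real environments such as CartPole):

```python
# A toy environment mimicking the gymnasium API: reset() -> (obs, info),
# step(action) -> (obs, reward, terminated, truncated, info).
# ToyEnv is illustrative only - it is not part of the course notebooks.
import random

class ToyEnv:
    """Counts steps; the episode terminates after 5 steps, reward 1 per step."""
    def reset(self, seed=None):
        if seed is not None:
            random.seed(seed)
        self.t = 0
        return self.t, {}                       # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.t >= 5                # natural end of the episode
        truncated = False                       # no time-limit cutoff here
        return self.t, 1.0, terminated, truncated, {}

env = ToyEnv()
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(action=0)
    total_reward += reward
    done = terminated or truncated              # episode ends on either flag
print(total_reward)  # 5.0
```

Note that the modern API distinguishes `terminated` (the MDP reached a terminal state) from `truncated` (an external cutoff such as a time limit); treating them identically can bias value targets.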
__Part 2__ - implement REINFORCE with a neural network agent - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_DL/blob/fall23/week10_rl/reinforce_pytorch.ipynb)
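At the heart of REINFORCE is the discounted return-to-go, G_t = r_t + γ·G_{t+1}, which weights each log-probability in the surrogate loss −Σ_t log π(a_t|s_t)·G_t. A pure-Python sketch of that computation (the notebook computes this over sampled trajectories and plugs it into a PyTorch loss):

```python
# Discounted returns-to-go, computed backwards in one pass:
# G_t = r_t + gamma * G_{t+1}, with G after the last step equal to 0.
def returns_to_go(rewards, gamma=0.99):
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]            # restore chronological order

print(returns_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```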
__Optionally,__ if you want to go full hardcore, you may choose to implement the actor-critic algorithm in [`a2c-optional.ipynb`](./a2c-optional.ipynb).
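The actor-critic variant replaces the raw return with an advantage estimate; the simplest form is the one-step advantage A_t = r_t + γ·V(s_{t+1})·(1 − done_t) − V(s_t). A sketch with placeholder values (in the notebook, V comes from the critic network; the helper name below is ours, not the notebook's):

```python
# One-step advantages for an actor-critic update. `dones` masks out the
# bootstrap term at terminal states. All inputs here are plain lists of
# floats standing in for critic outputs.
def one_step_advantages(rewards, values, next_values, dones, gamma=0.99):
    return [r + gamma * nv * (1.0 - d) - v
            for r, v, nv, d in zip(rewards, values, next_values, dones)]

# Terminal step: no bootstrapping, advantage = r - V(s).
print(one_step_advantages([1.0], [0.5], [0.0], [1.0], gamma=0.9))  # [0.5]
```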
## More materials
* A full-term course on reinforcement learning - [practical_rl](https://github.com/yandexdataschool/practical_rl)

* Actually proving the policy gradient theorem for discounted rewards - [article](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
* On the variance of policy gradient estimates and optimal baselines: [article](https://papers.nips.cc/paper/4264-analysis-and-improvement-of-policy-gradient-estimation.pdf), another [article](https://arxiv.org/pdf/1301.2315.pdf)
* Generalized Advantage Estimation - a way to speed up training in the homework notebooks - [article](https://arxiv.org/abs/1506.02438)
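Generalized Advantage Estimation, referenced above, blends one-step advantages with Monte-Carlo returns through an exponentially weighted sum of TD errors: A_t = Σ_k (γλ)^k δ_{t+k}, where δ_t = r_t + γ·V(s_{t+1}) − V(s_t). A compact reference implementation (values below are placeholders; λ=0 recovers one-step TD advantages and λ=1 recovers full Monte-Carlo advantages):

```python
# GAE computed backwards in a single pass over one trajectory.
# `values` must have len(rewards) + 1 entries: V(s_0), ..., V(s_T),
# where the final entry is the bootstrap value (0 for a terminal state).
def gae(rewards, values, gamma=0.99, lam=0.95):
    advantages, a = [], 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        a = delta + gamma * lam * a     # recursive form of the (gamma*lam)^k sum
        advantages.append(a)
    return advantages[::-1]

print(gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=0.0))  # [1.0, 1.0]
print(gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))  # [2.0, 1.0]
```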
* Generalizing the log-derivative trick - [url](http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/)
* Combining policy gradient and Q-learning - [arxiv](https://arxiv.org/abs/1611.01626)
* A Bayesian perspective on why the reparameterization and log-derivative tricks matter (Vetrov's take) - [pdf](https://www.sdsj.ru/slides/Vetrov.pdf)
* An adversarial review of policy gradient - [blog](http://www.argmin.net/2018/02/20/reinforce/)
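The log-derivative (score-function) trick linked above is the identity behind all of the policy gradient estimators in this week's materials: ∇_θ E_x[f(x)] = E_x[f(x)·∇_θ log p(x; θ)]. A Monte-Carlo sanity check for a Bernoulli(θ) with f(x) = x, where the true gradient of E[f] = θ is exactly 1 (the function name and setup are ours, for illustration only):

```python
# Score-function gradient estimate for a Bernoulli(theta) distribution.
# For x in {0, 1}: d/dtheta log p(x; theta) = x/theta - (1 - x)/(1 - theta).
import random

def score_function_grad(theta, f, n=100_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        score = x / theta - (1 - x) / (1 - theta)
        total += f(x) * score           # f(x) * grad of log-probability
    return total / n

est = score_function_grad(0.3, f=lambda x: x)
print(round(est, 2))  # close to 1.0, the exact gradient of E[x] = theta
```

The estimator is unbiased but high-variance, which is exactly why the baseline and GAE material above matters in practice.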