typo

yandexdataschool · Dec 13, 2024 · c99fa11
1 parent 4d01824

Showing 8 changed files with 2,329 additions and 0 deletions.
diff --git a/week11_rl/README.md b/week11_rl/README.md
@@ -0,0 +1,28 @@
+## Materials (based on [`practical_rl` course](https://github.com/yandexdataschool/Practical_RL))
+
+* [Slides](https://disk.yandex.ru/i/64ao19xI77rsNw)
+* Video lecture by D. Silver - https://www.youtube.com/watch?v=KHZVXao4qXs
+* Our [lecture](https://yadi.sk/i/I3M09HKQ3GKBiP), [seminar](https://yadi.sk/i/8f9NX_E73GKBkT)
+* Alternative lecture by J. Schulman part 1 - https://www.youtube.com/watch?v=BB-BhTn6DCM
+* Alternative lecture by J. Schulman part 2 - https://www.youtube.com/watch?v=Wnl-Qh2UHGg
+
+## Practice
+
+__Part 1__ - intro to the gym(nasium) interface - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_DL/blob/fall23/week10_rl/intro.ipynb)
+
+__Part 2__ - implement REINFORCE with a neural network agent - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_DL/blob/fall23/week10_rl/reinforce_pytorch.ipynb)
+
+__Optionally,__ if you want to go full hardcore, you may choose to implement the actor-critic algorithm in [`a2c-optional.ipynb`](./a2c-optional.ipynb).
+
+## More materials
+* A full-term course on reinforcement learning - [practical_rl](https://github.com/yandexdataschool/practical_rl)
+
+* Actually proving the policy gradient theorem for discounted rewards - [article](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
+* On the variance of policy gradient estimates and optimal baselines: [article](https://papers.nips.cc/paper/4264-analysis-and-improvement-of-policy-gradient-estimation.pdf), another [article](https://arxiv.org/pdf/1301.2315.pdf)
+* Generalized Advantage Estimation - a way to speed up training in homework_*.ipynb - [article](https://arxiv.org/abs/1506.02438)
+
+* Generalizing the log-derivative trick - [url](http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/)
+* Combining policy gradient and q-learning - [arxiv](https://arxiv.org/abs/1611.01626)
+* Bayesian perspective on why reparameterization & log-derivative tricks matter (Vetrov's take) - [pdf](https://www.sdsj.ru/slides/Vetrov.pdf)
+* Adversarial review of policy gradient - [blog](http://www.argmin.net/2018/02/20/reinforce/)
+