Commit

typo
justheuristic committed Dec 13, 2024
1 parent 4d01824 commit c99fa11
Showing 8 changed files with 2,329 additions and 0 deletions.
28 changes: 28 additions & 0 deletions week11_rl/README.md
@@ -0,0 +1,28 @@
## Materials (based on [`practical_rl` course](https://github.com/yandexdataschool/Practical_RL))

* [Slides](https://disk.yandex.ru/i/64ao19xI77rsNw)
* Video lecture by D. Silver - https://www.youtube.com/watch?v=KHZVXao4qXs
* Our [lecture](https://yadi.sk/i/I3M09HKQ3GKBiP), [seminar](https://yadi.sk/i/8f9NX_E73GKBkT)
* Alternative lecture by J. Schulman part 1 - https://www.youtube.com/watch?v=BB-BhTn6DCM
* Alternative lecture by J. Schulman part 2 - https://www.youtube.com/watch?v=Wnl-Qh2UHGg

## Practice

__Part 1__ - intro to gym(nasium) interface - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_DL/blob/fall23/week10_rl/intro.ipynb)
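
For orientation before opening the notebook, here is a minimal sketch of the interaction loop that the gym(nasium) interface is built around. `CoinFlipEnv` and `run_episode` are hypothetical names invented for this illustration; the toy class only mimics the Gymnasium `reset`/`step` return signatures in plain Python, so it is a stand-in and not a real Gymnasium environment.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that mimics the Gymnasium reset/step signatures."""

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        # Gymnasium-style reset returns (observation, info).
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin, {}

    def step(self, action):
        # Gymnasium-style step returns (obs, reward, terminated, truncated, info).
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        self.coin = self.rng.randint(0, 1)
        terminated = False                    # this toy task has no terminal state
        truncated = self.t >= self.max_steps  # episode is cut off by the time limit
        return self.coin, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Roll out one episode with `policy` and return the total reward."""
    obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward
```

With a real environment, the same loop should work once `CoinFlipEnv(...)` is swapped for an environment object created by the library; the notebook covers the details.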

__Part 2__ - implement REINFORCE with a neural network agent - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_DL/blob/fall23/week10_rl/reinforce_pytorch.ipynb)
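
As a rough guide to what the REINFORCE notebook asks for, the sketch below shows the two numeric pieces of the algorithm in plain Python: discounted cumulative returns and the surrogate loss whose gradient is the policy gradient estimate. The function names and the `gamma` default are assumptions made for this illustration, not taken from the notebook. In the actual assignment the log-probabilities would be PyTorch tensors produced by the network, so that autograd can differentiate the loss.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, returns):
    """Surrogate loss: negative mean of log pi(a_t | s_t) * G_t over one episode."""
    weighted = [lp * g for lp, g in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)


# Usage: a three-step episode where every taken action had probability 0.5.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
loss = reinforce_loss([math.log(0.5)] * 3, returns)
```

Minimizing this loss by gradient descent raises the log-probability of actions that were followed by high returns, which is the core idea behind REINFORCE.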

__Optionally,__ if you want to go full hardcore, you may choose to implement the actor-critic algorithm in [`a2c-optional.ipynb`](./a2c-optional.ipynb).

## More materials
* A full-term course on reinforcement learning - [practical_rl](https://github.com/yandexdataschool/practical_rl)

* Actually proving the policy gradient for discounted rewards - [article](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
* On variance of policy gradient and optimal baselines: [article](https://papers.nips.cc/paper/4264-analysis-and-improvement-of-policy-gradient-estimation.pdf), another [article](https://arxiv.org/pdf/1301.2315.pdf)
* Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - [article](https://arxiv.org/abs/1506.02438)

* Generalizing the log-derivative trick - [url](http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/)
* Combining policy gradient and q-learning - [arxiv](https://arxiv.org/abs/1611.01626)
* Bayesian perspective on why the reparameterization and log-derivative tricks matter (Vetrov's take) - [pdf](https://www.sdsj.ru/slides/Vetrov.pdf)
* Adversarial review of policy gradient - [blog](http://www.argmin.net/2018/02/20/reinforce/)
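
To make the Generalized Advantage Estimation item above more concrete, here is a minimal sketch of the recursion from that article, written in plain Python. It assumes a single episode segment with no terminations inside it; the function name, the `last_value` bootstrap argument, and the default `gamma` and `lam` values are choices made for this illustration.

```python
def gae_advantages(rewards, values, last_value=0.0, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory segment.

    rewards[t] and values[t] = V(s_t) have equal length; last_value is V of the
    state after the final step (0.0 if the episode ended there).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

Setting `lam=0` reduces the estimate to the one-step TD residual, while `lam=1` gives the discounted return minus the value baseline; values in between trade bias against variance.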
