NoneJou072/rl-notebook


A concise reinforcement learning algorithm library

For each reinforcement learning algorithm, this repository covers:

  • Algorithm principles ✔️
  • PyTorch implementation ✔️

Single-agent RL algorithm                               Policy       Based
👉 Sarsa                                                on-policy    value-based
👉 Q-Learning                                           off-policy   value-based
👉 DQN                                                  off-policy   value-based
   Rainbow-DQN                                          off-policy   value-based
👉 REINFORCE                                            on-policy    policy-based
👉 actor-critic                                         on-policy    policy-based
👉 A2C                                                  on-policy    Actor-Critic
👉 DDPG                                                 off-policy   Actor-Critic
👉 HER-DDPG                                             off-policy   Actor-Critic
👉 TD3                                                  off-policy   Actor-Critic
   TRPO                                                 on-policy    Actor-Critic
👉 PPO-Continuous                                       on-policy    Actor-Critic
👉 SAC                                                  off-policy   Actor-Critic
👉 Relay HER (RHER)                                     off-policy   Actor-Critic
👉 Behavior Cloning (BC)                                off-policy   Imitation Learning
👉 Generative Adversarial Imitation Learning (GAIL)     on-policy    Imitation Learning
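
The on-policy / off-policy column can be illustrated with the first two entries in the list. The sketch below is not taken from the repository: the function names, the dict-based Q-table, and the values of `ALPHA` and `GAMMA` are illustrative assumptions. Sarsa bootstraps from the action actually chosen in the next state, whereas Q-Learning bootstraps from the greedy action there, whatever the behaviour policy does.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value for illustration)
GAMMA = 0.9  # discount factor (assumed value for illustration)


def make_q_table(n_actions):
    """Q-table mapping state -> list of action values, initialised to 0."""
    return defaultdict(lambda: [0.0] * n_actions)


def sarsa_update(q, s, a, r, s_next, a_next, done):
    """On-policy: the target uses a_next, the action actually chosen in s_next."""
    target = r if done else r + GAMMA * q[s_next][a_next]
    q[s][a] += ALPHA * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, done):
    """Off-policy: the target uses the greedy (max-value) action in s_next."""
    target = r if done else r + GAMMA * max(q[s_next])
    q[s][a] += ALPHA * (target - q[s][a])


if __name__ == "__main__":
    q_sarsa, q_ql = make_q_table(2), make_q_table(2)
    for q in (q_sarsa, q_ql):
        q["s1"] = [1.0, 3.0]  # pretend these values were learned earlier
    # Same transition; the behaviour policy picks the non-greedy action 0 next.
    sarsa_update(q_sarsa, "s0", 0, 1.0, "s1", 0, done=False)
    q_learning_update(q_ql, "s0", 0, 1.0, "s1", done=False)
    print("Sarsa:", q_sarsa["s0"][0], "Q-Learning:", q_ql["s0"][0])
```

Given the same transition, the two updates differ only when the next action is not the greedy one, which is exactly the case constructed above.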

Runtime environment:

python (in PyCharm) - 3.10
gymnasium-0.28.1
numpy-1.24.3
torch-2.1.0
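
To see how a local installation compares with the versions listed above, a small check along the following lines may help. It is only a sketch: the helper names are hypothetical, and it assumes the PyPI distribution names `gymnasium`, `numpy`, and `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README, keyed by (assumed) PyPI distribution name.
LISTED = {"gymnasium": "0.28.1", "numpy": "1.24.3", "torch": "2.1.0"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare_with_listed(listed=LISTED):
    """Map each package name to a (listed, installed) pair."""
    return {name: (want, installed_version(name)) for name, want in listed.items()}


if __name__ == "__main__":
    for name, (want, have) in compare_with_listed().items():
        print(f"{name}: listed {want}, installed {have or 'not installed'}")
```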

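PyCharm is recommended; launching from VS Code or a terminal can cause path problems. Updating gymnasium and pytorch to their latest versions is also recommended. For the principles behind each algorithm, see the markdown file in that algorithm's folder; for the implementation, see the .py script named after the algorithm. Run the train.py script to start training.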
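
The README does not say what the path problems are. One common cause, assumed here, is that PyCharm adds the project root to `sys.path`, while a plain `python train.py` only adds the script's own folder, so imports from the project root fail. Under that assumption, a possible workaround is sketched below. The helper name `add_project_root` and the "one level above the algorithm folder" layout are hypothetical and not taken from the repository.

```python
import sys
from pathlib import Path


def add_project_root(script_file, levels_up=1):
    """Prepend the directory `levels_up` above the script's folder to sys.path.

    Returns the directory that was added, as a string.
    """
    root = str(Path(script_file).resolve().parents[levels_up])
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


if __name__ == "__main__":
    # In a training script, this call would sit above the project-level imports.
    print(add_project_root(__file__))
```

If the problem is instead related to relative file paths (for example, where the log folder is written), changing into the algorithm folder before launching the script is another option worth trying.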

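Some algorithms include a tensorboard module. Once training starts, a log folder is created inside the corresponding algorithm folder, and the following terminal command opens a web page for viewing the training logs: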

tensorboard --logdir .
