For each reinforcement learning algorithm, this repository provides:
- Algorithm theory ✔️
- PyTorch implementation ✔️
Single-agent reinforcement learning algorithm | Policy | Category |
---|---|---|
👉 Sarsa | on-policy | value-based |
👉 Q-Learning | off-policy | value-based |
👉 DQN | off-policy | value-based |
❌ Rainbow-DQN | off-policy | value-based |
👉 REINFORCE | on-policy | policy-based |
👉 actor-critic | on-policy | policy-based |
👉 A2C | on-policy | Actor-Critic |
👉 DDPG | off-policy | Actor-Critic |
👉 HER-DDPG | off-policy | Actor-Critic |
👉 TD3 | off-policy | Actor-Critic |
❌ TRPO | on-policy | Actor-Critic |
👉 PPO-Continuous | on-policy | Actor-Critic |
👉 SAC | off-policy | Actor-Critic |
👉 Relay HER (RHER) | off-policy | Actor-Critic |
👉 Behavior Cloning (BC) | off-policy | Imitation Learning |
👉 Generative Adversarial Imitation Learning (GAIL) | on-policy | Imitation Learning |
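
The on-policy / off-policy column in the table above can be illustrated with the two simplest entries, Sarsa and Q-Learning. The following is a minimal sketch in plain Python, not the code from this repository: the function names, the toy Q-table, and the values of `gamma` and `alpha` are all assumptions made for illustration. Sarsa builds its target from the action the agent actually takes next, while Q-Learning builds it from the greedy action.

```python
# Hypothetical toy setup: 2 states x 2 actions, tabular Q-values stored as nested lists.

def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    """On-policy target: bootstraps from the action actually chosen in the next state."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.9):
    """Off-policy target: bootstraps from the greedy action in the next state."""
    return reward + gamma * max(q[next_state])


def td_update(q, state, action, target, alpha=0.5):
    """Move Q(state, action) a step of size alpha toward the target; return the new value."""
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


q = [[0.0, 0.0], [1.0, 2.0]]

# Same transition (reward 1.0, landing in state 1), two different targets.
s_target = sarsa_target(q, reward=1.0, next_state=1, next_action=0)  # uses Q[1][0]
q_target = q_learning_target(q, reward=1.0, next_state=1)            # uses max(Q[1])

print("Sarsa target:", s_target)
print("Q-Learning target:", q_target)
```

The two targets differ whenever the next action is not the greedy one, which is why Sarsa evaluates the policy it is following and Q-Learning evaluates the greedy policy.
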
Runtime environment:
- Python 3.10 (run in PyCharm)
- gymnasium 0.28.1
- numpy 1.24.3
- torch 2.1.0
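
To compare a local installation against the versions listed above, a small check along these lines may help. It is a sketch that uses only the standard library; the `TESTED_VERSIONS` dictionary and the helper names are hypothetical and not part of this repository.

```python
from importlib import metadata

# Versions listed in this README; treated here as the tested baseline,
# not as hard requirements (hypothetical helper, not part of the repository).
TESTED_VERSIONS = {"gymnasium": "0.28.1", "numpy": "1.24.3", "torch": "2.1.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_environment(tested=TESTED_VERSIONS):
    """Map each package name to a (tested_version, installed_version) pair."""
    return {name: (version, installed_version(name)) for name, version in tested.items()}


if __name__ == "__main__":
    for name, (tested, installed) in compare_environment().items():
        found = installed or "not installed"
        print(f"{name}: tested with {tested}, found {found}")
```

A mismatch is not necessarily a problem, since newer versions of gymnasium and PyTorch are expected to work as well.
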
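PyCharm is recommended: launching the scripts from VS Code or a terminal can cause path problems. Updating gymnasium and PyTorch to their latest versions is also recommended. For the theory behind each algorithm, see the markdown file inside that algorithm's folder; for the implementation, see the .py script named after the algorithm; to train, run the train.py script.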
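
If the scripts must be launched from VS Code or a terminal, one possible workaround is sketched below. It assumes the path problems come from relative paths and imports being resolved against the current working directory rather than the script's folder, which this README does not confirm. The helper `anchor_to_script_dir` is hypothetical and not part of the repository.

```python
import os
import sys
from pathlib import Path


def anchor_to_script_dir(script_file):
    """Make the script's folder both the working directory and an import root.

    Assumption: files and modules are looked up relative to the script's folder.
    If a script imports from the repository root instead, that root would need
    to be added to sys.path rather than the script's folder.
    """
    script_dir = Path(script_file).resolve().parent
    os.chdir(script_dir)  # relative file paths now resolve next to the script
    if str(script_dir) not in sys.path:
        sys.path.insert(0, str(script_dir))  # sibling modules become importable
    return script_dir


# Possible usage at the very top of a script such as train.py:
# anchor_to_script_dir(__file__)
```

Alternatively, changing into the algorithm's folder before running `python train.py` has the same effect on relative paths without editing any script.
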
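Some algorithms include a tensorboard module. Once training starts, a log folder is created inside the corresponding algorithm folder, and the following terminal command opens a web page for viewing the training logs: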
tensorboard --logdir .