A demonstration of the RLHF training method
Runtime environment:
torch==1.13.1+cu117
transformers==4.38.2
datasets==2.18.0
accelerate==0.28.0
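
To confirm that the local environment matches the versions pinned above, a minimal sketch along the following lines may help. It is not part of the original project. The helper name `find_mismatches` and the `PINNED` dictionary are hypothetical names introduced here for illustration, and the sketch assumes that the installed package metadata reports the same version strings as listed above (including the `+cu117` suffix for torch).

```python
from importlib import metadata

# Pinned versions copied from the environment list above.
PINNED = {
    "torch": "1.13.1+cu117",
    "transformers": "4.38.2",
    "datasets": "2.18.0",
    "accelerate": "0.28.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    `get_version` is injectable so the helper can be exercised without
    the real packages being present (hypothetical design choice).
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Running the script prints nothing when every pinned package is installed at the listed version.
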
Video course: https://www.bilibili.com/video/BV13r42177Hk
A lighter, faster, and simpler implementation: https://github.com/lansinuote/Simple_RLHF_tiny
Building the Llama3 model by hand: https://github.com/lansinuote/Simple_RLHF_Llama3