forked from hiyouga/EasyR1

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL


mertunsall/EasyR1

This branch is 1 commit ahead of, 41 commits behind hiyouga/EasyR1:main.

Latest commit: 94e6f3c · Mar 11, 2025 · 28 commits

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

This project is a clean fork of the original veRL project that adds support for vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of HybridEngine and the latest release of vLLM's SPMD mode.

Features

  • Supported models

    • Llama3/Qwen2/Qwen2.5 language models
    • Qwen2/Qwen2.5-VL vision language models
    • DeepSeek-R1 distill models
  • Supported algorithms

    • GRPO
    • ReMax
    • Other RL algorithms (coming soon)
  • Supported datasets

  • Supported tricks

    • Padding-free training
    • Resuming from checkpoint
    • Wandb & SwanLab tracking
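
Padding-free training avoids spending compute on pad tokens. A common way to do this, and the input layout FlashAttention's variable-length kernels (`flash_attn_varlen_func`) expect, is to concatenate every sequence in a batch into one flat buffer and pass cumulative sequence lengths (`cu_seqlens`) so attention never crosses sequence boundaries. The sketch below shows only that bookkeeping in plain Python, assuming this packing scheme; `pack_sequences` and `unpack` are hypothetical helper names, not EasyR1's implementation.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pack_sequences(batch: Sequence[Sequence[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Concatenate variable-length token sequences into one flat buffer with no pad tokens.

    Returns:
      input_ids:    all tokens back to back
      position_ids: positions that restart at 0 for every sequence
      cu_seqlens:   cumulative lengths; sequence i occupies [cu_seqlens[i], cu_seqlens[i + 1])
    """
    input_ids: List[int] = []
    position_ids: List[int] = []
    cu_seqlens: List[int] = [0]
    for seq in batch:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return input_ids, position_ids, cu_seqlens


def unpack(flat: Sequence[T], cu_seqlens: Sequence[int]) -> List[List[T]]:
    """Split a flat per-token buffer (e.g. per-token log-probs) back into per-sequence lists."""
    return [list(flat[start:end]) for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


batch = [[11, 12, 13], [21], [31, 32]]
input_ids, position_ids, cu_seqlens = pack_sequences(batch)
print(input_ids)     # [11, 12, 13, 21, 31, 32]
print(position_ids)  # [0, 1, 2, 0, 0, 1]
print(cu_seqlens)    # [0, 3, 4, 6]
```

Padding this batch to its longest sequence would take 3 × 3 = 9 token slots; the packed buffer holds only the 6 real tokens.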

Requirements

Software Requirements

  • Python 3.9+
  • transformers>=4.49.0
  • flash-attn>=2.4.3
  • vllm>=0.7.3

We provide a Dockerfile to easily build environments.

We recommend using the pre-built Docker image for EasyR1.

docker pull hiyouga/verl:ngc-th2.5.1-cu120-vllm0.7.4-hotfix

Hardware Requirements (estimated)

Method                  Bits   1.5B       3B         7B
GRPO Full Fine-Tuning   AMP    2 × 24GB   4 × 40GB   8 × 40GB

Each entry is the number of GPUs × VRAM per GPU.

Note

At least 2 GPUs are needed to run EasyR1.

We are working hard to reduce VRAM usage in RL training; LoRA support will be integrated in upcoming updates.
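
As a worked reading of the hardware table, each entry is a GPU count times per-GPU memory, so the 7B estimate amounts to 8 × 40GB = 320GB of aggregate VRAM. The hypothetical helper below (not part of EasyR1) encodes only those estimates and the two-GPU minimum as a rough pre-flight check; it does not model batch size, response length, or other settings that change real memory use.

```python
from typing import Dict, Tuple

# Estimates from the Hardware Requirements table (GRPO full fine-tuning, AMP):
# model size -> (number of GPUs, VRAM per GPU in GB).
ESTIMATED_REQUIREMENTS: Dict[str, Tuple[int, int]] = {
    "1.5B": (2, 24),
    "3B": (4, 40),
    "7B": (8, 40),
}
MIN_GPUS = 2  # EasyR1 needs at least 2 GPUs


def total_vram_gb(model_size: str) -> int:
    """Aggregate VRAM implied by an estimate, e.g. 8 x 40GB = 320GB for 7B."""
    num_gpus, vram_per_gpu = ESTIMATED_REQUIREMENTS[model_size]
    return num_gpus * vram_per_gpu


def meets_estimate(model_size: str, num_gpus: int, vram_per_gpu_gb: float) -> bool:
    """True if the machine has at least the estimated GPU count and VRAM per GPU."""
    if num_gpus < MIN_GPUS:
        return False
    required_gpus, required_vram = ESTIMATED_REQUIREMENTS[model_size]
    return num_gpus >= required_gpus and vram_per_gpu_gb >= required_vram


for size in ESTIMATED_REQUIREMENTS:
    print(size, total_vram_gb(size), "GB total")
```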

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps


Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .

GRPO Training

bash examples/run_qwen2_5_vl_7b_geo.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint

Tip

If you have trouble connecting to Hugging Face, consider running export HF_ENDPOINT=https://hf-mirror.com.

If you want to use the SwanLab logger, consider using bash examples/run_qwen2_5_vl_7b_geo_swanlab.sh.

Custom Dataset

Please refer to the example datasets to prepare your own dataset.

Tip

EasyR1 already supports multi-image datasets.
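
A multi-image sample pairs one text prompt with a list of images and marks where each image belongs. The sketch below assumes a record layout with `problem`, `images`, and `answer` fields and one `<image>` placeholder per image; these names are assumptions for illustration, so check the example datasets and your training config for the actual keys. `to_chat_content` is a hypothetical helper that verifies the placeholder count matches the image count and interleaves text and image slots in chat-message form.

```python
from typing import Any, Dict, List

IMAGE_PLACEHOLDER = "<image>"  # assumed placeholder marking where each image goes


def to_chat_content(
    record: Dict[str, Any], prompt_key: str = "problem", image_key: str = "images"
) -> List[Dict[str, str]]:
    """Validate a (hypothetical) multi-image record and interleave text and image slots.

    Raises ValueError if the number of placeholders differs from the number of images.
    """
    prompt: str = record[prompt_key]
    images: List[Any] = record.get(image_key, [])
    num_slots = prompt.count(IMAGE_PLACEHOLDER)
    if num_slots != len(images):
        raise ValueError(
            f"prompt has {num_slots} {IMAGE_PLACEHOLDER} placeholders but {len(images)} images"
        )
    content: List[Dict[str, str]] = []
    for i, text in enumerate(prompt.split(IMAGE_PLACEHOLDER)):
        if i > 0:
            content.append({"type": "image"})  # one slot per placeholder, in order
        if text:
            content.append({"type": "text", "text": text})
    return content


sample = {"problem": "<image><image>Which angle is larger?", "images": ["a.png", "b.png"], "answer": "B"}
print(to_chat_content(sample))
```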

How to Understand GRPO in EasyR1


  • To learn about the GRPO algorithm, you can refer to Hugging Face's blog.
  • Unlike TRL's GRPO trainer, our trainer supports mini-batch updates as described in the original PPO paper.
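
The two points above can be sketched in plain Python. GRPO drops the learned value baseline in favor of a group-relative one: each prompt is sampled several times, and each response's reward is normalized by the mean and standard deviation of its own group. The mini-batch update follows the PPO recipe: one rollout batch is shuffled and split into mini-batches, and each mini-batch contributes a gradient step through the clipped surrogate objective. Function names and defaults (`clip_eps=0.2`, `eps=1e-6`) are illustrative assumptions, not EasyR1's API; real trainers work on tensors of per-token values and may differ in details such as which standard deviation estimate they use.

```python
import math
import random
from typing import Iterator, List, Sequence


def grpo_advantages(rewards: Sequence[float], group_size: int, eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and std of its own group.

    `rewards` is laid out as consecutive groups of `group_size` responses,
    all sampled from the same prompt.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a positive multiple of group_size")
    advantages: List[float] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mean = sum(group) / group_size
        # Population std here; some implementations use the unbiased estimate instead.
        std = math.sqrt(sum((r - mean) ** 2 for r in group) / group_size)
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages


def minibatches(indices: Sequence[int], mini_batch_size: int, seed: int = 0) -> Iterator[List[int]]:
    """Shuffle one rollout batch and yield mini-batches, one gradient step each."""
    order = list(indices)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), mini_batch_size):
        yield order[start:start + mini_batch_size]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped objective for one sample (to be maximized); ratio = pi_new / pi_old."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Two prompts, two sampled responses each, with 0/1 correctness rewards.
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = grpo_advantages(rewards, group_size=2)
for step, batch in enumerate(minibatches(range(len(rewards)), mini_batch_size=2)):
    # In a real trainer each mini-batch would produce a loss and one optimizer step.
    objective = sum(clipped_surrogate(1.0, advantages[i]) for i in batch) / len(batch)
    print(step, batch, round(objective, 4))
```

Note that the second prompt's group, where both responses earn the same reward, gets zero advantage and therefore contributes no learning signal.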

Other Baselines

TODO

  • Support PPO, Reinforce++ and RLOO for VLMs.
  • Support Ulysses parallelism for VLMs.
  • Support more VLM architectures.

Note

We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using LLaMA-Factory.

Known bugs

These features are temporarily disabled; we plan to fix them one by one in future updates.

  • Vision language models are not yet compatible with Ulysses parallelism.

Discussion Group

👋 Join our WeChat group.

Citation

Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang, Zhangchi Feng, Dongdong Kuang and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}

We recommend also citing the original work.

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv: 2409.19256}
}
