This project is a clean fork of the original veRL project to support vision language models. We thank all the authors for providing such a high-performance RL training framework.
EasyR1 is efficient and scalable thanks to the design of HybridEngine and the latest release of vLLM's SPMD mode.
Supported models
- Qwen2/Qwen2.5 language models
- Qwen2/Qwen2.5-VL vision language models
- DeepSeek-R1 distill models
Supported algorithms
- GRPO
- Other RL algorithms (coming soon)
Supported datasets
- Any text or vision-text dataset in a specific format (see the custom dataset section below).
Requirements
- Python 3.9+
- PyTorch 2.4.0+
- Transformers 4.49.0+
- flash-attn
- vLLM 0.7.3+
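For convenience, here is a minimal, unofficial sketch for checking that an existing environment meets the version floors above (the Python version is implied by the interpreter running it):

```python
# Unofficial sanity check for the requirement list above; not an EasyR1 script.
import torch, transformers, vllm, flash_attn

print("torch", torch.__version__)                # need 2.4.0+
print("transformers", transformers.__version__)  # need 4.49.0+
print("vllm", vllm.__version__)                  # need 0.7.3+
print("flash-attn", flash_attn.__version__)      # no version floor listed above
```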
We provide a Dockerfile to easily build the training environment.
At least 8*80GB of VRAM is needed to train a 7B model. If you have less compute, please consider using smaller models (1.5B or 3B).
Note
We are working hard to reduce VRAM usage in RL training; LoRA support will be integrated in upcoming updates.
Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps
Step 1: Install EasyR1 and MathRuler

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
pip install git+https://github.com/hiyouga/MathRuler.git

Step 2: Run GRPO training on the Geometry3K example

bash examples/run_qwen2_5_vl_7b_geo.sh

Step 3: Merge the last actor checkpoint

python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint
Note
If you encounter issues connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com.
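After step 3, the merged checkpoint should be loadable with Transformers for a quick sanity check. Below is a minimal sketch; the checkpoint path and prompt are placeholders, and the model class assumes a Qwen2.5-VL actor.

```python
# Minimal sketch: load the merged actor checkpoint and run a text-only prompt.
# "path_to_your_merged_checkpoint" is a placeholder, not an official output path.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

ckpt = "path_to_your_merged_checkpoint"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(ckpt, torch_dtype=torch.bfloat16).cuda()
processor = AutoProcessor.from_pretrained(ckpt)

messages = [{"role": "user", "content": [{"type": "text", "text": "What is 2 + 3?"}]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens.
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```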
Custom dataset
The dataset should strictly follow the example data format.
- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
  - Required columns: problem, answer
- Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
  - Required columns: images, problem, answer
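As a reference, the expected columns can be inspected with the datasets library. A minimal sketch using the vision-text example dataset above (the split name "train" is an assumption):

```python
# Minimal sketch: inspect the example vision-text dataset's columns.
from datasets import load_dataset

dataset = load_dataset("hiyouga/geometry3k", split="train")  # split name is assumed
print(dataset.column_names)    # expected: images, problem, answer

sample = dataset[0]
print(type(sample["images"]))  # images attached to the problem (type depends on the dataset features)
print(sample["problem"])       # the question text
print(sample["answer"])        # the ground-truth answer
```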
TODO
- Support PPO, Remax, Reinforce++ and RLOO for VLMs.
- Support padding-free training for VLMs.
- Support ulysses parallelism for VLMs.
- Support more VLM architectures.
Known issues
The following features are temporarily disabled for now; we plan to fix them one by one in future updates.
- Vision language models are not compatible with padding-free training and ulysses parallelism yet.
- Vision language models are not compatible with enable_chunked_prefill unless vLLM v1 is supported.
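For reference, enable_chunked_prefill is a standard vLLM engine argument. A minimal sketch of what turning it off looks like when constructing a vLLM engine directly (the model name is only an example; this is not EasyR1's internal rollout code):

```python
# Minimal sketch: build a vLLM engine with chunked prefill disabled.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # example checkpoint
    enable_chunked_prefill=False,         # keep disabled for VLMs until vLLM v1 is supported
)
outputs = llm.generate(["What is 2 + 3?"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```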
👋 Join our WeChat group.
Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang and Yuwen Xiong
We also thank Guangming Sheng and Chi Zhang for helpful discussions.
@misc{zheng2025easyr1,
title = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
author = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Yuwen Xiong},
howpublished = {\url{https://github.com/hiyouga/EasyR1}},
year = {2025}
}
We recommend also citing the original work.
@article{sheng2024hybridflow,
title = {HybridFlow: A Flexible and Efficient RLHF Framework},
author = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
year = {2024},
journal = {arXiv preprint arXiv: 2409.19256}
}