EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

This project is a clean fork of the original veRL project that adds support for vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the HybridEngine design and the latest release of vLLM's SPMD mode.

Features

Supported models
- Qwen2/Qwen2.5 language models
- Qwen2/Qwen2.5-VL vision language models
- DeepSeek-R1 distill models
Supported algorithms
- GRPO
- Other RL algorithms (coming soon)
Supported datasets
- Any text or vision-text dataset that follows a specific format (see Custom Dataset)

Requirements

Software Requirements

Python 3.9+
transformers>=4.49.0
flash-attn>=2.4.3
vllm>=0.7.3
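
The minimum versions above can be checked before launching a run. The following is a minimal sketch, not part of EasyR1: the helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical, the distribution names are assumed to match the PyPI package names, and the comparison only looks at the first three numeric components, so pre-release tags such as `rc1` are ignored.

```python
import re
import sys
from importlib import metadata

# Minimums copied from the list above; keys are assumed PyPI distribution names.
MINIMUMS = {"transformers": "4.49.0", "flash-attn": "2.4.3", "vllm": "0.7.3"}


def release_tuple(version):
    """Return the first three numeric components, dropping any local suffix."""
    numbers = re.findall(r"\d+", version.split("+")[0])[:3]
    numbers += ["0"] * (3 - len(numbers))
    return tuple(int(n) for n in numbers)


def meets_minimum(installed, minimum):
    """Simplified comparison; packaging.version would be more robust."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >={minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script prints one line per unmet requirement and nothing when the environment satisfies all of them.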

We provide a Dockerfile to easily build environments.

Hardware Requirements

* estimated

Method	Bits	1.5B	3B	7B
GRPO Full Fine-Tuning	AMP	2*40GB	4*40GB	4*80GB

Note

We are working hard to reduce VRAM usage in RL training. LoRA support will be integrated in upcoming updates.

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps

Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
pip install git+https://github.com/hiyouga/MathRuler.git

GRPO Training

bash examples/run_qwen2_5_vl_7b_geo.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint

Note

If you encounter issues with connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com.

If you want to use SwanLab logger, consider using bash examples/run_qwen2_5_vl_7b_geo_swanlab.sh.
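
The Hugging Face mirror setting from the note above can also be applied from Python instead of the shell. This is a small sketch under assumptions: `configure_hf_endpoint` is a hypothetical helper, and it relies on `huggingface_hub` reading `HF_ENDPOINT` when it is first imported, so the call has to happen before any Hugging Face library is imported.

```python
import os


def configure_hf_endpoint(endpoint="https://hf-mirror.com", environ=os.environ):
    """Set HF_ENDPOINT unless one is already exported; return the value in effect."""
    return environ.setdefault("HF_ENDPOINT", endpoint)


if __name__ == "__main__":
    # Call this before importing transformers, datasets, or huggingface_hub.
    print(configure_hf_endpoint())
```

Using `setdefault` keeps a value the user has already exported, so the script never overrides an explicit shell setting.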

Custom Dataset

The dataset should strictly follow the example data format.

Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Required columns: problem, answer
Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
- Required columns: images, problem, answer
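
Before training on a custom dataset, it can help to confirm that the required columns are present. The sketch below only encodes the column lists stated above; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, not part of EasyR1, and the check covers column names only, not the contents or types of each field.

```python
# Required columns per dataset kind, taken from the lists above.
REQUIRED_COLUMNS = {
    "text": ("problem", "answer"),
    "vision-text": ("images", "problem", "answer"),
}


def missing_columns(column_names, kind):
    """Return the required columns that are absent, in their documented order."""
    present = set(column_names)
    return [name for name in REQUIRED_COLUMNS[kind] if name not in present]


if __name__ == "__main__":
    # Illustrative column names for a dataset that lacks the image column.
    print(missing_columns(["problem", "answer"], "vision-text"))
```

The `column_names` argument can be any iterable of strings, for example the `column_names` attribute of a split loaded with the Hugging Face `datasets` library.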

TODO

- Support PPO, ReMax, Reinforce++ and RLOO for VLMs.
- Support padding-free training for VLMs.
- Support Ulysses parallelism for VLMs.
- Support more VLM architectures.

Known bugs

These features are temporarily disabled. We plan to fix them one by one in future updates.

- Vision language models are not yet compatible with padding-free training or Ulysses parallelism.
- Vision language models are not compatible with enable_chunked_prefill unless vLLM v1 is supported.

Discussion Group

👋 Join our WeChat group.

Citation

Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}

We also recommend citing the original work.

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}
