📖Paper | 📊Datasets | 🤗MM-Eureka-8B | 🤗MM-Eureka-Zero-38B
MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
We present MM-Eureka and MM-Eureka-Zero, a series of multimodal reasoning models that successfully extend large-scale rule-based reinforcement learning (RL) to multimodal reasoning.
While rule-based RL has shown remarkable success in improving LLMs' reasoning abilities in text domains, its application to multimodal settings has remained challenging. Our work reproduces key characteristics of text-based RL systems like DeepSeek-R1 in the multimodal space for the first time, including steady increases in accuracy reward and response length, and the emergence of reflection behaviors.
We demonstrate that both instruction-tuned and pre-trained models can develop strong multimodal reasoning capabilities through rule-based RL without supervised fine-tuning, showing superior data efficiency compared to alternative approaches.
🔥We open-source our complete pipeline to foster further research in this area. We release all our code, models, and data at MM-EUREKA.
- [2025/03/07] We released MM-Eureka.
  - 📖 Paper: MM-Eureka-paper
  - 🤗 Model: MM-Eureka-8B & MM-Eureka-Zero-38B
  - 📊 Dataset: MM-Eureka-Dataset
This repository is built upon OpenRLHF, introducing several key enhancements:
- Multimodal RFT Support: Extends OpenRLHF to incorporate vision-language models (VLMs), currently supporting InternVL, enabling multimodal reasoning capabilities.
  - Supports RLOO, REINFORCE++, and GRPO training using Ray.
  - vLLM integration and distributed training.
  - Supports a hybrid engine (--colocate_all_models, --vllm_enable_sleep).
- Better Rule-based Reward support: Better training visualization for rule-based rewards (e.g., Format Reward, Accuracy Reward, Repetition Penalty).
- Online Filtering: Filters out experiences based on Accuracy Reward during training, as in PRIME.
  - Use --enable_accuracy_filter, --freezing_filter_steps, --accuracy_lower_bound, and --accuracy_upper_bound to control the behavior of the online accuracy filter.
  - The online accuracy filter is not enabled in our default settings; refer to the Discussion section of our paper for more details.
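The idea behind the online accuracy filter can be pictured with a minimal sketch. This is not the repository's implementation: the function names `keep_prompt` and `filter_groups`, the grouping of sampled responses per prompt, the inclusive comparison, and the example bounds of 0.2 and 0.8 are all assumptions chosen for illustration, and the effect of --freezing_filter_steps is not modeled here.

```python
from statistics import mean


def keep_prompt(accuracy_rewards, lower_bound, upper_bound):
    """Return True if the mean accuracy reward of a prompt's samples is within the bounds.

    Assumption: filtering is decided per prompt from the accuracy rewards of
    all responses sampled for that prompt, with inclusive bounds.
    """
    if not accuracy_rewards:
        return False
    return lower_bound <= mean(accuracy_rewards) <= upper_bound


def filter_groups(groups, lower_bound=0.2, upper_bound=0.8):
    """Keep only the prompts whose sampled responses are neither all wrong nor all right."""
    return {
        prompt_id: rewards
        for prompt_id, rewards in groups.items()
        if keep_prompt(rewards, lower_bound, upper_bound)
    }


# Hypothetical accuracy rewards (1.0 = correct, 0.0 = incorrect) for three prompts.
groups = {
    "easy": [1.0, 1.0, 1.0, 1.0],
    "hard": [0.0, 0.0, 0.0, 0.0],
    "useful": [1.0, 0.0, 1.0, 0.0],
}
kept = filter_groups(groups)
print(sorted(kept))
```

In this sketch, prompts that every sample solves or that no sample solves are dropped, so only prompts with a mixed outcome remain in the batch.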
git clone https://github.com/ModalMinds/MM-EUREKA.git
cd MM-EUREKA
pip install -e .[vllm]
# install flash-attn==2.3.6:
pip install flash-attn==2.3.6 --no-build-isolation
# Alternatively you can compile from source:
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.6
python setup.py install
You can download our training data from MM-Eureka-Dataset.
Once downloaded, refer to the section below for the data format. You may need to update the image_urls field to reference your local image paths for proper processing.
For a custom dataset, format your data into a JSONL file, where each entry is a dictionary organized in the following format.
{
  "id": "0",
  "conversations": [
    {
      "role": "system",
      "content": "system_prompt"
    },
    {
      "role": "user",
      "content": "user_prompt"
    }
  ],
  "answer": "gt that could be parsed and verified by math_verify",
  "image_urls": ["file:///path/to/image1", "file:///path/to/image2"]
}
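As a rough illustration, entries in this format could be produced with a small script like the one below. The helper names `make_entry` and `write_jsonl`, the placeholder prompts, and the output file name are hypothetical and not part of the repository; the sketch only mirrors the fields shown above and converts local paths to file:// URIs (on a POSIX system).

```python
import json
import tempfile
from pathlib import Path

# Fields taken from the entry format shown above.
REQUIRED_KEYS = {"id", "conversations", "answer", "image_urls"}


def make_entry(entry_id, system_prompt, user_prompt, answer, image_paths):
    """Build one dataset entry; local image paths become file:// URIs."""
    return {
        "id": str(entry_id),
        "conversations": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "answer": answer,
        "image_urls": [Path(p).resolve().as_uri() for p in image_paths],
    }


def write_jsonl(entries, out_path):
    """Write one JSON object per line, checking that every required key is present."""
    with open(out_path, "w", encoding="utf-8") as f:
        for entry in entries:
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                raise ValueError(f"entry {entry.get('id')} is missing keys: {sorted(missing)}")
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


# Example with placeholder values; a temporary directory keeps the sketch side-effect free.
entry = make_entry(0, "system_prompt", "user_prompt", "42", ["/path/to/image1"])
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "train.jsonl"
    write_jsonl([entry], out_path)
    lines = out_path.read_text(encoding="utf-8").splitlines()
print(lines[0])
```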
Note
For text-only inputs, we follow InternVL's official approach, which requires a dummy image input.
Specifically, you should provide a (224, 224) pure white image as a placeholder.
We have already provided such a blank image at: examples/blank.png
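A text-only entry might then look like the sketch below, which points image_urls at the provided blank image. The helper name `text_only_entry` and the example question are hypothetical, and the relative path is resolved against the current working directory, so the sketch assumes it is run from the repository root.

```python
import json
from pathlib import Path

# Placeholder image shipped with the repository (path taken from the note above).
BLANK_IMAGE = Path("examples/blank.png")


def text_only_entry(entry_id, system_prompt, user_prompt, answer, blank_image=BLANK_IMAGE):
    """Build an entry for a text-only question, using the blank image as a dummy input."""
    return {
        "id": str(entry_id),
        "conversations": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "answer": answer,
        # A single dummy image stands in for the missing visual input.
        "image_urls": [Path(blank_image).resolve().as_uri()],
    }


entry = text_only_entry("text-0", "system_prompt", "What is 6 * 7?", "42")
print(json.dumps(entry))
```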
Before starting your own training, ensure that the paths in the provided training scripts are correctly set and that environment variables such as $MASTER_ADDR and $NODE_RANK are properly configured.
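A quick pre-flight check along these lines can catch an unset variable before a job is launched. This is only a sketch: the function name `missing_env_vars` is hypothetical, and it checks just the two variables named above, while the training scripts may rely on others.

```python
import os

# Variables named in the text above; the scripts may need more than these.
REQUIRED_VARS = ("MASTER_ADDR", "NODE_RANK")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"MASTER_ADDR": "10.0.0.1"}
missing = missing_env_vars(example_env)
print(missing)
```

Calling `missing_env_vars()` with no arguments inspects the real process environment instead of the example mapping.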
Start MM-Eureka-8B training
- For a single node:
  sh examples/scripts/train_mm_eureka_8b_single_node.sh
- For multiple nodes:
  sh examples/scripts/train_mm_eureka_8b_multi_node.sh
Start MM-Eureka-Zero-38B training
sh examples/scripts/train_mm_eureka_zero_38b_multi_node.sh
We provide our evaluation code in the eval/ directory. To customize the evaluation process for different models, use the --prompt_template argument to specify the appropriate prompt format.
We also introduce the K12 Math Dataset, a curated set of 500 fill-in-the-blank mathematics questions covering concepts from middle to high school levels. The dataset can be found in the eval/k12 directory.
MM-Eureka is still under active development. If you want to contribute, please feel free to open a pull request or create an issue.
Please refer to CONTRIBUTING.md before you dive in!
If you have any questions or would like to engage with our community, feel free to scan the QR code below to join our WeChat group.
We acknowledge the outstanding open-source contributions from OpenRLHF, LMM-R1 and vLLM. We also extend our gratitude to DeepSeek-R1 and InternVL for their open-source techniques and base models, which have enabled us to further our exploration.
@misc{MM-EUREKA2025,
title={MM-EUREKA: Exploring Visual Aha Moment with Rule-Based Large-Scale Reinforcement Learning},
author={Fanqing Meng and Lingxiao Du and Zongkai Liu and Zhixiang Zhou and Quanfeng Lu and Daocheng Fu and Botian Shi and Wenhai Wang and Junjun He and Kaipeng Zhang and Ping Luo and Yu Qiao and Qiaosheng Zhang and Wenqi Shao},
year={2025},
howpublished={\url{https://github.com/ModalMinds/MM-EUREKA}},
}