Explore the Multimodal “Aha Moment” on a 2B Model
Updated Mar 10, 2025 - Python
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.