Starred repositories
Explore the Multimodal “Aha Moment” on 2B Model
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
R1-onevision, a visual language model capable of deep CoT reasoning.
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
Frontier Multimodal Foundation Models for Image and Video Understanding
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
SALMONN: Speech Audio Language Music Open Neural Network
Rethinking Step-by-step Visual Reasoning in LLMs
This repository provides a valuable reference for researchers in the field of multimodality — start your exploration of RL-based Reasoning MLLMs here!
Understanding Why and How Instruction Tuning Changes Pre-trained Models
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Video-R1: Towards Super Reasoning Ability in Video Understanding MLLMs
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours!
🚀 Train a 26M-parameter vision-language model (VLM) from scratch in just 1 hour!
[Blog 1] Documenting a bug in the grpo_trainer of some R1 projects
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
[ICLR 2025] Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
Official Repo for Open-Reasoner-Zero
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
This repo contains the code for the paper "Intuitive physics understanding emerges from self-supervised pretraining on natural videos"
PyTorch code and models for V-JEPA self-supervised learning from video.