Stars
Align Anything: Training All-modality Model with Feedback
Witness the aha moment of VLM with less than $3.
A fork to add multimodal model training to open-r1
Solve Visual Understanding with Reinforced VLMs
A high-performance LLM inference API and Chat UI that integrates DeepSeek R1's CoT reasoning traces with Anthropic Claude models.
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
Collect every awesome work about r1!
Democratizing Reinforcement Learning for LLMs
Official code for "Endpoints Weight Fusion for Class Incremental Semantic Segmentation"
[MICCAI 2023] Continual Learning for Abdominal Multi-Organ and Tumor Segmentation
Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
CSSegmentation: An Open Source Continual Semantic Segmentation Toolbox Based on PyTorch.
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Rethinking Step-by-step Visual Reasoning in LLMs
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning