✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Model Context Protocol Servers
A course on aligning smol models.
Solve Visual Understanding with Reinforced VLMs
An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…
Awesome Reasoning LLM Tutorial/Survey/Guide
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning'
[KDD 2024] Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs
Scalable RL solution for advanced reasoning of language models
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Witness the aha moment of VLM with less than $3.
Fully open reproduction of DeepSeek-R1
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
An open-source code repository of driving world models, with training, inference, and evaluation tools, and pretrained checkpoints.
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention
A self-learning tutorial for CUDA high-performance programming.
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.