kapidien


Starred repositories


Explore the Multimodal “Aha Moment” on 2B Model

Python 14 1 Updated Feb 28, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 1,935 280 Updated Feb 28, 2025

(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Python 61 Updated Feb 27, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 42,512 5,193 Updated Feb 28, 2025

R1-Vision: Let's first take a look at the image

Python 28 Updated Feb 16, 2025

R1-onevision, a visual language model capable of deep CoT reasoning.

265 5 Updated Feb 28, 2025

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,543 421 Updated Aug 7, 2024

Frontier Multimodal Foundation Models for Image and Video Understanding

Jupyter Notebook 563 35 Updated Feb 24, 2025

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 1,087 72 Updated Jan 23, 2025

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,166 91 Updated Dec 12, 2024

Rethinking Step-by-step Visual Reasoning in LLMs

Python 257 16 Updated Jan 24, 2025

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!

208 9 Updated Mar 1, 2025

Understanding Why and How Instruction Tuning Changes Pre-trained Models

Python 21 3 Updated Mar 18, 2024

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 890 42 Updated Feb 28, 2025

Video-R1: Towards Super Reasoning Ability in Video Understanding MLLMs

Python 69 1 Updated Feb 23, 2025

🚀🚀 "LLM": Train a small 26M-parameter GPT completely from scratch in just 2 hours!

Python 13,793 1,507 Updated Feb 23, 2025

🚀 "LLM": Train a 26M-parameter multimodal vision-language model (VLM) from scratch in just 1 hour!

Python 1,413 154 Updated Feb 23, 2025

[Blog 1] Documenting a bug in grpo_trainer in some R1 projects

Python 14 Updated Feb 23, 2025

LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

Python 16 Updated Feb 21, 2025

[ICLR2025] Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data

Python 11 2 Updated Feb 20, 2025

This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"

Python 35 1 Updated Feb 22, 2025

Unified Reinforcement Learning Framework

Python 693 64 Updated Sep 6, 2024

Official Repo for Open-Reasoner-Zero

Python 1,409 60 Updated Feb 25, 2025

Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

86 Updated Jan 23, 2025

Minimal-cost training of a 0.5B R1-Zero model

Python 564 73 Updated Feb 26, 2025

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

Jupyter Notebook 4 Updated Feb 19, 2025

This repo contains the code for the paper "Intuitive physics understanding emerges from self-supervised pretraining on natural videos"

Jupyter Notebook 79 3 Updated Feb 17, 2025

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 2,809 270 Updated Feb 27, 2025
JavaScript 18 1 Updated Feb 17, 2025
Python 30 3 Updated Jan 22, 2025