ustcwhy


[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,582 102 Updated Apr 16, 2025

Code for the Molmo Vision-Language Model

Python 370 31 Updated Dec 12, 2024

Mixture-of-Experts for Large Vision-Language Models

Python 2,147 134 Updated Dec 3, 2024

[CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Python 181 4 Updated Apr 4, 2025

AllenAI's post-training codebase

Python 2,899 373 Updated Apr 16, 2025
Python 122 6 Updated Feb 15, 2025

Official inference framework for 1-bit LLMs

C++ 13,050 919 Updated Apr 16, 2025

VPTQ: A Flexible and Extreme Low-Bit Quantization Algorithm

Python 626 43 Updated Mar 31, 2025

Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.

Shell 16,680 1,176 Updated Mar 14, 2025

SEED-Voken: A Series of Powerful Visual Tokenizers

Python 865 31 Updated Feb 19, 2025

VMamba: Visual State Space Models; the code is based on Mamba

Python 2,537 172 Updated Mar 7, 2025

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Python 361 14 Updated Jan 19, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 25 7 Updated Apr 16, 2025

Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.

Jupyter Notebook 9,807 686 Updated Apr 10, 2025

Distributed Training Over-The-Internet

900 31 Updated Dec 3, 2024

Beyond Straight-Through

Python 94 4 Updated Apr 29, 2023

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,130 260 Updated Mar 25, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,225 165 Updated Mar 28, 2025

For optimization algorithm research and development.

Python 507 39 Updated Apr 16, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 6,292 617 Updated Apr 16, 2025

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Python 1,053 41 Updated Oct 9, 2024

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,206 97 Updated Mar 4, 2025

AI2-THOR Data Collection Tool Based On Keyboard Interaction

Python 49 10 Updated Jun 21, 2024

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 2,569 323 Updated Mar 23, 2025

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Python 649 438 Updated Jul 4, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 2 Updated Apr 2, 2022

Code for the paper "M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models"

Python 8 Updated Mar 11, 2025

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4,868 511 Updated Sep 25, 2024