Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 6,478 555 Updated Mar 23, 2025

daixiangzi / Awesome-Token-Compress

A paper list of recent work on token compression for ViT and VLM

379 19 Updated Mar 10, 2025

VITA-MLLM / Long-VITA

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Python 259 29 Updated Mar 20, 2025

FanqingM / R1-Multimodal-Journey

A journey to real multimodal R1! We are running large-scale experiments.

Python 281 7 Updated Mar 8, 2025

Deep-Agent / R1-V

Witness the aha moment of VLM with less than $3.

Python 3,360 262 Updated Mar 1, 2025

Jiayi-Pan / TinyZero

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,320 1,433 Updated Mar 10, 2025

Unakar / Logic-RL

Reproduce R1 Zero on Logic Puzzle

Python 2,202 146 Updated Mar 20, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 23,184 2,111 Updated Mar 23, 2025

hkust-nlp / simpleRL-reason

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 3,217 236 Updated Mar 23, 2025

huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,315 173 Updated Mar 19, 2025

guoxy25 / Ocean-OCR

Python 25 1 Updated Feb 7, 2025

baaivision / CapsFusion

[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale

Python 205 5 Updated Feb 27, 2024

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python 1,310 73 Updated Jan 17, 2024

ttengwang / Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…

Python 1,723 104 Updated Aug 29, 2023

opendatalab / WanJuan1.0

WanJuan 1.0 multimodal corpus

556 28 Updated Oct 20, 2023

opendatalab / laion5b-downloader

Python 108 10 Updated May 16, 2023

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …

Python 584 192 Updated Mar 21, 2025

google / perfetto

Performance instrumentation and tracing for Android, Linux and Chrome (read-only mirror of https://android.googlesource.com/platform/external/perfetto/)

C++ 3,188 393 Updated Mar 21, 2025

FlagOpen / FlagGems

FlagGems is an operator library for large language models implemented in Triton Language.

Python 458 73 Updated Mar 23, 2025

deepseek-ai / DeepSeek-VL2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 4,595 1,683 Updated Feb 26, 2025

showlab / ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Python 1,123 69 Updated Mar 13, 2025

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 4,701 283 Updated Mar 23, 2025

hedes1992

Starred repositories

MCG-NJU / MixFormer

opendatalab / MLLM-DataEngine

huangb23 / VTimeLLM

volcengine / verl

allenai / open-instruct

JDAI-CV / fast-reid

xlite-dev / CUDA-Learn-Notes

DCDmllm / Momentor

modelscope / ms-swift