hedes1992's starred repositories

[CVPR 2022 Oral & TPAMI 2024] MixFormer: End-to-End Tracking with Iterative Mixed Attention

Python 478 73 Updated Feb 28, 2024

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

Python 46 5 Updated May 24, 2024

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Python 261 12 Updated Jun 13, 2024

verl: Volcano Engine Reinforcement Learning for LLMs

Python 5,502 533 Updated Mar 23, 2025

AllenAI's post-training codebase

Python 2,831 365 Updated Mar 23, 2025

SOTA Re-identification Methods and Toolbox

Python 3,559 846 Updated Jul 30, 2024

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,962 308 Updated Mar 22, 2025
Python 69 2 Updated Nov 24, 2024

Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 6,478 555 Updated Mar 23, 2025

A paper list of recent works on token compression for ViT and VLM

379 19 Updated Mar 10, 2025

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Python 259 29 Updated Mar 20, 2025

A journey to real multimodal R1! We are running large-scale experiments.

Python 281 7 Updated Mar 8, 2025

Witness the aha moment of VLM with less than $3.

Python 3,360 262 Updated Mar 1, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,320 1,433 Updated Mar 10, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,202 146 Updated Mar 20, 2025

Fully open reproduction of DeepSeek-R1

Python 23,184 2,111 Updated Mar 23, 2025

A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 3,217 236 Updated Mar 23, 2025

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,315 173 Updated Mar 19, 2025
Python 25 1 Updated Feb 7, 2025

[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale

Python 205 5 Updated Feb 27, 2024

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python 1,310 73 Updated Jan 17, 2024

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…

Python 1,723 104 Updated Aug 29, 2023

WanJuan 1.0 multimodal corpus

556 28 Updated Oct 20, 2023

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …

Python 584 192 Updated Mar 21, 2025

Performance instrumentation and tracing for Android, Linux and Chrome (read-only mirror of https://android.googlesource.com/platform/external/perfetto/)

C++ 3,188 393 Updated Mar 21, 2025

FlagGems is an operator library for large language models implemented in Triton Language.

Python 458 73 Updated Mar 23, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 4,595 1,683 Updated Feb 26, 2025

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Python 1,123 69 Updated Mar 13, 2025

Efficient Triton Kernels for LLM Training

Python 4,701 283 Updated Mar 23, 2025