Stars
Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
Labeling tool with SAM (Segment Anything Model); supports SAM, SAM2, sam-hq, MobileSAM, EdgeSAM, etc. An interactive semi-automatic image annotation tool.
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Quick scripts to calculate CLIP text-image similarity
Evaluating text-to-image/video/3D models with VQAScore
Official Repo for Open-Reasoner-Zero
Solve Visual Understanding with Reinforced VLMs
Extend OpenRLHF to support LMM RL training, reproducing DeepSeek-R1 on multimodal tasks.
✨First Open-Source R1-like Video-LLM [2025/02/18]
Janus-Series: Unified Multimodal Understanding and Generation Models
A fork to add multimodal model training to open-r1
Fully open reproduction of DeepSeek-R1
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
DVIS: Decoupled Video Instance Segmentation Framework
RedisPOI crawls POIs (points of interest) in a specified region and stores them in a Redis database. RedisPOI also implements basic query retrieval and performance-measurement features.
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
[ECCV 2024] Official PyTorch implementation of Mixture of All Intelligence (MoAI), improving performance on numerous zero-shot vision-language tasks.