CoRL 2024

Python 388 49 Updated Oct 29, 2024

Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"

Python 76 1 Updated Mar 21, 2025

DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding

Python 939 38 Updated Jan 21, 2025

Labeling tool with SAM (Segment Anything Model); supports SAM, SAM2, sam-hq, MobileSAM, EdgeSAM, etc. Interactive semi-automatic image annotation tool.

Python 1,482 153 Updated Mar 19, 2025

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Python 155 9 Updated Apr 29, 2024

Quick scripts to calculate CLIP text-image similarity

Python 220 16 Updated Nov 26, 2024

Evaluating text-to-image/video/3D models with VQAScore

Python 266 18 Updated Mar 16, 2025
Python 94 13 Updated Feb 5, 2025

Official Repo for Open-Reasoner-Zero

Python 1,661 78 Updated Mar 5, 2025

Solve Visual Understanding with Reinforced VLMs

Python 4,265 264 Updated Mar 20, 2025

Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.

Python 629 38 Updated Mar 13, 2025

✨First Open-Source R1-like Video-LLM [2025/02/18]

Python 286 11 Updated Feb 23, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,822 2,205 Updated Feb 1, 2025

A fork to add multimodal model training to open-r1

Python 1,100 59 Updated Feb 8, 2025

Fully open reproduction of DeepSeek-R1

Python 23,163 2,108 Updated Mar 22, 2025

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,789 170 Updated Jan 22, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,169 163 Updated Feb 13, 2025

Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step

Python 119 6 Updated Feb 17, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 4,588 1,682 Updated Feb 26, 2025
Python 27 1 Updated Jan 9, 2025

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Python 981 64 Updated Mar 19, 2025

DVIS: Decoupled Video Instance Segmentation Framework

Python 141 8 Updated Apr 2, 2024

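RedisPOI crawls POIs (points of interest) in a specified region and stores them in a Redis database. RedisPOI also implements basic query and retrieval and performance calculation features.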

Python 2 Updated Dec 3, 2024

[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"

Python 145 11 Updated Dec 14, 2024

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 4,984 443 Updated Jan 22, 2025

[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Python 149 4 Updated Sep 25, 2024

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

Python 1,339 143 Updated Mar 18, 2024

[ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve performance of numerous zero-shot vision language tasks.

Python 319 34 Updated Mar 28, 2024