Stars
Open-source and strong foundation image recognition models.
Witness the aha moment of VLM with less than $3.
Awesome Data-Driven Autonomous Driving Solutions. Also the official repository of our survey paper: Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Min…
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.
Research Software Engineering with Python course material
Software Design by Example: a tool-based introduction with Python
AIGC-interview/CV-interview/LLMs-interview面试问题与答案集合仓,同时包含工作和科研过程中的新想法、新问题、新资源与新项目
Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning"
Repository of finetuning Open-Source VLM
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
This is a Phi Family of SLMs book for getting started with Phi Models. Phi a family of open sourced AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…
An official repo for WACV 2025 paper "LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations"
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…
(TPAMI 2024) A Survey on Open Vocabulary Learning
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
[ICCV'2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation
ROOT: VLM based System for Indoor Scene Understanding and Beyond
This is the code of ECCV 2022 (Oral) paper "Fine-Grained Scene Graph Generation with Data Transfer".
[WACV 2025] This is the official implementation of the paper "Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge" in PyTorch.
[CVPR 2024] Code for HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation