Stars
Python sample code and a textbook for robotics algorithms.
Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)
This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive AI systems.
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
DAMO-YOLO: a fast and accurate object detection method with several new techniques, including NAS backbones, an efficient RepGFPN, ZeroHead, AlignedOTA, and distillation enhancement.
Official implementation of the CVPR 2024 highlight paper: Matching Anything by Segmenting Anything
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[ECCV 2024] Official implementation of "LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
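A minimal image-prediction sketch in the style of the SAM 2 README; the checkpoint and config names are examples and depend on which model size you download:

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Example config/checkpoint names; pick the variant you downloaded.
predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml",
                                          "checkpoints/sam2_hiera_large.pt"))

image = np.array(Image.open("input.jpg").convert("RGB"))  # HxWx3 uint8 RGB

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # Prompt with a single foreground point (label 1 = positive click).
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```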
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Famous Vision Language Models and Their Architectures
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
SSSegmentation: An Open Source Supervised Semantic Segmentation Toolbox Based on PyTorch.
(TPAMI 2024) A Survey on Open Vocabulary Learning
Awesome-LLM: a curated list of Large Language Model resources
Mixture-of-Experts for Large Vision-Language Models
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
[ECCV 2024] Tokenize Anything via Prompting
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
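Beyond the interactive `nvitop` command, the package also exposes a Python API; a small sketch of the device-query interface, using method names as documented for recent nvitop releases:

```python
from nvitop import Device

# Enumerate all visible NVIDIA GPUs and print a one-line summary for each.
for device in Device.all():
    print(f"GPU {device.index}: {device.name()} | "
          f"mem {device.memory_used_human()} / {device.memory_total_human()} | "
          f"util {device.gpu_utilization()}%")

    # Per-GPU compute processes, keyed by PID.
    for pid, process in device.processes().items():
        print(f"  pid {pid}: {process.name()} ({process.gpu_memory_human()})")
```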
An open source implementation of CLIP.
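The zero-shot classification pattern from the open_clip README, lightly condensed; the model name and pretrained tag are one of several available combinations:

```python
import torch
from PIL import Image
import open_clip

# One of many model/pretrained-weight combinations listed by open_clip.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
model.eval()
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("input.jpg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Cosine-similarity logits -> probabilities over the text prompts.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(text_probs)
```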
Efficient vision foundation models for high-resolution generation and perception.
A simple Python implementation of frame-by-frame visual odometry with the SuperPoint feature detector and the SuperGlue feature matcher.
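To illustrate the frame-by-frame idea, here is a classical two-frame odometry sketch in plain OpenCV, with ORB matching as a stand-in for SuperPoint/SuperGlue (which require their own pretrained networks); the intrinsics are example values:

```python
import cv2
import numpy as np

# Example pinhole intrinsics; replace with your camera's calibration.
K = np.array([[718.856, 0.0, 607.193],
              [0.0, 718.856, 185.216],
              [0.0, 0.0, 1.0]])

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect and match features between consecutive frames.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Essential matrix with RANSAC, then relative camera pose (R, t up to scale).
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
```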
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
Strong and Open Vision Language Assistant for Mobile Devices
Strong, open-source foundation models for image recognition.
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
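YOLO-World is also distributed through the ultralytics package; a short open-vocabulary inference sketch under that assumption, where the checkpoint name, class list, and image path are placeholders:

```python
from ultralytics import YOLOWorld

# Example checkpoint name; ultralytics downloads it on first use.
model = YOLOWorld("yolov8s-world.pt")

# Open-vocabulary: the detector's classes are set at run time from text.
model.set_classes(["person", "bicycle", "traffic light"])

results = model.predict("street.jpg")
results[0].show()  # visualize boxes and labels
```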