Stars
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
An open source implementation of CLIP.
ImageBind: One Embedding Space to Bind Them All
A curated list of foundation models for vision and language tasks
Collection of AWESOME vision-language models for vision tasks
🦜🔗 Build context-aware reasoning applications
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
CLIP inference in plain C/C++ with no extra dependencies
A deep homography estimation network in PyTorch
An unofficial implementation of the paper Deep Image Homography Estimation
Light-weight library to perform homography estimation with RANSAC from point, line or point-line correspondences
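The point-correspondence case these homography libraries handle can be sketched with the classic Direct Linear Transform (DLT); this is a minimal numpy illustration, not code from any of the repos above, and it omits the RANSAC outlier loop that a real library wraps around it:

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT: estimate the 3x3 homography H mapping src -> dst
    from at least 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the null vector of A: the right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale ambiguity so H[2,2] == 1

def apply_homography(H, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:]

# Recover a known homography from 4 exact correspondences.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 0.9, -3.0],
                   [5e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=float)
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
```

With noisy or mismatched correspondences, a RANSAC loop repeatedly runs this solver on random 4-point subsets and keeps the H with the most inliers.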
A PyTorch implementation of scene change detection
GPT4V-level open-source multi-modal model based on Llama3-8B
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
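The "most relevant text snippet" step CLIP performs at inference can be sketched with toy embeddings: cosine similarity between L2-normalized image and text vectors, scaled by a temperature and softmaxed. The embeddings below are placeholders standing in for encoder outputs (real CLIP produces them with a ViT/ResNet image tower and a Transformer text tower):

```python
import numpy as np

def clip_style_scores(image_emb, text_embs, temperature=0.07):
    """Score captions against one image: cosine similarity of
    L2-normalized embeddings, temperature-scaled, softmaxed."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    exp = np.exp(logits - logits.max())  # stable softmax
    return exp / exp.sum()

# Toy unit-ish embeddings; in practice these come from the encoders.
image_emb = np.array([0.9, 0.1, 0.0])
text_embs = np.array([
    [1.0, 0.0, 0.0],   # e.g. "a photo of a dog"
    [0.0, 1.0, 0.0],   # e.g. "a photo of a cat"
    [0.0, 0.0, 1.0],   # e.g. "a diagram"
])
probs = clip_style_scores(image_emb, text_embs)
best = int(np.argmax(probs))
```

The same similarity matrix, computed over a batch in both directions, is what CLIP's contrastive training objective operates on.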
A multimodal Chinese-English bilingual conversational language model
A state-of-the-art open visual language model (multimodal pre-trained model)
A collection of datasets, papers, and code on vehicle re-identification
An object tracking project with YOLOv8 and ByteTrack, accelerated with C++ and TensorRT.
Supports DeepSORT and ByteTrack multi-object tracking (MOT) using YOLOv5, in C++
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
4DS, BetterDepth, Buffer Anytime, ChronoDepth, Depth Any Video, Depth Anything, Depth Pro, DepthCrafter, DINOv2, FutureDepth, GenPercept, GeoWizard, LightedDepth, Marigold, Metric3D, MiDaS, MoGe, MonST3R, NeWCRFs, NV…