Stars
The Open-Source Data Annotation Platform
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
[CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloading the trained model checkpoints, and example notebooks / gra…
Densely Captioned Images (DCI) dataset repository.
🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Qwen / DeepSeek), Knowledge Base (file upload / knowledge managemen…
📚 A collection of papers about Referring Image Segmentation.
[ECCV'24] Official Implementation of SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM was accepted to CVPR 2024.
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
A repository for a surgical action triplet dataset. The data are videos of laparoscopic cholecystectomy that have been annotated with <instrument, verb, target> labels for every surgical fine-grained act…
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Machine Learning Engineering Open Book
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Open source codebase powering the HuggingChat app
QA Bot for Hugging Face documentation to accelerate development within the ecosystem.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"