Stars
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Fast and differentiable MS-SSIM and SSIM for PyTorch.
A Collection of Variational Autoencoders (VAE) in PyTorch.
A Python library for the Docker Engine API
🚀🚀 Revisiting Binary Local Image Description for Resource Limited Devices
Make drawing and labeling bounding boxes as easy as pie
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)
MOMENT: A Family of Open Time-series Foundation Models
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Understand Human Behavior to Align True Needs
Figma clone with NextJS 14, TypeScript, Liveblocks, Fabric.js, Tailwind CSS, Shadcn UI.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
This is a third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Yet another SAM webui + CLIP
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Official repository for Instance Prototype Contrastive Learning (IPCL)
A natural language interface for computers
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.