Stars
A latent text-to-image diffusion model
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
High-Resolution Image Synthesis with Latent Diffusion Models
QLoRA: Efficient Finetuning of Quantized LLMs
LAVIS - A One-stop Library for Language-Vision Intelligence
PyTorch code and models for the DINOv2 self-supervised learning method.
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, awesome model zoo, diffusion models, for text-to-image genera…
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Slides and assignments for Hung-yi Lee's (李宏毅) Spring 2021/2022/2023 Machine Learning course
Taming Transformers for High-Resolution Image Synthesis
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
The Open Source Memory Layer For Autonomous Agents
Updated for PyTorch 1.0. Supports CPU testing and the demo. (Use detectron2; it's a masterpiece.)
[NeurIPS 2021] You Only Look at One Sequence
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
[NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection".
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
OpenEQA: Embodied Question Answering in the Era of Foundation Models