Stars
A latent text-to-image diffusion model (a minimal inference sketch follows this list).
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model (a short usage sketch also follows this list).
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
High-Resolution Image Synthesis with Latent Diffusion Models
FaceChain is a deep-learning toolchain for generating your digital twin.
A series of large language models trained from scratch by developers @01-ai
Using low-rank adaptation (LoRA) to quickly fine-tune diffusion models (an illustrative layer sketch follows this list).
Inpaint anything using Segment Anything and inpainting models.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Segment Anything in Medical Images
Singing voice conversion via a diffusion model
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Code for the paper "Motion Representations for Articulated Animation"
[ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"
OpenMMLab course index and related materials
Official implementation of SAM-Med2D
Diffusion attentive attribution maps for interpreting Stable Diffusion.
This repo contains the code for a 1D tokenizer and generator
Literature survey, paper reviews, experimental setups, and a collection of implementations of baseline methods for predictive uncertainty estimation in deep learning models.
LaVIT: Empowering the Large Language Model to Understand and Generate Visual Content
A prompting enhancement library for transformers-type text embedding systems
[ICCV 2023] VPD is a framework that transfers the high-level and low-level knowledge of a pre-trained text-to-image diffusion model to downstream visual perception tasks.
MindSpore online courses: Step into LLM
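
Several of the starred repositories above center on latent text-to-image diffusion. As a point of reference, here is a minimal inference sketch using the Hugging Face diffusers library rather than the original CompVis codebase; the checkpoint identifier below is one illustrative public choice, not the only option.

```python
# Minimal latent-diffusion text-to-image inference via the diffusers library.
# Assumes diffusers, transformers, and torch are installed and a CUDA GPU is
# available; the model id is an example public Stable Diffusion checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint id; swap as needed
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a watercolor painting of New York at dusk").images[0]
image.save("output.png")
```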
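The Segment Anything entry describes prompt-based mask prediction. A sketch of the predictor API documented in the segment-anything package follows, assuming the ViT-H checkpoint has been downloaded and an RGB image is loaded as a NumPy array; the image path and click coordinates are placeholders.

```python
# Point-prompted mask prediction with the Segment Anything Model (SAM).
# Assumes the segment-anything package is installed and the ViT-H checkpoint
# (sam_vit_h_4b8939.pth) has been downloaded from the official repository.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an HWC uint8 RGB array.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground click at pixel (500, 375); label 1 marks a positive point.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with scores
)
best_mask = masks[np.argmax(scores)]  # boolean HxW mask with the top score
```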
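The low-rank adaptation entry refers to LoRA-style fine-tuning. Below is a self-contained, illustrative PyTorch sketch of the core idea (a frozen pretrained linear layer plus a trainable low-rank update), not the starred repository's own API; the class name and hyperparameters are hypothetical.

```python
# Illustrative LoRA layer: y = W x + (alpha / r) * B A x, with W frozen.
# A generic sketch of the technique, not the cloneofsimo/lora interface.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank update.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Usage: wrap a projection layer; only A and B receive gradients.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=8.0)
out = layer(torch.randn(2, 768))
```

Zero-initializing B makes the wrapped layer start out exactly equal to the frozen base layer, so fine-tuning begins from the pretrained model's behavior.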