Stars
CatV2TON is a lightweight DiT-based visual virtual try-on model that supports try-on for both images and videos.
🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Code and dataset for "Detecting Human Artifacts from Text-to-Image Models"
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
[ICLR 2025] CatVTON is a simple and efficient virtual try-on diffusion model with 1) a Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters) …
A powerful tool that translates ComfyUI workflows into executable Python code.
StoryMaker: Towards consistent characters in text-to-image generation
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
SkyReels V1: The first and most advanced open-source human-centric video foundation model
AcadHomepage: A Modern and Responsive Academic Personal Homepage
High-Resolution 3D Asset Generation with Large-Scale Hunyuan3D Diffusion Models.
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
Various AI scripts. Mostly Stable Diffusion stuff.
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
Official code of SmartEdit [CVPR-2024 Highlight]
A collection of vision foundation models unifying understanding and generation.
DeepFashion2 Dataset https://arxiv.org/pdf/1901.07973.pdf
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Educational implementation of the Discrete Flow Matching paper
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
A minimal and universal controller for FLUX.1.