Stars
Official Code for DragGAN (SIGGRAPH 2023)
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
A minimal and universal controller for FLUX.1.
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
OmniControl: Control Any Joint at Any Time for Human Motion Generation, ICLR 2024
[NeurIPS 2024] Official code for "Splatter a Video: Video Gaussian Representation for Versatile Processing"
[SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images
CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Simple, unified interface to multiple Generative AI providers
Unofficial Implementation of E-LatentLPIPS (Ensembled-LatentLPIPS) of Diffusion2GAN
PFGuard: A Generative Framework with Privacy and Fairness Safeguards
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
A curated list of image inpainting and video inpainting papers and resources
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
[T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
[WACV 2024] Training-Free Layout Control with Cross-Attention Guidance
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models