Stars
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"
Learning Flow Fields in Attention for Controllable Person Image Generation
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
Official implementation of EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
[NeurIPS 24] PromptFix: You Prompt and We Fix the Photo
This repo contains the code for a 1D tokenizer and generator
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes; NeurIPS 2024; Official code
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
Lightweight Python framework that provides a high-level API for creating and rendering scenes with Blender.
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Nodes for image juxtaposition for Flux in ComfyUI
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Stable Diffusion web UI
Official repository of the paper 'Structure-Aware Flow Generation for Human Body Reshaping' in CVPR 2022
[AAAI 2024] NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement. Project Website https://mv-lab.github.io/nilut/
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Select a portrait, click to move the head around (please use your own space / GPU!)