Stars
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
[ECCV 2024] PowerPaint, a versatile image inpainting model that supports text-guided object inpainting, object removal, image outpainting and shape-guided object inpainting with only a single model…
Solve Visual Understanding with Reinforced VLMs
DiffuEraser is a diffusion model for video inpainting that delivers strong content completeness and temporal consistency while maintaining acceptable efficiency.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Collect every awesome work about r1!
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
🥧 Savoury implementation of the QUIC transport protocol and HTTP/3
"VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos"
adefossez / demucs
Forked from facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
A Training-free Iterative Framework for Long Story Visualization
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A pipeline parallel training script for diffusion models.
A PyTorch native library for large model training
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉
[CVPR'25] Official Implementations for Paper - AniDoc: Animation Creation Made Easier
Helpful tools and examples for working with flex-attention