Stars
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
Generative Models by Stability AI
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Notebooks and media files for Medium articles
Easily compute CLIP embeddings and build a CLIP retrieval system with them
Python package to corrupt arbitrary images.
SoftVC VITS Singing Voice Conversion
Official PyTorch implementation of "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" (ICML 2023)
A feature-rich command-line audio/video downloader
A machine learning image inpainting task that automatically removes watermarks from images, producing results indistinguishable from the ground-truth image
[ICML 2023] Reflected Diffusion Models (https://arxiv.org/abs/2304.04740)
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies
Finetune ModelScope's Text To Video model using Diffusers 🧨
Large-scale text-video dataset. 10 million captioned short videos.
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators
High-Resolution Image Synthesis with Latent Diffusion Models
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.