Stars
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Open-source real-time translation app for Android that runs locally
Easily train a good voice conversion (VC) model with 10 minutes or less of voice data!
ClovaCall dataset and PyTorch LAS baseline code (Interspeech 2020)
A public-domain, single-speaker Japanese speech dataset
[CVPR 2024] Official repository for "Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians"
Open-Sora: Democratizing Efficient Video Production for All
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Pushing the Limits of Zero-shot End-to-End Speech Translation
Source code of "Textual Alchemy: CoFormer for Scene Text Understanding", published in WACV 2024
[CVPR 2024] PIA, your Personalized Image Animator. Animate your images with text prompts and combine them with DreamBooth to produce stunning videos.
[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
🔥 [CVPR 2020] STEFANN: Scene Text Editor using Font Adaptive Neural Network (official code).
Let us democratise high-resolution generation! (CVPR 2024)
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
The PyTorch implementation of our CVPR 2023 paper "Conditional Image-to-Video Generation with Latent Flow Diffusion Models"
One-click Face Swapper and Restoration powered by insightface 🔥
Industry leading face manipulation platform
WavJourney: Compositional Audio Creation with LLMs
🎼 text-to-video system for music visualization
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
faster-whisper livestream translation, OBS noise reduction, dual-language subtitles