Stars
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Training code for FAcodec presented in NaturalSpeech3
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Text-to-Music Generation with Rectified Flow Transformers
AudioLDM training, finetuning, evaluation and inference.
One minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Official code for the NeurIPS 2023 paper "CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection"
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
An unofficial PyTorch implementation of the audio LM VALL-E
Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper.
Singing Voice Synthesis based on VITS, different from VISinger
Official implementation of Multi-Stage Multi-Codebook (MSMC) TTS
Conditional Variational Auto-Encoder with Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
A repository for benchmarking neural vocoders by their quality and speed.
Production First and Production Ready End-to-End Text-to-Speech Toolkit
Deep Performer: Score-to-audio music performance synthesis
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Production First and Production Ready End-to-End Speech Recognition Toolkit
C++ library for audio and music analysis, description and synthesis, including Python bindings
Code for "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"