Stars
Text-to-Music Generation with Rectified Flow Transformers
VITS with phoneme-level prosody modeling based on MaskGIT
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
The open source code for SimpleSpeech series
Evaluation Protocol for Large-Scale Zero-Shot TTS Literature
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
lina-speech: linear-attention-based text-to-speech
A fast speech-to-any translation model that supports simultaneous decoding and offers 28× speedup.
Automatically update Text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
✨✨Latest Advances on Multimodal Large Language Models
Inference and training library for high-quality TTS models.
GPT-style network for phonemization of text with durations
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
Official implementation of "Separate Anything You Describe"
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
Liujingxiu23 / MP-SENet
Forked from yxlu-0102/MP-SENet
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
This is a list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio …
LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineers
A simple and open-source analogue of the HeyGen system
Text-to-speech using an autoregressive transformer and VITS