Stars
Unified automatic quality assessment for speech, music, and sound.
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
A generative world for general-purpose robotics & embodied AI learning.
Turn any common eBook file into an HQ Audiobook with F5-TTS (Easy Install)
First base model for full-duplex conversational audio
AI powered speech denoising and enhancement
GUI for a Vocal Remover that uses Deep Neural Networks.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Multilingual G2P in 100 languages
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Text to speech alignment using CTC forced alignment