tts
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
一个简单的本地网页界面,使用ChatTTS将文字合成为语音,同时支持对外提供API接口。A simple native web interface that uses ChatTTS to synthesize text into speech, along with support for external API interfaces.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Speech to Text but with all the bells and whistles and most importantly AI! AI will clean up your filler words, edit and will refine what you said!
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
An Open-Sourced LLM-empowered Foundation TTS System
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and…
The official Soundwave repository
Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation