Stars
A linguistically grounded and comprehensive Chinese TTS dataset that efficiently covers Chinese polyphonic characters, phonological changes, and more.
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Zero-shot voice conversion and singing voice conversion, with real-time support
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Official PyTorch implementation of BigVGAN (ICLR 2023)
A Conversational Speech Generation Model
No fortress, purely open ground. OpenManus is Coming.
Best-practice TTS based on BERT and VITS, with some features from Microsoft's NaturalSpeech; supports ONNX streaming output!
GPT-4o-level, real-time spoken dialogue system.
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"