Stars
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Zero-shot voice conversion and singing voice conversion, with real-time support
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Multilingual Voice Understanding Model
Python interface to the WebRTC Voice Activity Detector
1 minute of voice data is enough to train a good TTS model (few-shot voice cloning)
The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser.
Library for building powerful interactive command line applications in Python
Uses a jointly trained speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
Faster Whisper transcription with CTranslate2
App showcasing multiple real-time diffusion model pipelines with Diffusers
prompt2model - Generate Deployable Models from Natural Language Instructions
Resample audio in node or browser using a web assembly port of libsamplerate.
Realtime Voice Changer
MB-iSTFT-VITS with support for 44100 Hz Japanese audio sources: Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform.
The official GitHub page for the survey paper "A Survey of Large Language Models".
The missing star history graph of GitHub repos - https://star-history.com
Easily train a good VC model with voice data <= 10 mins!