Starred repositories
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC…
csukuangfj / vits_chinese
Forked from UEhQZXI/vits_chinese
vits chinese, tts chinese, tts mandarin. The easiest-to-train speech synthesis system with the best audio quality to date
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Text Normalization & Inverse Text Normalization
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
SummerTTS is a standalone C++ speech synthesis (TTS) project for Chinese and English. It runs locally without a network connection, has no extra dependencies, and is ready for Chinese and English speech synthesis after a one-step build.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
zero-shot voice conversion & singing voice conversion, with real-time support
A curated list of awesome voice conversion projects and communities.
Let your Claude be able to think
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Universal Tensor Operations in Einstein-Inspired Notation for Python.
Faster Whisper transcription with CTranslate2
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.