- alibaba
- beijing
- http://www.cnblogs.com/NaughtyBaby/
Starred repositories
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.
Proof-of-concept port of the openai-realtime-console to Streamlit.
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
Keep track of big models in the audio domain, including speech, singing, music, etc.
First base model for full-duplex conversational audio
High-quality, streaming, full-duplex speech-to-speech interactive agent prototype implemented in a single file!
Awesome speech/audio LLMs, representation learning, and codec models
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
Realtime Video and Audio Streaming with WebRTC and Gradio
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
Real-time voice-interactive digital human, supporting both an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). Appearance and voice are customizable without training, voice cloning is supported, and first-packet latency is as low as 3 s.
Config files for self-hosting the FoloToy Community Server. Documents: https://docs.folotoy.com
Screenshot, offline OCR, search and translate, reverse image search, pin images to the screen, screen recorder, omnidirectional scrolling screenshot, screen translator
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
SenseVoice-python: An enterprise-grade, open-source, multilingual ASR system from the FunASR open-source project, running on ONNX Runtime
This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and…
Voice activity detector (VAD) for the browser with a simple API