Starred repositories
A blazing fast inference solution for text embeddings models
PhoWhisper: Automatic Speech Recognition for Vietnamese (2024)
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC…
Measures finger width to select a ring size using image processing and hand landmarks
Get the width of fingers from a photo of a hand with a coin beside it.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Noise suppression using deep filtering
Python text-to-speech library with built-in voice effects and support for multiple TTS engines
Ikaros-521 / AI-Vtuber
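Forked from sandboxdream/AI-Vtuber
AI Vtuber is a virtual streamer driven by [ChatterBot/ChatGPT/claude/langchain/chatglm/text-gen-webui/Wenda/Qianwen/kimi/ollama] with [Live2D/UE/xuniren] that can interact with viewers in real time during live streams on [Bilibili/Douyin/Kuaishou/WeChat Channels/Pinduoduo/Douyu/YouTube/twitch/TikTok], or chat directly on the local machine…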
Real time interactive streaming digital human
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
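Real-time STT connected to the OpenAI API/Zhipu AI (streaming LLM) and GPT-SOVITS/Edge-TTS, with service calls made across networks through a web page to achieve real-time conversation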
Table Recognition and Content Extraction in PDF Files
OpenCV-Python image processing tutorial
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
A Comprehensive Toolkit for High-Quality PDF Content Extraction