miandai: Starred repositories

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook · 38,389 stars · 5,028 forks · Updated Jan 23, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python · 9,152 stars · 1,216 forks · Updated Jan 22, 2025

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

320 stars · 9 forks · Updated Jan 17, 2025

🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.

Python · 232 stars · 4 forks · Updated Jan 4, 2025

POC Port of the openai-realtime-console to streamlit.

Python · 43 stars · 5 forks · Updated Oct 8, 2024

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python · 94 stars · 6 forks · Updated Jan 22, 2025

Python · 100 stars · 18 forks · Updated Jan 22, 2025

Build your own AI friend

C++ · 3,814 stars · 600 forks · Updated Jan 23, 2025

Keep track of big models in audio domain, including speech, singing, music etc.

468 stars · 28 forks · Updated Sep 26, 2024

Local realtime voice AI

Python · 2,176 stars · 118 forks · Updated Jan 22, 2025

first base model for full-duplex conversational audio

Python · 1,688 stars · 113 forks · Updated Jan 5, 2025

High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice-interaction prototype agent implemented in just one file!

Python · 322 stars · 31 forks · Updated Jan 10, 2025

Awesome speech/audio LLMs, representation learning, and codec models

860 stars · 55 forks · Updated Jan 17, 2025

Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.

151 stars · 12 forks · Updated Nov 10, 2024

Realtime Video and Audio Streaming with WebRTC and Gradio

Python · 193 stars · 26 forks · Updated Jan 15, 2025

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python · 3,319 stars · 424 forks · Updated Nov 27, 2024

Real-time voice-interactive digital human supporting both an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). The avatar and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3 s.

Python · 642 stars · 90 forks · Updated Nov 15, 2024

Config files for self-hosting the FoloToy Community Server. Documents: https://docs.folotoy.com

Dockerfile · 493 stars · 88 forks · Updated Nov 12, 2024

GLM-4-Voice | End-to-end Chinese-English voice dialogue model

Python · 2,589 stars · 210 forks · Updated Dec 5, 2024

Screenshot, offline OCR, search and translate, search by image, pin images on screen, screen recording, omnidirectional scrolling screenshot, screen translation

TypeScript · 5,218 stars · 394 forks · Updated Jan 23, 2025

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python · 15,376 stars · 2,644 forks · Updated Dec 18, 2024

Python · 7,189 stars · 564 forks · Updated Jan 14, 2025

SenseVoice-python: An enterprise-grade, open-source, multilingual ASR system from the FunASR open-source project, running on onnxruntime

Python · 80 stars · 12 forks · Updated Sep 24, 2024

This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and…

Python · 265 stars · 28 forks · Updated Nov 13, 2024

A collection of free books. …

11,118 stars · 1,175 forks · Updated Nov 11, 2024

Voice activity detector (VAD) for the browser with a simple API

TypeScript · 1,042 stars · 164 forks · Updated Jan 19, 2025

Local SRT/LLM/TTS Voicechat

Python · 599 stars · 65 forks · Updated Oct 12, 2024