Stars
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Awesome speech/audio LLMs, representation learning, and codec models
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
There can be more than Notion and Miro. AFFiNE (pronounced [ə‘fain]) is a next-gen knowledge base that brings planning, sorting and creating all together. Privacy first, open-source, customizable an…
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge ba…
Zero-Shot Speech Editing and Text-to-Speech in the Wild
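🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports ChatGPT multi-turn conversation, and may also be the first open-source smart speaker project to support brain-computer interaction.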
LLM Agent Framework in ComfyUI. Includes Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, offers access to Feishu and discord, and adapts to all LLMs with similar openai / aisuite interfaces, such as…
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Inference and training library for high-quality TTS models.
Speech To Speech: an effort for an open-sourced and modular GPT4-o
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms
A generative speech model for daily dialogue.
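🚀 One-click deployment (offline bundle included)! Built on ChatTTS, with support for streaming output, random voice timbre draws, long-audio generation, and multi-role reading. Simple to use, with no complex installation required.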