Stars
OCR, layout analysis, reading order, table recognition in 90+ languages
Convert PDF to markdown + JSON quickly with high accuracy
SpatialLM: Large Language Model for Spatial Understanding
Toolkit for linearizing PDFs for LLM datasets/training
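An intelligent work-and-life assistant app built with Flutter that supports calling the AI large-model APIs of many cloud platforms. Beyond the usual large-model features, it includes everyday tools such as minimalist bookkeeping, random dish suggestions, a cat and dog home, waifu images, MAL anime rankings, BGM anime news, and diet and health. (Continuously updated...)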
The sample app showcasing Tencent Cloud Chat integration with Flutter across iOS, Android, Web, macOS, and Windows platforms.
The reinforcement learning training code for AgiBot X1.
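A chatbot built on large language models that supports integration with WeChat Official Accounts, WeCom (Enterprise WeChat) apps, Feishu, and DingTalk. It can use GPT3.5/GPT-4o/GPT-o1/DeepSeek/Claude/ERNIE Bot/iFlytek Spark/Tongyi Qianwen/Gemini/GLM-4/Kimi/LinkAI, handles text, voice, and images, can access the operating system and the internet, and supports building customized enterprise customer service on top of your own knowledge base.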
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
The official Python API for ElevenLabs Text to Speech.
Olares: An Open-Source Sovereign Cloud OS for Local AI
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Make websites accessible for AI agents
Making Docker and Kubernetes management easy.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Instant voice cloning by MIT and MyShell. Audio foundation model.
A modular graph-based Retrieval-Augmented Generation (RAG) system