
Starred repositories
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
An Open-source RL System from ByteDance Seed and Tsinghua AIR
The official Python SDK for Model Context Protocol servers and clients
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
A C++ header-only HTTP/HTTPS server and client library
An extremely fast Python package and project manager, written in Rust.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
ZeroMQ core engine in C++, implements ZMTP/3.1
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Official electron build of draw.io
AISystem refers primarily to AI systems, covering the full low-level AI stack: AI chips, AI compilers, and AI inference and training frameworks
Mirror clone of https://gitee.com/gsls200808/chinese-opensource-mirror-site, since the README.md on that repository has been filtered.
A highly optimized LLM inference acceleration engine for Llama and its variants.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
This repository contains demos I made with the Transformers library by HuggingFace.
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali.
A blazing-fast inference solution for text embedding models
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model