Stars
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
A guidance language for controlling large language models.
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
An experimentation platform for LLM inference optimisation
A 16× reduction in memory accesses with nearly no accuracy loss
This is the implementation repository of our OSDI'23 paper: SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory.
[Start here!] Flow-IPC - Modern C++ toolkit for high-speed inter-process communication (IPC)
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
SGLang is a fast serving framework for large language models and vision language models.
This project shares the technical principles behind large language models along with practical experience (LLM engineering and real-world application deployment).
A General-purpose Task-parallel Programming System using Modern C++
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
A large-scale simulation framework for LLM inference
Disaggregated serving system for Large Language Models (LLMs).
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference.
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, MLA, Parallelism, Prefix-Cache, Chunked-Prefill, etc. 🎉🎉
A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.