Northwestern Polytechnical University
Xi'an, Shaanxi, China
Starred repositories
A high-throughput and memory-efficient inference and serving engine for LLMs
Fast OS-level support for GPU checkpoint and restore
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A distributed KV store for disaggregated LLM inference
Efficient and easy multi-instance LLM serving
NVIDIA Linux open GPU kernel module source
Justitia provides RDMA isolation between applications with diverse requirements.
High-performance Transformer implementation in C++.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A curated list of resources on event-driven architecture.
A collection of awesome researchers and papers about disaggregated memory.
This project aims to share the technical principles behind large language models along with hands-on experience (LLM engineering and real-world LLM application deployment).
Heterogeneous AI Computing Virtualization Middleware
Survey: a collection of awesome papers and resources on large language model (LLM)-related recommender system topics.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Artifact evaluation repo for EuroSys'24.
Sharing the codebase and steps for artifact evaluation for the ISCA 2023 paper
A List of Recommender Systems and Resources
User documentation for Knative components.
Underlay and RDMA network solution for Kubernetes, covering bare metal, VMs, and any public cloud
rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations.