
Northwestern Polytechnical University
- Xi'an, Shaanxi, China
Starred repositories
RDMA CNI plugin for containerized workloads
Globally Addressable Memory management (efficient distributed memory management via RDMA and caching)
Translation project for A Primer on Memory Consistency and Cache Coherence (Second Edition)
DeepSeek Coder: Let the Code Write Itself
A high-throughput and memory-efficient inference and serving engine for LLMs
Fast OS-level support for GPU checkpoint and restore
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
KV cache store for distributed LLM inference
Efficient and easy multi-instance LLM serving
NVIDIA Linux open GPU kernel module source
Justitia provides RDMA isolation between applications with diverse requirements.
High performance Transformer implementation in C++.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A curated list of resources on event-driven architecture.
A collection of awesome researchers and papers about disaggregated memory.
This project shares technical principles and hands-on experience with large language models (LLM engineering and real-world LLM application deployment)
Heterogeneous AI Computing Virtualization Middleware
Survey: A collection of AWESOME papers and resources on large language model (LLM)-related recommender system topics.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Artifact evaluation repo for EuroSys'24.
Codebase and steps for artifact evaluation of an ISCA 2023 paper
A List of Recommender Systems and Resources