CheYulin
  • Huawei
  • Shen Zhen

Highlights

  • Pro

Organizations

@RapidsBlink @RapidsAtHKUST


A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,234 783 Updated Mar 20, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,913 220 Updated Mar 4, 2025

Excellent Chinese-language plugins, themes, and resources for Obsidian

350 14 Updated Feb 8, 2025

The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Python 67 Updated Jan 23, 2025

A guidance language for controlling large language models.

Jupyter Notebook 19,924 1,094 Updated Mar 19, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,351 46 Updated Mar 19, 2025

Redis for LLMs

Python 627 69 Updated Mar 21, 2025
Python 8 4 Updated Jan 30, 2025
Python 82 7 Updated Dec 31, 2024
Python 73 7 Updated Nov 25, 2024

An experimentation platform for LLM inference optimisation

Jupyter Notebook 29 3 Updated Sep 19, 2024

16-fold memory access reduction with nearly no loss

Python 81 5 Updated Mar 19, 2025

This is the implementation repository of our OSDI'23 paper: SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory.

C++ 59 15 Updated Oct 28, 2024

A radix tree implementation in ANSI C

C 1,140 168 Updated Nov 26, 2023

[Start here!] Flow-IPC - Modern C++ toolkit for high-speed inter-process communication (IPC)

C++ 378 14 Updated Mar 8, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 13,048 883 Updated Mar 22, 2025
Jupyter Notebook 53 4 Updated Jun 13, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,049 242 Updated Mar 23, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 12,289 1,323 Updated Mar 22, 2025

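This project shares the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications)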

HTML 15,557 1,806 Updated Mar 2, 2025

Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm and application engineers

HTML 6,335 724 Updated Oct 22, 2024

A General-purpose Task-parallel Programming System using Modern C++

C++ 10,702 1,258 Updated Mar 22, 2025

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)

Python 116 22 Updated Jul 10, 2024

A large-scale simulation framework for LLM inference

Python 350 61 Updated Nov 19, 2024

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 507 53 Updated Aug 19, 2024

Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

235 10 Updated Mar 6, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, MLA, Parallelism, Prefix-Cache, Chunked-Prefill, etc. 🎉🎉

3,692 260 Updated Mar 4, 2025

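A pure C++ cross-platform LLM acceleration library, callable from Python; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single card; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices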

C++ 3,443 351 Updated Mar 19, 2025

LLM inference in C/C++

C++ 77,025 11,166 Updated Mar 22, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,893 185 Updated Mar 20, 2025