Starred repositories
An extremely fast Python package and project manager, written in Rust.
Fixes the following prompts that Cursor shows during the free subscription period: You've reached your trial request limit. / Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please l…
A curated list of Awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, etc.
A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute
The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
SGLang is a fast serving framework for large language models and vision language models.
A throughput-oriented high-performance serving framework for LLMs
My learning notes/codes for ML SYS.
Integrate the DeepSeek API into popular software
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Learn for free how to build an end-to-end production-ready LLM & RAG system using LLMOps best practices: source code + 12 hands-on lessons
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
A Chinese-language reinforcement learning tutorial (the "Mushroom Book"). Read online: https://datawhalechina.github.io/easy-rl/
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
LLM knowledge sharing that anyone can understand: a must-read before spring/fall LLM job interviews, so you can talk with your interviewer confidently
A game theoretic approach to explain the output of any machine learning model.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Fast and memory-efficient exact attention
Development repository for the Triton language and compiler
An annotated implementation of the Transformer paper.
Build Container Images In Kubernetes
A course on aligning smol models.