yjw258
  • University of Science and Technology of China
  • Hefei, Anhui

Starred repositories

[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length

Python · 83 stars · 5 forks · Updated Apr 14, 2025

DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting

Python · 14 stars · 2 forks · Updated Mar 4, 2025

Official repository for VisionZip (CVPR 2025)

Python · 284 stars · 12 forks · Updated Feb 27, 2025

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Python · 27 stars · 1 fork · Updated Apr 10, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.

Python · 1,257 stars · 145 forks · Updated May 18, 2025

allRank is a PyTorch-based framework for training neural learning-to-rank models.

Python · 933 stars · 125 forks · Updated Aug 6, 2024
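
As a rough illustration of the listwise objectives a learning-to-rank framework like the one above trains with (this is not allRank's own API, just a ListNet-style loss in plain PyTorch with toy scores and relevance labels):

```python
import torch
import torch.nn.functional as F

def listnet_loss(scores: torch.Tensor, relevance: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the score distribution and the distribution
    induced by the ground-truth relevance labels (ListNet, top-1 version)."""
    target = F.softmax(relevance, dim=-1)
    return -(target * F.log_softmax(scores, dim=-1)).sum(dim=-1).mean()

# Toy batch: 2 queries, 4 candidate documents each.
scores = torch.randn(2, 4, requires_grad=True)        # model outputs
relevance = torch.tensor([[3., 1., 0., 2.],            # graded relevance labels
                          [0., 2., 1., 0.]])
loss = listnet_loss(scores, relevance)
loss.backward()
print(f"listwise loss: {loss.item():.4f}")
```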

[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank

Python · 48 stars · 12 forks · Updated Nov 4, 2024

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)

Jupyter Notebook · 35 stars · 6 forks · Updated Jun 1, 2024
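
A minimal sketch of the idea behind the proxy-model entry above, not the repository's own code: fine-tune a small BERT with a one-dimensional regression head (standard Hugging Face transformers API, generic bert-base-uncased checkpoint, toy target lengths) to predict how many tokens the LLM will generate for a given prompt.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Small encoder with a 1-dim regression head; the regression target is the
# number of tokens the target LLM produced for each prompt (toy values here).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression"
)

prompts = ["Explain the attention mechanism.", "What is 2 + 2?"]
labels = torch.tensor([[412.0], [8.0]])  # observed output lengths (toy data)

batch = tokenizer(prompts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)   # MSE loss for problem_type="regression"
outputs.loss.backward()                   # gradients for one training step

predicted_lengths = model(**batch).logits.squeeze(-1)
print(predicted_lengths)                  # untrained, so predictions are meaningless
```

At serving time, such length predictions can feed a scheduler that prefers short requests, which is the connection to the learning-to-rank scheduling entry above.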

My learning notes and code for ML systems (MLSys).

Python · 2,244 stars · 139 forks · Updated May 22, 2025

📚 LeetCUDA: modern CUDA learning notes with PyTorch for beginners 🐑; 200+ CUDA/Tensor Core kernels, HGEMM, FA-2 MMA, etc. 🔥

Cuda · 4,427 stars · 466 forks · Updated May 17, 2025

Community maintained hardware plugin for vLLM on Ascend

Python · 668 stars · 158 forks · Updated May 23, 2025

Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing

Jupyter Notebook · 46 stars · 3 forks · Updated Jan 8, 2025

The full minitorch student suite.

Python · 2,074 stars · 458 forks · Updated Aug 17, 2024

CUDA/Metal accelerated language model inference

C · 578 stars · 26 forks · Updated Apr 10, 2025

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ · 358 stars · 33 forks · Updated Jan 15, 2025

Awesome-LLM: a curated list of Large Language Model resources

23,453 stars · 1,962 forks · Updated May 9, 2025

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python · 250 stars · 17 forks · Updated Aug 31, 2024

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python · 183 stars · 12 forks · Updated May 1, 2025

A lightweight LLM inference framework

C++ · 728 stars · 93 forks · Updated Apr 7, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ · 10,537 stars · 1,444 forks · Updated May 23, 2025

Large Language Model (LLM) Systems Paper List

1,234 stars · 69 forks · Updated May 17, 2025

A throughput-oriented high-performance serving framework for LLMs

Cuda · 812 stars · 37 forks · Updated May 10, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

740 stars · 44 forks · Updated May 22, 2025
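
For readers new to the topic, here is a toy sketch of the draft-then-verify loop the speculative-decoding papers above build on. It uses greedy verification only, with stand-in next-token functions instead of real models, and does not come from any specific paper or repository in this list.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token prefix to a greedy next token

def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], k: int = 4, max_new: int = 16) -> List[int]:
    """Greedy speculative decoding: the cheap draft model proposes k tokens,
    the target model keeps the longest agreeing prefix plus its own correction."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft k tokens autoregressively with the cheap model.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verify: accept draft tokens while the target agrees; on the first
        #    disagreement, take the target's token instead and end this round.
        #    (A real implementation scores all draft positions in one batched pass.)
        for t in proposal:
            expected = target(seq)
            if t == expected:
                seq.append(t)
            else:
                seq.append(expected)
                break
        else:
            # All k drafts accepted: append one extra "free" target token.
            seq.append(target(seq))
    return seq[:len(prompt) + max_new]

# Toy "models": the target counts up by 1; the draft usually agrees but
# occasionally guesses wrong, so some rounds accept fewer than k tokens.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + (2 if ctx[-1] % 5 == 0 else 1)

print(speculative_decode(target, draft, prompt=[0], k=4, max_new=12))
```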

Inference code for Llama models

Python · 58,260 stars · 9,771 forks · Updated Jan 26, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 14,572 stars · 1,820 forks · Updated May 23, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 14,123 stars · 997 forks · Updated May 23, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 3,312 stars · 259 forks · Updated May 22, 2025

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook · 2,528 stars · 175 forks · Updated Jun 25, 2024

LLM inference in C/C++

C++ · 80,718 stars · 11,873 forks · Updated May 23, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 47,884 stars · 7,558 forks · Updated May 23, 2025
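
A minimal offline-inference sketch for the serving engine above, following vLLM's documented LLM/SamplingParams interface (the checkpoint name is just the small example model commonly used in the docs; exact defaults may differ between versions):

```python
from vllm import LLM, SamplingParams

# A batch of prompts served in one offline generate() call; continuous
# batching and PagedAttention happen inside the engine.
prompts = [
    "The capital of France is",
    "Speculative decoding speeds up LLM inference by",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")          # any Hugging Face model id
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```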