
University of Science and Technology of China, Hefei, Anhui
Starred repositories
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Official repository for VisionZip (CVPR 2025)
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
allRank is a framework for training learning-to-rank neural models based on PyTorch.
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)
My learning notes/codes for ML SYS.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
Community maintained hardware plugin for vLLM on Ascend
Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Awesome-LLM: a curated list of Large Language Model resources
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorR…
Large Language Model (LLM) Systems Paper List
A throughput-oriented high-performance serving framework for LLMs
📰 Must-read papers and blogs on Speculative Decoding ⚡️
SGLang is a fast serving framework for large language models and vision language models.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
A high-throughput and memory-efficient inference and serving engine for LLMs
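Many of the repositories above (PEARL, DuoDecoding, EAGLE, TriForce, Medusa) build on speculative decoding: a cheap draft model proposes several tokens, and the expensive target model verifies them in one pass, keeping the longest correct prefix. A minimal greedy sketch of that loop, using hypothetical toy `draft_model` / `target_model` functions in place of real LLMs:

```python
# Toy sketch of greedy speculative decoding. draft_model and
# target_model are stand-ins for real models (both hypothetical).

def draft_model(prefix):
    # Cheap proxy: repeat the last token, or start with 1.
    return prefix[-1] if prefix else 1

def target_model(prefix):
    # "Expensive" model: emits the sequence 1, 2, 3, ...
    return len(prefix) + 1

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then verify them against the target.

    Accept the longest matching prefix of the draft; on the first
    mismatch, substitute the target's token. Each step yields at
    least one token, so decoding always makes progress.
    """
    draft, ctx = [], list(prefix)
    for _ in range(k):
        token = draft_model(ctx)
        draft.append(token)
        ctx.append(token)

    accepted, ctx = [], list(prefix)
    for token in draft:
        correct = target_model(ctx)  # in practice: one batched forward pass
        if token == correct:
            accepted.append(token)
            ctx.append(token)
        else:
            accepted.append(correct)  # correction token, then stop
            break
    return accepted

out = []
while len(out) < 6:
    out.extend(speculative_step(out))
print(out[:6])  # → [1, 2, 3, 4, 5, 6]
```

Real systems verify all k draft tokens in a single batched target forward pass, so the speedup comes from amortizing the target model's cost across accepted tokens; lossless variants (as in EAGLE or TriForce) sample so the output distribution matches the target model exactly.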