Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Efficient Triton Kernels for LLM Training
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
Adaptive Caching for Faster Video Generation with Diffusion Transformers
High-speed Large Language Model Serving for Local Deployment
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"
Sample codes for my CUDA programming book
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instructions (see the WMMA sketch after this list).
A self-learning tutorial for CUDA High Performance Programming.
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)
Dynamic Memory Management for Serving LLMs without PagedAttention
A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc.
Must-read papers and blogs on LLM-based Long Context Modeling.
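
As a concrete taste of what the HGEMM entry above covers, here is a minimal sketch of a tensor-core matrix-multiply tile written against CUDA's WMMA API. It is not code from that repository: the kernel name `wmma_hgemm_tile`, the one-warp-per-output-tile launch layout, and the assumptions that M, N, and K are multiples of 16 with A row-major and B column-major are all illustrative choices.

```cuda
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// Minimal WMMA HGEMM sketch: each warp computes one 16x16 tile of
// C = A * B, with A row-major (M x K) and B column-major (K x N).
// Assumes M, N, K are multiples of 16 and the grid exactly covers C.
__global__ void wmma_hgemm_tile(const half *A, const half *B, half *C,
                                int M, int N, int K) {
    // Map each warp to one output tile: warpM indexes row tiles,
    // warpN indexes column tiles.
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    int warpN = blockIdx.y * blockDim.y + threadIdx.y;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_frag;
    wmma::fill_fragment(c_frag, __float2half(0.0f));

    // March along K in 16-wide steps, accumulating into the fragment.
    for (int k = 0; k < K; k += 16) {
        // Row-major A: tile origin (warpM*16, k), leading dimension K.
        wmma::load_matrix_sync(a_frag, A + warpM * 16 * K + k, K);
        // Column-major B: tile origin (k, warpN*16), leading dimension K.
        wmma::load_matrix_sync(b_frag, B + warpN * 16 * K + k, K);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }

    // Row-major C: tile origin (warpM*16, warpN*16), leading dimension N.
    wmma::store_matrix_sync(C + warpM * 16 * N + warpN * 16, c_frag, N,
                            wmma::mem_row_major);
}
```

Each warp owns a single 16x16 tile of C and walks the K dimension in 16-wide steps; per its description, the repository layers its optimization methods (including raw MMA PTX variants) on top of primitives like this one.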