DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
Helpful tools and examples for working with flex-attention
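A minimal sketch of the underlying FlexAttention API, assuming PyTorch >= 2.5 (where `torch.nn.attention.flex_attention` ships); shapes are illustrative:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Mask out future positions by sending their scores to -inf.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

# (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))
out = flex_attention(q, k, v, score_mod=causal)
```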
How to optimize algorithms in CUDA.
A Python implementation of global optimization with Gaussian processes.
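A minimal sketch using the `bayesian-optimization` package; the toy objective and bounds here are illustrative:

```python
from bayes_opt import BayesianOptimization

def black_box(x, y):
    # Toy objective with its maximum at (2, 1).
    return -(x - 2) ** 2 - (y - 1) ** 2

optimizer = BayesianOptimization(
    f=black_box,
    pbounds={"x": (-5, 5), "y": (-5, 5)},
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=25)
print(optimizer.max)  # best parameters and target found so far
```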
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Disaggregated serving system for Large Language Models (LLMs).
A large-scale simulation framework for LLM inference
A low-latency & high-throughput serving engine for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
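A minimal LMDeploy pipeline sketch; the model name is illustrative:

```python
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe(["What is LLM serving?"])
print(responses[0].text)
```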
A Data Streaming Library for Efficient Neural Network Training
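A hedged sketch, assuming this is mosaicml/streaming; the remote and cache paths are illustrative:

```python
from torch.utils.data import DataLoader
from streaming import StreamingDataset

# Streams pre-sharded samples from remote storage, caching them locally.
dataset = StreamingDataset(
    remote="s3://my-bucket/mds",
    local="/tmp/mds-cache",
    shuffle=True,
)
loader = DataLoader(dataset, batch_size=32)
```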
A guidance language for controlling large language models.
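A short guidance sketch (assuming the guidance >= 0.1 API); the model choice is illustrative:

```python
from guidance import models, gen, select

lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")
# Constrain generation: force a choice, then bounded free-form text.
lm += "Is Python compiled or interpreted? " + select(["compiled", "interpreted"])
lm += "\nBriefly why: " + gen(max_tokens=30, stop=".")
print(str(lm))
```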
SGLang is a fast serving framework for large language models and vision language models.
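A minimal SGLang frontend sketch; it assumes a local SGLang server is already running (e.g. via `python -m sglang.launch_server --model-path <model> --port 30000`):

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    s += "Q: " + question + "\n"
    s += "A: " + sgl.gen("answer", max_tokens=64)

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is radix attention?")
print(state["answer"])
```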
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
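A hedged sketch of the high-level LLM API (assuming a recent TensorRT-LLM release that ships it); the model name is illustrative:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```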
Ring attention implementation with flash attention
Code and documentation to train Stanford's Alpaca models, and generate the data.
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
FlashInfer: Kernel Library for LLM Serving
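A hedged FlashInfer sketch of single-request decode attention (requires CUDA tensors in half precision; shapes are illustrative):

```python
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 1024
q = torch.randn(num_qo_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn(kv_len, num_kv_heads, head_dim, device="cuda", dtype=torch.float16)
v = torch.randn(kv_len, num_kv_heads, head_dim, device="cuda", dtype=torch.float16)
# Attention for one new query token over the cached KV.
out = flashinfer.single_decode_with_kv_cache(q, k, v)
```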
This is a place for various problem detectors running on Kubernetes nodes.
A tool for bandwidth measurements on NVIDIA GPUs.
MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license.
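A minimal sketch using the official MinIO Python SDK (`pip install minio`); the endpoint, credentials, and paths are illustrative:

```python
from minio import Minio

client = Minio(
    "localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,
)
if not client.bucket_exists("checkpoints"):
    client.make_bucket("checkpoints")
client.fput_object("checkpoints", "model.bin", "/tmp/model.bin")
```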
Some reference and example networking plugins, maintained by the CNI team.
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.