Repositories starred by luchangli03
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,368 598 Updated May 20, 2025
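Fine-grained scaling means quantizing small blocks of a tensor with their own scale factors instead of one scale for the whole tensor, so a single outlier cannot destroy precision everywhere. A minimal NumPy sketch of the idea (this is not DeepGEMM's API; the block size and the simulated int8-style range are illustrative assumptions):

```python
import numpy as np

def quantize_blockwise(x, block=4, qmax=127):
    """Quantize a 1-D array in fixed-size blocks, each with its own scale.

    Simulates fine-grained scaling with symmetric round-to-nearest
    integer codes; returns the codes and the per-block scales.
    """
    n = len(x)
    codes = np.empty(n, dtype=np.int32)
    scales = np.empty(n // block)
    for i in range(0, n, block):
        blk = x[i:i + block]
        s = max(np.abs(blk).max() / qmax, 1e-12)  # per-block scale
        scales[i // block] = s
        codes[i:i + block] = np.round(blk / s)
    return codes, scales

def dequantize_blockwise(codes, scales, block=4):
    out = np.empty(len(codes))
    for i in range(0, len(codes), block):
        out[i:i + block] = codes[i:i + block] * scales[i // block]
    return out

x = np.array([0.1, -0.2, 3.0, 0.05, 100.0, 0.3, -0.4, 0.2])
codes, scales = quantize_blockwise(x)
x_hat = dequantize_blockwise(codes, scales)
# The small values in the first block keep their own small scale
# instead of inheriting a tensor-wide scale dominated by 100.0.
```

A real FP8 kernel fuses this rescaling into the GEMM epilogue; the sketch only shows why the per-block scales are worth carrying around.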

DeepEP: an efficient expert-parallel communication library

Cuda 7,678 772 Updated May 23, 2025

Differentiable fast wavelet transforms in PyTorch with GPU support.

Python 356 38 Updated May 16, 2025
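A fast wavelet transform splits a signal into coarse averages and fine details at each level, and because every step is a linear filter it is differentiable end to end. A single-level Haar transform in plain NumPy (the repo implements general wavelet families in PyTorch; the Haar filter and orthonormal 1/sqrt(2) normalization here are illustrative choices):

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar wavelet transform."""
    assert len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-pass: local averages
    detail = (even - odd) / np.sqrt(2)   # high-pass: local differences
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: recover and interleave even/odd samples."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    out = np.empty(2 * len(approx))
    out[0::2], out[1::2] = even, odd
    return out

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0])
a, d = haar_dwt(x)
assert np.allclose(haar_idwt(a, d), x)  # perfect reconstruction
```

Orthonormality means the transform preserves signal energy, which is what makes gradients flow cleanly through it in a differentiable setting.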

Numerical integration in arbitrary dimensions on the GPU using PyTorch / TF / JAX

Python 202 41 Updated Nov 25, 2024
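The core idea behind GPU integration libraries like this one is vectorized evaluation of the integrand at many sample points over a d-dimensional box. A hedged plain-NumPy sketch of Monte Carlo integration (not the library's API; the integrand and sample count are illustrative):

```python
import numpy as np

def mc_integrate(f, lo, hi, n=200_000, seed=0):
    """Monte Carlo estimate of the integral of f over the box [lo, hi]^d.

    f takes an (n, d) array of points and returns n values; the
    estimate is (box volume) * (mean of f over uniform samples).
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    pts = rng.uniform(lo, hi, size=(n, len(lo)))  # n samples in d dims
    volume = np.prod(hi - lo)
    return volume * f(pts).mean()

# Integrate f(x, y) = x*y over [0, 1]^2; the exact value is 1/4.
est = mc_integrate(lambda p: p[:, 0] * p[:, 1], [0, 0], [1, 1])
```

Swapping NumPy for PyTorch/TF/JAX tensors is what moves the batched evaluation onto the GPU; the estimator itself is unchanged.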

Minimal reproduction of DeepSeek R1-Zero

Python 11,792 1,487 Updated Apr 24, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,019 310 Updated May 22, 2025

Fully open reproduction of DeepSeek-R1

Python 24,521 2,257 Updated May 22, 2025

My learning notes and code for machine learning systems (ML SYS).

Python 2,244 139 Updated May 22, 2025

Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"

Python 130 11 Updated May 21, 2025

[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.

Python 159 10 Updated Oct 3, 2024
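FlatQuant and DuQuant both target the same pain point: a few outlier values force a large quantization scale and crush the precision of everything else. A small NumPy demonstration of the effect, comparing per-tensor and per-channel scales (the weights and 8-bit range are made up for illustration; neither repo's actual method is shown):

```python
import numpy as np

def quantize(x, scale, qmax=127):
    """Symmetric round-to-nearest quantization with a given scale."""
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

w = np.array([[0.01, -0.02, 0.03],    # a well-behaved channel
              [8.0,   0.01, -0.02]])  # a channel with an outlier

# Per-tensor: one scale for everything, dominated by the 8.0 outlier.
per_tensor = quantize(w, np.abs(w).max() / 127)

# Per-channel: each row gets its own scale.
scales = np.abs(w).max(axis=1, keepdims=True) / 127
per_channel = quantize(w, scales)

err_tensor = np.abs(w[0] - per_tensor[0]).max()
err_channel = np.abs(w[0] - per_channel[0]).max()
# The well-behaved channel rounds entirely to zero under the
# outlier-dominated scale, but is reconstructed almost exactly
# under its own per-channel scale.
```

Papers like the two above go further, transforming the weights so the outliers themselves are flattened or redistributed before any scale is chosen.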

Acode: a powerful text/code editor for Android.

JavaScript 3,413 495 Updated May 22, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.

Python 4,027 277 Updated May 18, 2025

Materials for learning SGLang

418 32 Updated May 16, 2025

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.

C 251 27 Updated May 19, 2025
Cuda 134 17 Updated Mar 18, 2024

Primarily a collection of knowledge and interview questions for large language model (LLMs) algorithm/application engineers.

HTML 7,595 839 Updated Apr 30, 2025

LeetCode 101: a LeetCode problem-solving guide.

9,418 1,231 Updated Dec 8, 2024

Tile primitives for speedy kernels

Cuda 2,364 142 Updated May 23, 2025

Fastest kernels written from scratch

Cuda 264 37 Updated Apr 3, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 14,572 1,820 Updated May 23, 2025

This project is about convolution operator optimization on GPU, including GEMM-based (implicit GEMM) convolution.

C++ 30 3 Updated Dec 27, 2024

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.

Cuda 414 79 Updated Sep 8, 2024
Python 210 15 Updated Jan 23, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 9,500 671 Updated May 22, 2025

An easy-to-understand TensorOp matmul tutorial.

C++ 356 46 Updated Sep 21, 2024

Yinghan's Code Sample

Cuda 329 58 Updated Jul 25, 2022

Fast CUDA matrix multiplication from scratch

Cuda 723 109 Updated Dec 28, 2023
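From-scratch GEMM tutorials like this one usually start with a naive triple loop and then add tiling so each chunk of data is reused from fast memory. A NumPy sketch of blocked (tiled) matrix multiplication, the CPU-side analogue of the shared-memory tiling a CUDA kernel performs (the tile size is an arbitrary illustrative choice):

```python
import numpy as np

def matmul_tiled(A, B, tile=16):
    """Blocked matmul: compute the output in tile x tile chunks,
    accumulating partial products one K-tile at a time -- the same
    loop structure a shared-memory CUDA GEMM kernel uses."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):  # accumulate over K tiles
                C[i:i + tile, j:j + tile] += (
                    A[i:i + tile, k:k + tile] @ B[k:k + tile, j:j + tile]
                )
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((48, 32)), rng.standard_normal((32, 40))
assert np.allclose(matmul_tiled(A, B), A @ B)
```

On a GPU the inner `@` becomes a per-thread (or per-warp) accumulation over tiles staged into shared memory, which is where most of the speedup in these tutorials comes from.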

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 3,312 259 Updated May 22, 2025