Stars
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling (a toy sketch of block scaling follows this list)
DeepEP: an efficient expert-parallel communication library
Differentiable fast wavelet transforms in PyTorch with GPU support (round-trip example after this list).
Numerical integration in arbitrary dimensions on the GPU using PyTorch, TensorFlow, or JAX (usage example after this list)
Minimal reproduction of DeepSeek R1-Zero
FlashInfer: Kernel Library for LLM Serving
Fully open reproduction of DeepSeek-R1
My learning notes and code for ML systems (MLSys).
Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
Acode - a powerful text/code editor for Android
📚 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance across a range of hardware, including CUDA GPUs, x86, and ARMv9.
Tile primitives for speedy kernels
SGLang is a fast serving framework for large language models and vision language models (frontend example after this list).
This project covers convolution operator optimization on GPUs, including GEMM-based (implicit GEMM) convolution (see the im2col sketch after this list).
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
Hackable and optimized Transformers building blocks, supporting composable construction.
An easy-to-understand TensorOp matmul tutorial
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
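
The "fine-grained scaling" in the DeepGEMM entry refers to giving each small block of a matrix its own scale factor instead of one scale per tensor, which keeps a single outlier from wrecking precision everywhere. Below is a toy PyTorch sketch of per-128-element block scaling with the FP8 e4m3 range; the block size, helper names, and clamp value are illustrative assumptions, not DeepGEMM's kernels or API.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite magnitude in float8 e4m3

def quantize_per_block(x: torch.Tensor, block: int = 128):
    """One scale per (row, `block`-column) tile of a 2-D tensor.

    Fine-grained scales keep an outlier in one tile from destroying
    the quantization precision of every other tile.
    """
    rows, cols = x.shape
    assert cols % block == 0, "illustrative sketch: assume cols divide evenly"
    tiles = x.view(rows, cols // block, block)
    scales = tiles.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / FP8_E4M3_MAX
    q = (tiles / scales).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX)
    q = q.to(torch.float8_e4m3fn)  # real FP8 storage (requires PyTorch >= 2.1)
    return q, scales

def dequantize_per_block(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.to(torch.float32) * scales).reshape(q.shape[0], -1)

x = torch.randn(4, 256)
x[0, 3] = 500.0  # outlier only degrades its own 128-wide tile
q, s = quantize_per_block(x)
print((dequantize_per_block(q, s) - x).abs().max())  # small quantization error
```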
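The differentiable-wavelet entry is the ptwt package, which mirrors pywt's API on PyTorch tensors. A minimal round-trip sketch, assuming its documented `wavedec`/`waverec` entry points:

```python
import pywt
import torch
import ptwt  # pip install ptwt

device = "cuda" if torch.cuda.is_available() else "cpu"
signal = torch.randn(8, 1024, device=device, requires_grad=True)
wavelet = pywt.Wavelet("db4")

# Multi-level 1-D decomposition; the coefficients stay on the GPU and in
# the autograd graph, so losses defined on them are differentiable.
coeffs = ptwt.wavedec(signal, wavelet, level=3)
recon = ptwt.waverec(coeffs, wavelet)
recon = recon[..., : signal.shape[-1]]  # guard against boundary padding

loss = (recon - signal.detach()).abs().mean()
loss.backward()  # gradients flow back through the transform
print(recon.shape, signal.grad.shape)
```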
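The GPU integration entry is torchquad. A minimal Monte Carlo sketch, assuming its documented `set_up_backend` and `MonteCarlo.integrate` API:

```python
import torch
from torchquad import MonteCarlo, set_up_backend

# Select the PyTorch backend (uses CUDA when available); TensorFlow,
# JAX, and NumPy backends are chosen the same way.
set_up_backend("torch", data_type="float32")

def f(x):
    # x has shape (n_points, dim); the integrand is evaluated in batch
    return torch.sin(x[:, 0]) * torch.exp(x[:, 1])

mc = MonteCarlo()
result = mc.integrate(
    f,
    dim=2,
    N=1_000_000,
    integration_domain=[[0.0, 1.0], [0.0, 1.0]],
)
print(result)  # analytic value: (1 - cos 1)(e - 1) ~= 0.790
```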
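For SGLang, here is a sketch of its frontend language against a locally launched server; the model path and port are placeholders, and the `@sgl.function` DSL shown reflects the frontend API as documented in recent releases:

```python
# Assumes a server was launched separately, e.g.:
#   python -m sglang.launch_server --model-path <your-model> --port 30000
import sglang as sgl

@sgl.function
def answer(s, question):
    # The DSL accumulates the prompt; sgl.gen marks where the model generates.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("reply", max_tokens=128, temperature=0.0))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = answer.run(question="What does paged attention do?")
print(state["reply"])
```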
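The implicit-GEMM idea from the convolution-optimization entry can be shown at a high level in pure PyTorch: lower the input with im2col (`F.unfold`) so the whole convolution becomes a single matrix multiply. This is a reference-level sketch of the formulation, not that repo's GPU kernels:

```python
import torch
import torch.nn.functional as F

def conv2d_as_gemm(x, w, stride=1, padding=0):
    """Express Conv2d as one matrix multiply: the im2col / implicit-GEMM
    formulation that GPU convolution kernels build on."""
    n, c, h, wd = x.shape
    k, _, r, s = w.shape
    cols = F.unfold(x, (r, s), stride=stride, padding=padding)  # (n, c*r*s, L)
    out = w.view(k, -1) @ cols                                  # (n, k, L)
    oh = (h + 2 * padding - r) // stride + 1
    ow = (wd + 2 * padding - s) // stride + 1
    return out.view(n, k, oh, ow)

x = torch.randn(2, 3, 16, 16)
w = torch.randn(8, 3, 3, 3)
ref = F.conv2d(x, w, stride=1, padding=1)
print(torch.allclose(conv2d_as_gemm(x, w, stride=1, padding=1), ref, atol=1e-4))
```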