
Highlights

  • Pro

Organizations

@CambioML

Starred repositories

5 stars written in Cuda

LLM training in simple, raw C/CUDA

Cuda · 25,492 stars · 2,929 forks · Updated Oct 2, 2024

📚 200+ Tensor/CUDA Core kernels, ⚡️ flash-attn-mma, ⚡️ hgemm with WMMA, MMA, and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉).

Cuda · 2,282 stars · 244 forks · Updated Feb 7, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 1,986 stars · 202 forks · Updated Feb 12, 2025

CUDA Library Samples

Cuda · 1,768 stars · 364 forks · Updated Jan 28, 2025

Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda · 943 stars · 58 forks · Updated Jan 30, 2025