Stars
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
How to optimize some algorithms in CUDA.
FlashInfer: Kernel Library for LLM Serving
FSA/FST algorithms, differentiable, with PyTorch compatibility.
This is a series of GPU optimization topics covering how to optimize CUDA kernels in detail, including several basic kernel optimizations: elementwise, reduce, s…
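For flavor, here is a minimal block-level sum reduction in CUDA, the kind of reduce kernel such a series typically walks through; the kernel name and block size are illustrative, not taken from the series itself:

```cuda
#include <cuda_runtime.h>

// Launch with BLOCK_SIZE threads per block; each block reduces BLOCK_SIZE
// elements of `in` into one partial sum written to `out[blockIdx.x]`.
constexpr int BLOCK_SIZE = 256;

__global__ void block_reduce_sum(const float* in, float* out, int n) {
    __shared__ float smem[BLOCK_SIZE];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // Load one element per thread (0 for out-of-range threads).
    smem[tid] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory, halving the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) smem[tid] += smem[tid + stride];
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = smem[0];
}
```

A second pass (or an atomicAdd on the partial sums) combines the per-block results into the final value.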
A simple high performance CUDA GEMM implementation.
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
This is a Tensor Train based library for compressing sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed th…
Tutorials for writing high-performance GPU operators in AI frameworks.
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
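As a reference point for the WMMA-based projects above, here is a bare single-tile tensor-core MMA sketch using the CUDA WMMA API (one warp computing a 16x16x16 half-precision tile); this is an illustrative minimum, not the optimized HGEMM from any of these repositories:

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes a single 16x16x16 tile: D = A * B.
// A is a row-major 16x16 half matrix, B is col-major 16x16 half, D is float 16x16.
__global__ void wmma_tile_gemm(const half* A, const half* B, float* D) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                  // zero the accumulator
    wmma::load_matrix_sync(a_frag, A, 16);                // load A tile (ld = 16)
    wmma::load_matrix_sync(b_frag, B, 16);                // load B tile (ld = 16)
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);   // tensor-core MMA
    wmma::store_matrix_sync(D, acc_frag, 16, wmma::mem_row_major);
}
```

Launch with a single warp (e.g. `<<<1, 32>>>`) and compile for sm_70 or newer; the real HGEMM kernels tile this across shared memory and many warps, which is where the cuBLAS-level performance comes from.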
FP8 flash attention on the Ada architecture, implemented with the cutlass repository.
A stripped-down flash-attention implementation using cutlass, intended for teaching.
imoneoi / cutlass_grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
Programming Massively Parallel Processors 4th edition codes