Codedestructor56

10 stars written in Cuda
LLM training in simple, raw C/CUDA

Cuda · 24,927 stars · 2,831 forks · Updated Oct 2, 2024

📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda · 1,855 stars · 193 forks · Updated Jan 3, 2025

How to optimize some algorithms in CUDA.

Cuda · 1,799 stars · 149 forks · Updated Dec 28, 2024

Learn CUDA Programming, published by Packt

Cuda · 1,058 stars · 245 forks · Updated Dec 30, 2023

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 584 stars · 113 forks · Updated Oct 30, 2024

A set of hands-on tutorials for CUDA programming

Cuda · 201 stars · 33 forks · Updated Apr 8, 2024

CUDA Matrix Multiplication Optimization

Cuda · 148 stars · 14 forks · Updated Jul 19, 2024

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…

Cuda · 135 stars · 26 forks · Updated Jan 3, 2025

Benchmark tests supporting the TiledCUDA library.

Cuda · 13 stars · 2 forks · Updated Nov 19, 2024