10 stars written in Cuda
LLM training in simple, raw C/CUDA

Cuda 25,936 2,970 Updated Oct 2, 2024

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,731 283 Updated Mar 4, 2025

How to optimize some algorithms in CUDA.

Cuda 1,951 173 Updated Mar 5, 2025

Learn CUDA Programming, published by Packt

Cuda 1,115 249 Updated Dec 30, 2023

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 631 119 Updated Feb 21, 2025

A set of hands-on tutorials for CUDA programming

Cuda 213 33 Updated Apr 8, 2024

CUDA Matrix Multiplication Optimization

Cuda 168 16 Updated Jul 19, 2024

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…

Cuda 140 27 Updated Mar 2, 2025

Benchmark tests supporting the TiledCUDA library.

Cuda 15 2 Updated Nov 19, 2024