Codedestructor56

10 stars written in Cuda

LLM training in simple, raw C/CUDA

Cuda · 25,086 stars · 2,865 forks · Updated Oct 2, 2024

📚200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️HGEMM with WMMA, MMA and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).

Cuda · 1,997 stars · 207 forks · Updated Jan 20, 2025

How to optimize some algorithms in CUDA.

Cuda · 1,834 stars · 151 forks · Updated Jan 19, 2025

Learn CUDA Programming, published by Packt

Cuda · 1,075 stars · 247 forks · Updated Dec 30, 2023

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 589 stars · 113 forks · Updated Oct 30, 2024

A set of hands-on tutorials for CUDA programming

Cuda · 206 stars · 33 forks · Updated Apr 8, 2024

CUDA Matrix Multiplication Optimization

Cuda · 153 stars · 14 forks · Updated Jul 19, 2024

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…

Cuda · 136 stars · 27 forks · Updated Jan 6, 2025

Benchmark tests supporting the TiledCUDA library.

Cuda · 12 stars · 2 forks · Updated Nov 19, 2024