Lurkrazy
😹
Enjoying everything

Highlights

  • Pro

Organizations

@HPCRL

Sampling profiler for Python programs

Rust 13,416 450 Updated Feb 6, 2025

Cloudflare Turnstile Bypass Lib

Python 64 4 Updated Mar 11, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 586 29 Updated Mar 19, 2025

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 771 49 Updated Mar 24, 2025
Python 12 3 Updated May 18, 2024

[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs

Cuda 74 3 Updated Jun 7, 2024

Allo: A Programming Model for Composable Accelerator Design

Python 216 35 Updated Mar 24, 2025

An Easy-to-understand TensorOp Matmul Tutorial

C++ 331 38 Updated Sep 21, 2024

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. By providing a higher-level interface, algorithm developers can de…

Cuda 69 5 Updated Mar 23, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems

Python 237 21 Updated Mar 18, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 104 12 Updated Mar 21, 2025
Python 4 2 Updated Jan 20, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 774 59 Updated Mar 24, 2025

ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch

Python 32 Updated Aug 8, 2024

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 852 164 Updated Dec 30, 2024

Development repository for the Triton language and compiler

MLIR 14,958 1,881 Updated Mar 23, 2025

Python tool for converting files and office documents to Markdown.

Python 41,272 1,952 Updated Mar 22, 2025

ML models + benchmark for tabular data classification and regression

Python 111 7 Updated Mar 10, 2025

Ahead of Time (AOT) Triton Math Library

Python 55 19 Updated Mar 21, 2025

Cataloging released Triton kernels.

208 9 Updated Jan 10, 2025
C++ 4 Updated Aug 20, 2024

Exocompilation for productive programming of hardware accelerators

Python 579 38 Updated Mar 21, 2025

Simplify Caddy configs with SSL, proxies, file servers, security headers, compression & more.

Vue 357 18 Updated Jan 12, 2025

Shared Middle-Layer for Triton Compilation

MLIR 233 56 Updated Mar 11, 2025

triton ops

Python 4 1 Updated Mar 24, 2025

EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks, enabling compute efficient training and inference.

Python 59 5 Updated Mar 10, 2025
Python 73 19 Updated Nov 7, 2024

Puzzles for learning Triton

Jupyter Notebook 1,530 119 Updated Nov 18, 2024

Fast low-bit matmul kernels in Triton

Python 270 21 Updated Mar 21, 2025

Ongoing research training transformer models at scale

Python 11,860 2,663 Updated Mar 24, 2025