- University of Utah, Salt Lake City
- https://www.lichendi.top
- in/chendi-li
Stars
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs
Allo: A Programming Model for Composable Accelerator Design
An easy-to-understand TensorOp Matmul tutorial
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. By providing a higher-level interface, algorithm developers can de…
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Development repository for the Triton language and compiler
Python tool for converting files and office documents to Markdown.
ML models + benchmark for tabular data classification and regression
Exocompilation for productive programming of hardware accelerators
Simplify Caddy configs with SSL, proxies, file servers, security headers, compression & more.
Shared Middle-Layer for Triton Compilation
EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks, enabling compute-efficient training and inference.
Ongoing research training transformer models at scale