- Santa Clara
- https://kaixih.github.io/
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
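Book_2_《可视之美》 (The Beauty of Visualization) | Iris Book series: From Arithmetic to Machine Learning. Criticism and corrections welcome.
Book_3_《数学要素》 (Elements of Mathematics) | Iris Book series: From Arithmetic to Machine Learning. Now published; further corrections are welcome, and readers who submit many corrections will receive a complimentary copy!
Book_4_《矩阵力量》 (The Power of Matrices) | Iris Book series: From Arithmetic to Machine Learning. Now published!
Book_6_《数据有道》 (The Way of Data) | Iris Book series: From Arithmetic to Machine Learning. Criticism and corrections welcome! Readers who submit many corrections will receive a complimentary copy as thanks!
Book_7_《机器学习》 (Machine Learning) | Iris Book series: From Arithmetic to Machine Learning. Criticism and corrections welcome.
Book_1_《编程不难》 (Programming Is Not Hard) | Iris Book series: From Arithmetic to Machine Learning. Criticism and corrections are much appreciated!
Book_5_《统计至简》 (Statistics Made Simple) | Iris Book series: From Arithmetic to Machine Learning. Now published!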
Experiments and prototypes associated with IREE or MLIR
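AISystem mainly covers AI systems, including AI chips, AI compilers, AI inference and training frameworks, and other full-stack, low-level AI technologies.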
A machine learning compiler for GPUs, CPUs, and ML accelerators
Chinese translation of Bjarne Stroustrup's HOPL4 paper
A feature-rich command-line audio/video downloader
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
Bash Line Editor: a line editor written in pure Bash with syntax highlighting, auto suggestions, vim modes, etc. for Bash interactive sessions.
A delightful community-driven framework for managing your bash configuration, plus an auto-update tool that makes it easy to keep up with the latest updates from the community.
A TensorFlow Extension: GPU performance tools for TensorFlow.
A visualization tool to display the TF-Grappler-optimized op graph.
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.
🚀An automatic configuration program for vim