- Seoul, Republic of Korea
- in/yh-park
Starred repositories
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
An implementation of the transformer architecture as an NVIDIA CUDA kernel
Efficient Triton Kernels for LLM Training
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by the Qwen team at Alibaba Cloud.
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch
Flash Attention in ~100 lines of CUDA (forward pass only)
Step-by-step optimization of CUDA SGEMM
[ECCV 2024] Official implementation and dataset release for "A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization"
Deep insight into TensorRT, including but not limited to QAT, PTQ, plugins, triton_inference, and CUDA
Fast and memory-efficient exact attention
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
Lightweight Python framework that provides a high-level API for creating and rendering scenes with Blender.
TensorRT 8. Supports YOLOv5n/s/m/l/x, darknet -> tensorrt. YOLOv4 and YOLOv3 use raw darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
FlashInfer: Kernel Library for LLM Serving
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
Notes on quantization in neural networks