- Seoul, Republic of Korea
- in/yh-park
Starred repositories
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
An implementation of the transformer architecture as an NVIDIA CUDA kernel
Efficient Triton Kernels for LLM Training
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by the Qwen team at Alibaba Cloud.
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch
Flash Attention in ~100 lines of CUDA (forward pass only)
Step-by-step optimization of CUDA SGEMM
[ECCV 2024] Official implementation and dataset release for "A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization"
Deep insight into TensorRT, including but not limited to QAT, PTQ, plugins, triton_inference, and CUDA
Fast and memory-efficient exact attention
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
Lightweight Python framework that provides a high-level API for creating and rendering scenes with Blender.
TensorRT 8. Supports YOLOv5n/s/m/l/x, darknet -> tensorrt. YOLOv4 and YOLOv3 use raw darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
FlashInfer: Kernel Library for LLM Serving
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
Notes on quantization in neural networks