yester31

Starred repositories

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,396 253 Updated Feb 19, 2025

100 days of building CUDA kernels!

Cuda 223 19 Updated Feb 21, 2025

Visual Studio Code

TypeScript 167,617 30,651 Updated Feb 21, 2025

Developing practical AI applications with LLMs

Jupyter Notebook 121 88 Updated Aug 29, 2024

Attention implementation

Cuda 6 Updated Apr 10, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,199 364 Updated Feb 21, 2025

An implementation of the transformer architecture as an NVIDIA CUDA kernel

Cuda 169 11 Updated Sep 24, 2023

Efficient Triton Kernels for LLM Training

Python 4,458 270 Updated Feb 21, 2025

Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.

Python 4,512 363 Updated Feb 14, 2025

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 324 48 Updated Jan 2, 2025

Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch

Cuda 804 169 Updated Jul 19, 2023

CUDA Library Samples

Cuda 1,779 365 Updated Feb 20, 2025

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 700 61 Updated Dec 30, 2024

CUTLASS and CuTe Examples

Cuda 38 3 Updated Jan 4, 2025

Step-by-step optimization of CUDA SGEMM

Cuda 285 43 Updated Mar 30, 2022

[ECCV 2024] Official Implementation and Dataset Release for <A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization>

Python 229 34 Updated Feb 13, 2025

A deep look into TensorRT, including but not limited to QAT, PTQ, plugins, triton_inference, and CUDA

C++ 15 Updated Feb 17, 2025

Fast and memory-efficient exact attention

Python 15,608 1,479 Updated Feb 19, 2025

Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.

HTML 6,776 417 Updated Feb 21, 2025

Bring portraits to life!

Python 14,104 1,516 Updated Feb 13, 2025

Lightweight Python framework that provides a high-level API for creating and rendering scenes with Blender.

Python 803 21 Updated Dec 9, 2024

TensorRT 8. Supports YOLOv5 n, s, m, l, x. darknet -> tensorrt. YOLOv4 and YOLOv3 use raw darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.

C++ 1,191 315 Updated Mar 9, 2023

Official inference framework for 1-bit LLMs

C++ 12,746 894 Updated Feb 18, 2025

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

Python 17,833 2,477 Updated Feb 20, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,117 219 Updated Feb 20, 2025

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 418 20 Updated Oct 16, 2024

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 2,786 273 Updated Dec 21, 2024

Notes on quantization in neural networks

Jupyter Notebook 70 15 Updated Dec 14, 2023

CUDA Templates for Linear Algebra Subroutines

C++ 6,354 1,083 Updated Feb 21, 2025