Stars
SGLang is a fast serving framework for large language models and vision language models.
FlashInfer: Kernel Library for LLM Serving
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A framework for few-shot evaluation of language models.
The Paper List on Data Contamination for Large Language Models Evaluation.
A high-throughput and memory-efficient inference and serving engine for LLMs
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A fast communication-overlapping library for tensor parallelism on GPUs.
Clspv is a compiler for OpenCL C to Vulkan compute shaders
chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
Online compiler for HIP and NVIDIA® CUDA® code to WebGPU
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond a traditional BLAS library
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Stretching GPU performance for GEMMs and tensor contractions.
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.