CuPBoP-AMD is a CUDA translator that translates CUDA programs at the NVVM IR level to HIP-compatible IR that can run on AMD GPUs. Currently, CuPBoP-AMD translates a broader range of applications in the…

LLVM 3 Updated Nov 10, 2023

ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability

C++ 3,845 546 Updated Jan 24, 2025

paramhanji / CUDA-CNN

Implementation of a simple CNN using CUDA

Cuda 66 20 Updated May 2, 2017

gthparch / CuPBoP-AMD

CuPBoP-AMD is a CUDA translator that translates CUDA programs at the NVVM IR level to HIP-compatible IR that can run on AMD GPUs.

LLVM 36 4 Updated Nov 19, 2023

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,108 352 Updated Jan 24, 2025

onnx / models

A collection of pre-trained, state-of-the-art models in the ONNX format

Jupyter Notebook 8,178 1,421 Updated Apr 30, 2024

OAID / AutoKernel

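AutoKernel is a simple, easy-to-use, low-barrier automatic operator optimization tool that improves the efficiency of deploying deep learning algorithms.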

C++ 736 94 Updated Sep 23, 2022

riscv-non-isa / rvv-intrinsic-doc

C 305 90 Updated Nov 19, 2024

BBuf / model_quantization

Python 11 2 Updated Dec 31, 2019

PX4 / eigen

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

C++ 631 133 Updated Oct 18, 2023

protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format

C++ 66,395 15,593 Updated Jan 24, 2025

lanshanikilven

Stars

ROCm / MIOpen

gthparch / NVPTX-SPIRV-Translator

ColfaxResearch / cfx-article-src

Dao-AILab / flash-attention

Bruce-Lee-LY / flash_attention_inference

ROCm / hipBLAS

BBuf / how-to-optim-algorithm-in-cuda

66RING / tiny-flash-attention

zhumakhan / flash-attention-wmma

Felix-Zhenghao / flash-attention-v2-minimal

Repeerc / sd-webui-flash-attention2-rdna3-rocm

weishengying / tiny-flash-attention

decodecudabinary / Decoding-CUDA-Binary

kilianhae / FlashAttention.C

cupbop / CuPBoP

interestingLSY / CUDA-From-Correctness-To-Performance-Code

CRobeck / instrument-amdgpu-kernels

vosen / ZLUDA

ROCm / ROCm

Remind8 / CuPBoP-AMD