Assembler for NVIDIA Maxwell architecture

Sass 982 165 Updated Jan 3, 2023

Fast CUDA matrix multiplication from scratch

Cuda 663 93 Updated Dec 28, 2023

A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs

C++ 18 1 Updated Nov 29, 2023

A simple high performance CUDA GEMM implementation.

Cuda 353 40 Updated Jan 4, 2024

This project is about optimizing the convolution operator on GPUs, including GEMM-based (implicit GEMM) convolution.

C++ 26 2 Updated Dec 27, 2024
Cuda 2 Updated May 9, 2023

GPU implementation of Winograd convolution

Cuda 10 4 Updated Oct 23, 2017

A Winograd Minimal Filter Implementation in CUDA

Cuda 24 2 Updated Aug 25, 2021

A library of GPU kernels for sparse matrix operations.

C++ 260 52 Updated Nov 24, 2020

PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity

Cuda 107 27 Updated Mar 17, 2025
Cuda 107 11 Updated Jul 3, 2021

Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding

C++ 14 3 Updated Oct 20, 2021

Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, an…

C 39 8 Updated May 22, 2024

Implementation of an optimized SpGEMM kernel function on DCU.

C++ 1 Updated Mar 5, 2023

The source code of the paper "Accelerating CPU-based Sparse General Matrix Multiplication with Binary Row Merging"

C++ 4 Updated Aug 23, 2022

Efficient SpGEMM on GPU using CUDA and CSR

Cuda 52 15 Updated Jul 18, 2023

CSR-based SpGEMM on nVidia and AMD GPUs

C++ 45 8 Updated Apr 9, 2016

This repository is obtained from https://bitbucket.org/azadcse/hipmcl/src

C++ 1 Updated Sep 2, 2023

SuiteSparse:GraphBLAS: graph algorithms in the language of linear algebra. For production: (default) STABLE branch. Code development: ask me for the right branch before submitting a PR. video intro…

C 372 68 Updated Mar 18, 2025

Source code for VLDB 2015 paper "The More the Merrier: Efficient Multi-Source Graph Traversal"

C++ 24 11 Updated Sep 22, 2015

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library

C++ 1,604 101 Updated Mar 17, 2025

Fast CUDA Kernels for ResNet Inference.

Cuda 173 47 Updated May 26, 2019

Implementation of 3d non-separable convolution using CUDA & FFT Convolution

C++ 20 13 Updated Jan 15, 2019

Implementation of the paper - Fast Training of Convolutional Networks through FFTs (CUDA for parallelization)

Jupyter Notebook 10 1 Updated May 8, 2020

Winograd minimal convolution algorithm generator for convolutional neural networks.

Python 613 145 Updated Oct 17, 2020

AMD's Machine Intelligence Library

Assembly 1,130 243 Updated Mar 20, 2025

QUDA is a library for performing calculations in lattice QCD on GPUs.

C++ 307 106 Updated Mar 20, 2025

CUDA project for uni subject

Jupyter Notebook 23 2 Updated Oct 26, 2020

The SHOC Benchmark Suite

Makefile 250 103 Updated Jan 29, 2022