Stars
A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs
A simple high-performance CUDA GEMM implementation (see the naive GEMM sketch after this list).
This project optimizes convolution operators on GPUs, including GEMM-based (implicit GEMM) convolution.
A library of GPU kernels for sparse matrix operations.
PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity
Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactored for easier understanding
Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, an…
Implementation and optimization of an SpGEMM kernel on DCU.
The source code of the paper "Accelerating CPU-based Sparse General Matrix Multiplication with Binary Row Merging"
CSR-based SpGEMM on NVIDIA and AMD GPUs (see the CSR sketch after this list)
This repository was obtained from https://bitbucket.org/azadcse/hipmcl/src
SuiteSparse:GraphBLAS: graph algorithms in the language of linear algebra. For production: (default) STABLE branch. Code development: ask me for the right branch before submitting a PR. video intro…
Source code for VLDB 2015 paper "The More the Merrier: Efficient Multi-Source Graph Traversal"
Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
Implementation of 3D non-separable convolution using CUDA and FFT convolution
Implementation of the paper - Fast Training of Convolutional Networks through FFTs (CUDA for parallelization)
Winograd minimal convolution algorithm generator for convolutional neural networks.
QUDA is a library for performing calculations in lattice QCD on GPUs.
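For orientation, here is a minimal, illustrative sketch of the dense GEMM computation that the CUDA GEMM and implicit-GEMM convolution projects above optimize. It is not taken from any of the listed repositories; kernel and variable names are assumptions, and it deliberately omits the tiling and shared-memory techniques those projects are about.

```cuda
#include <cuda_runtime.h>

// Naive sketch of C = A * B for row-major M x K and K x N matrices.
// One thread computes one element of C; real high-performance kernels
// add tiling, shared memory, and register blocking on top of this.
__global__ void gemm_naive(const float* A, const float* B, float* C,
                           int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = acc;
    }
}

// Example launch: 16x16 thread blocks covering the M x N output.
// dim3 block(16, 16);
// dim3 grid((N + 15) / 16, (M + 15) / 16);
// gemm_naive<<<grid, block>>>(dA, dB, dC, M, N, K);
```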
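Similarly, a short sketch of the CSR storage format that the SpGEMM projects above operate on, shown here as a row-per-thread sparse matrix-vector multiply (y = A * x). This is only to illustrate the row_ptr / col_idx / vals layout; it is not code from any listed repository, and actual SpGEMM kernels are considerably more involved (symbolic and numeric phases, row merging, tiling).

```cuda
#include <cuda_runtime.h>

// Illustrative CSR SpMV: each thread accumulates one row of A against x.
// row_ptr has num_rows + 1 entries; col_idx/vals hold the nonzeros per row.
__global__ void csr_spmv(const int* row_ptr, const int* col_idx,
                         const float* vals, const float* x,
                         float* y, int num_rows) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j) {
            sum += vals[j] * x[col_idx[j]];
        }
        y[row] = sum;
    }
}
```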