ZITI, Heidelberg U
- Heidelberg, Germany
GPU
RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
GPUDirect Async implementation of HPGMG-FV CUDA
NumPy aware dynamic Python compiler using LLVM
CUDA integration for Python, plus shiny features
CUSP: A C++ Templated Sparse Matrix Library
An Adaptive Pencil Decomposition Library for NVIDIA GPUs
SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Ene…
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.
AmgXWrapper: An interface between PETSc and the NVIDIA AmgX library
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…
ROCm Thrust - run Thrust-dependent software on AMD GPUs
CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups
A library to benchmark CUDA code, similar to Google Benchmark.
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
A tool for bandwidth measurements on NVIDIA GPUs.