pauleonix
  • ZITI, Heidelberg U
  • Heidelberg, Germany

Stars

GPU

102 repositories

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 850 203 Updated Mar 5, 2025

RAPIDS Memory Manager

C++ 546 208 Updated Mar 5, 2025

GPUDirect Async implementation of HPGMG-FV CUDA

Cuda 10 Updated May 11, 2018

NumPy aware dynamic Python compiler using LLVM

Python 10,256 1,146 Updated Mar 5, 2025

CUDA integration for Python, plus shiny features

Python 1,907 291 Updated Feb 7, 2025

NumPy & SciPy for GPU

Python 9,960 888 Updated Mar 5, 2025

CUSP: A C++ Templated Sparse Matrix Library

C++ 411 132 Updated Nov 5, 2024

An Adaptive Pencil Decomposition Library for NVIDIA GPUs

C++ 60 12 Updated Mar 6, 2025

SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Ene…

C++ 110 23 Updated Jan 11, 2025

MPI accelerator-integrated communication extensions

Cuda 32 6 Updated Apr 4, 2023

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda 712 257 Updated Aug 19, 2024

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

C++ 351 50 Updated Mar 5, 2025

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda 213 54 Updated Dec 3, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 6,974 1,137 Updated Feb 28, 2025

Kernel Tuner

Python 319 52 Updated Mar 5, 2025

Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

C++ 370 65 Updated Feb 11, 2016

AmgXWrapper: An interface between PETSc and the NVIDIA AmgX library

C++ 47 23 Updated May 30, 2022

A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory

C++ 296 76 Updated Nov 28, 2018

Subset of BLAS routines optimized for NVIDIA GPUs

Cuda 68 10 Updated Mar 27, 2023

General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…

C++ 2,102 161 Updated Feb 18, 2025

Distributed multigrid linear solver library on GPU

Cuda 528 151 Updated Feb 7, 2025

ROCm Parallel Primitives

C++ 170 75 Updated Mar 5, 2025

ROCm Thrust - run Thrust dependent software on AMD GPUs

C++ 106 48 Updated Mar 4, 2025

CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups

Cuda 206 59 Updated Feb 27, 2025

A library to benchmark CUDA code, similar to Google Benchmark.

C++ 28 7 Updated Apr 18, 2021

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,729 450 Updated Oct 9, 2023

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 381 32 Updated Feb 7, 2025

CLTune: An automatic OpenCL & CUDA kernel tuner

C++ 174 36 Updated Dec 12, 2022

Launching collective tasks in bulk

C++ 37 5 Updated Oct 4, 2019