pauleonix

Ph.D. student in Computational Science and Engineering researching GPU-accelerated preconditioners and solvers for sparse linear problems; M.Sc. in physics.

10 followers · 182 following

ZITI, Heidelberg U
Heidelberg, Germany

Stars

GPU

102 repositories

rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 850 203 Updated Mar 5, 2025

rapidsai / rmm

RAPIDS Memory Manager

C++ 546 208 Updated Mar 5, 2025

e-ago / hpgmg-cuda-async

GPUDirect Async implementation of HPGMG-FV CUDA

Cuda 10 Updated May 11, 2018

numba / numba

NumPy aware dynamic Python compiler using LLVM

Python 10,256 1,146 Updated Mar 5, 2025

inducer / pycuda

CUDA integration for Python, plus shiny features

Python 1,907 291 Updated Feb 7, 2025

cupy / cupy

NumPy & SciPy for GPU

Python 9,960 888 Updated Mar 5, 2025

cusplibrary / cusplibrary

CUSP : A C++ Templated Sparse Matrix Library

C++ 411 132 Updated Nov 5, 2024

NVIDIA / cuDecomp

An Adaptive Pencil Decomposition Library for NVIDIA GPUs

C++ 60 12 Updated Mar 6, 2025

icl-utk-edu / slate

SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Ene…

C++ 110 23 Updated Jan 11, 2025

NVIDIA / mpi-acx

MPI accelerator-integrated communication extensions

Cuda 32 6 Updated Apr 4, 2023

olcf / cuda-training-series

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda 712 257 Updated Aug 19, 2024

brycelelbach / thrust_wiki

7 2 Updated Sep 28, 2021

NVIDIA / NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

C++ 351 50 Updated Mar 5, 2025

FZJ-JSC / tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda 213 54 Updated Dec 3, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 6,974 1,137 Updated Feb 28, 2025

KernelTuner / kernel_tuner

Kernel Tuner

Python 319 52 Updated Mar 5, 2025

HSA-Libraries / Bolt

Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

C++ 370 65 Updated Feb 11, 2016

barbagroup / AmgXWrapper

AmgXWrapper: An interface between PETSc and the NVIDIA AmgX library

C++ 47 23 Updated May 30, 2022

NVIDIA / cnmem

A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory

C++ 296 76 Updated Nov 28, 2018

ecrc / kblas-gpu

Subset of BLAS routines optimized for NVIDIA GPUs

Cuda 68 10 Updated Mar 27, 2023

KomputeProject / kompute

General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…

C++ 2,102 161 Updated Feb 18, 2025

NVIDIA / AMGX

Distributed multigrid linear solver library on GPU

Cuda 528 151 Updated Feb 7, 2025

ROCm / rocPRIM

ROCm Parallel Primitives

C++ 170 75 Updated Mar 5, 2025

ROCm / rocThrust

ROCm Thrust - run Thrust dependent software on AMD GPUs

C++ 106 48 Updated Mar 4, 2025

NVlabs / CGBN

CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups

Cuda 206 59 Updated Feb 27, 2025

gevtushenko / cuda_benchmark

A library to benchmark CUDA code, similar to google benchmark.

C++ 28 7 Updated Apr 18, 2021

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,729 450 Updated Oct 9, 2023

NVIDIA / nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 381 32 Updated Feb 7, 2025

CNugteren / CLTune

CLTune: An automatic OpenCL & CUDA kernel tuner

C++ 174 36 Updated Dec 12, 2022

jaredhoberock / bulk

Launching collective tasks in bulk

C++ 37 5 Updated Oct 4, 2019
