Starred repositories (countywest)

74 results for source starred repositories written in Cuda

LLM training in simple, raw C/CUDA

Cuda 25,239 2,897 Updated Oct 2, 2024

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 16,251 1,950 Updated Jan 27, 2025

A massively parallel, optimal functional runtime in Rust

Cuda 10,666 414 Updated Nov 21, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,933 196 Updated Feb 5, 2025

Sample codes for my CUDA programming book

Cuda 1,628 333 Updated Jul 27, 2023

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda 1,255 146 Updated Nov 12, 2024

Learn CUDA Programming, published by Packt

Cuda 1,090 247 Updated Dec 30, 2023

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 924 55 Updated Jan 30, 2025

Fuse multiple depth frames into a TSDF voxel volume.

Cuda 746 135 Updated May 7, 2019
Cuda 713 55 Updated Oct 20, 2023

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 694 61 Updated Dec 30, 2024

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda 682 250 Updated Aug 19, 2024

[ICCV 2023] Official code for NeuS2

Cuda 655 44 Updated Mar 22, 2024

Fast CUDA matrix multiplication from scratch

Cuda 614 79 Updated Dec 28, 2023

UNet diffusion model in pure CUDA

Cuda 597 27 Updated Jun 28, 2024

Fast k nearest neighbor search using GPU

Cuda 516 109 Updated Aug 6, 2018

Code for KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

Cuda 483 54 Updated Jun 16, 2021

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

Cuda 459 61 Updated Jun 3, 2024

NeRFshop: Interactive Editing of Neural Radiance Fields

Cuda 456 24 Updated Mar 27, 2023

CUDA Data Parallel Primitives Library

Cuda 426 96 Updated Nov 9, 2018

Tutorial for writing custom PyTorch C++/CUDA kernels, applied to volume rendering (NeRF)

Cuda 390 37 Updated Apr 17, 2023

Official implementation of "Decentralization and Acceleration Enables Large-Scale Bundle Adjustment"

Cuda 354 32 Updated Jul 28, 2023

pytorch knn [cuda version]

Cuda 298 38 Updated Dec 14, 2021

State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda 264 12 Updated Dec 14, 2024

Code for "Representing Volumetric Videos as Dynamic MLP Maps" CVPR 2023

Cuda 238 10 Updated Dec 6, 2023