- NAVER LABS
- Seongnam, South Korea
- in/hyeontae-son-933691152
- https://scholar.google.com/citations?user=krYfI4AAAAAJ
Starred repositories
Instant neural graphics primitives: lightning fast NeRF and more
A massively parallel, optimal functional runtime in Rust
FlashInfer: Kernel Library for LLM Serving
Sample code for my CUDA programming book
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Learn CUDA Programming, published by Packt
Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Fuse multiple depth frames into a TSDF voxel volume.
Flash Attention in ~100 lines of CUDA (forward pass only)
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Fast k-nearest neighbor search using GPU
Code for KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
NeRFshop: Interactive Editing of Neural Radiance Fields
Tutorial for writing custom PyTorch C++/CUDA kernels, applied to volume rendering (NeRF)
Official implementation of "Decentralization and Acceleration Enables Large-Scale Bundle Adjustment"
State-of-the-art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity-style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
Code for "Representing Volumetric Videos as Dynamic MLP Maps" CVPR 2023