Starred repositories

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 829 55 Updated Apr 2, 2025

UltraScale Playbook, Chinese edition

Python 31 3 Updated Mar 15, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,963 233 Updated Mar 4, 2025

Expert Parallelism Load Balancer

Python 1,122 180 Updated Mar 24, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,693 284 Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,138 539 Updated Apr 3, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,359 693 Updated Apr 3, 2025

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels, FA2, HGEMM via MMA and CuTe (~99% TFLOPS of cuBLAS/FA2 🎉).

Cuda 3,177 337 Updated Apr 1, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 12,819 1,426 Updated Apr 3, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,972 192 Updated Apr 3, 2025

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 489 62 Updated Feb 20, 2025

C++ builds C++

C++ 24 Updated Nov 14, 2024

A Lightweight Recommendation System

Python 8,728 673 Updated Nov 8, 2023

Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.

Cuda 611 141 Updated Mar 26, 2025

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,058 533 Updated Apr 3, 2025

CUDA Core Compute Libraries

C++ 1,583 205 Updated Apr 3, 2025
C++ 530 90 Updated Mar 22, 2025

Material for gpu-mode lectures

Jupyter Notebook 4,167 418 Updated Feb 9, 2025

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…

Cuda 141 27 Updated Mar 30, 2025

cuDF - GPU DataFrame Library

C++ 8,838 942 Updated Apr 3, 2025

A collection of compiler learning resources.

Python 2,344 345 Updated Mar 19, 2025

how to learn PyTorch and OneFlow

420 26 Updated Mar 22, 2024

How to optimize some algorithms in CUDA.

Cuda 2,067 184 Updated Apr 3, 2025

Fast and memory-efficient exact attention

Python 16,688 1,584 Updated Apr 1, 2025

Awesome-LLM: a curated list of Large Language Models

22,534 1,862 Updated Mar 26, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 7,217 1,185 Updated Apr 3, 2025

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021

Python 56 12 Updated Jul 21, 2021

Alluxio, data orchestration for analytics and machine learning in the cloud

Java 6,966 2,946 Updated Mar 25, 2025

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 142,378 28,511 Updated Apr 3, 2025