snowpeakz

Xuefeng Zhu snowpeakz

fly me to the moon

12 followers · 34 following

Achievements

Starred repositories

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 829 55 Updated Apr 2, 2025

pprp / ultrascale-playbook-zh

UltraScale Playbook, Chinese edition

Python 31 3 Updated Mar 15, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,963 233 Updated Mar 4, 2025

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python 1,122 180 Updated Mar 24, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,693 284 Updated Mar 10, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,138 539 Updated Apr 3, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 7,359 693 Updated Apr 3, 2025

xlite-dev / CUDA-Learn-Notes

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels, FA2, HGEMM via MMA and CuTe (~99% TFLOPS of cuBLAS/FA2 🎉).

Cuda 3,177 337 Updated Apr 1, 2025

deepseek-ai / DeepSeek-V3

Python 94,991 15,375 Updated Mar 16, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 12,819 1,426 Updated Apr 3, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,972 192 Updated Apr 3, 2025

NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 489 62 Updated Feb 20, 2025

bacoo / zmake

C++ builds C++

C++ 24 Updated Nov 14, 2024

bytedance / monolith

A Lightweight Recommendation System

Python 8,728 673 Updated Nov 8, 2023

tensorflow / recommenders-addons

Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.

Cuda 611 141 Updated Mar 26, 2025

openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,058 533 Updated Apr 3, 2025

NVIDIA / cccl

CUDA Core Compute Libraries

C++ 1,583 205 Updated Apr 3, 2025

NVIDIA / cuCollections

C++ 530 90 Updated Mar 22, 2025

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 4,167 418 Updated Feb 9, 2025

NVIDIA-Merlin / HierarchicalKV

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…

Cuda 141 27 Updated Mar 30, 2025

rapidsai / cudf

cuDF - GPU DataFrame Library

C++ 8,838 942 Updated Apr 3, 2025

BBuf / tvm_mlir_learn

A collection of compiler learning resources.

Python 2,344 345 Updated Mar 19, 2025

BBuf / how-to-learn-deep-learning-framework

how to learn PyTorch and OneFlow

420 26 Updated Mar 22, 2024

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda 2,067 184 Updated Apr 3, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 16,688 1,584 Updated Apr 1, 2025

Hannibal046 / Awesome-LLM

Awesome-LLM: a curated list of Large Language Model

22,534 1,862 Updated Mar 26, 2025

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 7,217 1,185 Updated Apr 3, 2025

Distributed-AI / PipeTransformer

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021

Python 56 12 Updated Jul 21, 2021

Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Java 6,966 2,946 Updated Mar 25, 2025

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Python 142,378 28,511 Updated Apr 3, 2025

Starred topics

Ubuntu

Tensorflow

Python

Linux

Docker

Database

Deep learning

Data visualization

Data structures

C++