wx-csy
👋
bonjour
  • Tsinghua University
  • Beijing, China

Highlights

  • Pro

Organizations

@nju-calabash

Starred repositories

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,585 851 Updated Apr 4, 2025

Expert Parallelism Load Balancer

Python 1,140 187 Updated Mar 24, 2025

Analyze computation-communication overlap in V3/R1.

994 141 Updated Mar 21, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,721 289 Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,213 562 Updated Apr 16, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,436 709 Updated Apr 16, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,439 821 Updated Mar 1, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,594 264 Updated Apr 14, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,738 104 Updated Apr 3, 2025

NVIDIA Linux open GPU with P2P support

C 1,081 105 Updated Dec 18, 2024

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 1,819 194 Updated Aug 17, 2024

Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks

Makefile 93 19 Updated Sep 2, 2021

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,042 157 Updated Mar 26, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 13,568 947 Updated Apr 16, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 3,047 206 Updated Apr 16, 2025

How to optimize some algorithms in CUDA.

Cuda 2,105 187 Updated Apr 14, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 7,293 1,196 Updated Apr 10, 2025

Large Language Model Text Generation Inference

Python 10,012 1,181 Updated Apr 16, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 45,004 6,894 Updated Apr 16, 2025

Transformer related optimization, including BERT, GPT

C++ 6,123 901 Updated Mar 27, 2024

Inference code for Llama models

Python 58,103 9,739 Updated Jan 26, 2025

Intel® Performance Counter Monitor (Intel® PCM)

C++ 2,958 484 Updated Apr 16, 2025

Grasper: A High Performance Distributed System for OLAP on Property Graphs.

C++ 31 9 Updated Apr 3, 2021

A solver for subgraph isomorphism problems, based upon a series of papers by subsets of McCreesh, Prosser, and Trimble.

C++ 77 24 Updated Feb 21, 2025

CP 2015 subgraph isomorphism experiments, data and paper

C++ 13 5 Updated Sep 5, 2015

C 524 93 Updated Mar 17, 2025

Graph Pattern Mining

C++ 88 18 Updated Sep 20, 2024

Public talks from PLCT Lab, and internal group reports that have been made public

1,071 158 Updated Dec 12, 2024

Open-source graph database, tuned for dynamic analytics environments. Easy to adopt, scale and own.

C++ 2,750 143 Updated Apr 16, 2025