Shanghai Jiao Tong University
Shanghai, China
Starred repositories
NVIDIA Linux open GPU with P2P support
oneAPI Collective Communications Library (oneCCL)
One second to read GitHub code with VS Code.
Dissecting NVIDIA GPU Architecture
A paper list of spiking neural networks, including papers, code, and related websites. This repository collects papers and code on spiking neural networks from top conferences and journals, and is continuously updated.
torch_musa is an open-source extension of PyTorch that enables full use of the computing power of Moore Threads GPUs.
A tool for bandwidth measurements on NVIDIA GPUs.
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
[ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation
Reinforcement learning environments for compiler and program optimization tasks
VPTQ: a flexible, extreme low-bit quantization algorithm.
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
MSCCL++: A GPU-driven communication stack for scalable AI applications
A tool for examining GPU scheduling behavior.
Efficient Triton Kernels for LLM Training
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Disaggregated serving system for Large Language Models (LLMs).
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
A fast communication-overlapping library for tensor parallelism on GPUs.
Scalable training and inference for Probabilistic Circuits