LittleQili
🎵
Hope music's always there with you.
  • Shanghai Jiao Tong University
  • Shanghai, China

Highlights

  • Pro

Organizations

@SJTU-CSE

Starred repositories

NVIDIA Linux open GPU with P2P support

C 972 91 Updated Dec 18, 2024

NCCL Profiling Kit

Python 123 12 Updated Jul 1, 2024

oneAPI Collective Communications Library (oneCCL)

C++ 216 72 Updated Dec 5, 2024

One second to read GitHub code with VS Code.

TypeScript 22,951 874 Updated Dec 25, 2024

Dissecting NVIDIA GPU Architecture

Cuda 83 25 Updated Jul 11, 2022

A paper list of spiking neural networks, including papers, codes, and related websites. This repository collects papers and code on spiking neural networks from top conferences and journals, and is updated continuously.

319 28 Updated Dec 2, 2024

torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics cards.

Python 342 26 Updated Nov 15, 2024

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 341 30 Updated Oct 18, 2024

A deep learning intermediate representation for multi-platform compilation optimization

8 Updated Oct 28, 2024

LLM inference in C/C++

C++ 70,365 10,163 Updated Jan 8, 2025

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 698 40 Updated Jan 7, 2025
C++ 23 3 Updated Oct 31, 2023

CUDA checkpoint and restore utility

Cuda 260 13 Updated Apr 17, 2024

[ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation

Python 8 1 Updated Sep 29, 2024

Reinforcement learning environments for compiler and program optimization tasks

Python 925 130 Updated Oct 9, 2024

VPTQ, A Flexible and Extreme low-bit quantization algorithm

Python 551 39 Updated Dec 31, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,772 100 Updated Jan 21, 2024

AIOS: AI Agent Operating System

Python 3,549 437 Updated Jan 8, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 279 43 Updated Jan 8, 2025

A tool for examining GPU scheduling behavior.

Cuda 71 18 Updated Aug 17, 2024

Efficient Triton Kernels for LLM Training

Python 4,131 238 Updated Jan 8, 2025

Microsoft Azure Traces

Jupyter Notebook 858 147 Updated Dec 12, 2024

Paella: Low-latency Model Serving with Virtualized GPU Scheduling

C++ 58 6 Updated May 1, 2024

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 8,557 1,506 Updated Jan 8, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 429 50 Updated Aug 19, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 137,303 27,480 Updated Jan 8, 2025

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

C++ 172 10 Updated Nov 18, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 262 22 Updated Oct 30, 2024

Scalable training and inference for Probabilistic Circuits

Python 49 7 Updated Nov 12, 2024