Stars
SGLang is a fast serving framework for large language models and vision language models.
FlashInfer: Kernel Library for LLM Serving
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A framework for few-shot evaluation of language models.
The Paper List on Data Contamination for Large Language Models Evaluation.
A high-throughput and memory-efficient inference and serving engine for LLMs
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A fast communication-overlapping library for tensor parallelism on GPUs.
Clspv is a compiler for OpenCL C to Vulkan compute shaders
chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
Online compiler for HIP and NVIDIA® CUDA® code to WebGPU
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond a traditional BLAS library
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Stretching GPU performance for GEMMs and tensor contractions.
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.