Stars
Development repository for the Triton language and compiler
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
High performance server-side application framework
An industrial-grade C++ implementation of the RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly available distributed systems.
Lightning fast C++/CUDA neural network framework
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Curve is a sandbox project hosted by the CNCF Foundation. It's cloud-native, high-performance, and easy to operate. Curve is an open-source distributed storage system for block and shared file stor…
A lightning fast Finite State machine and REgular expression manipulation library.
Simple, light-weight and easy-to-use asynchronous components
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
A collection of modern C++ libraries, including coro_rpc, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT2, decoders, etc.) on CPU and GPU.
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Probably the fastest coroutine lib in the world!
A highly optimized LLM inference acceleration engine for Llama and its variants.
Simple, portable, and self-contained stacktrace library for C++11 and newer
A performant and modular runtime for TensorFlow
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA