Stars
Linux Device Drivers 3 examples updated to work in recent kernels
FlashInfer: Kernel Library for LLM Serving
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Script that organizes the Google Takeout archive into one big chronological folder
A large-scale simulation framework for LLM inference
Documentation of NVIDIA chip/hardware interfaces
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
Tengine is a lightweight, high-performance, modular inference engine for embedded devices
TypeScript-based SystemVerilog meta-language and HDL design framework
Efficient operator implementations for the Cambricon Machine Learning Unit (MLU).
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.
Assembler and decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.
Unified compiler/runtime for interfacing with PyTorch Dynamo.
Shared Middle-Layer for Triton Compilation
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Stable Diffusion and Flux in pure C/C++