Stars
Linux Device Drivers 3 examples updated to work in recent kernels
FlashInfer: Kernel Library for LLM Serving
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Script that organizes the Google Takeout archive into one big chronological folder
A large-scale simulation framework for LLM inference
Documentation of NVIDIA chip/hardware interfaces
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
Tengine is a lightweight, high-performance, modular inference engine for embedded devices
TypeScript-based SystemVerilog meta-language and HDL design framework
Efficient operator implementations for the Cambricon Machine Learning Unit (MLU).
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.
Assembler and decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.
Unified compiler/runtime for interfacing with PyTorch Dynamo.
Shared Middle-Layer for Triton Compilation
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Stable Diffusion and Flux in pure C/C++