Stars
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
irasin / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
PyTorch implementation of L2L execution algorithm