mzz12

Stars

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 9,071 stars · 1,045 forks · Updated Jan 3, 2025

irasin / vllm

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 1 star · 1 fork · Updated Nov 21, 2023

TezRomacH / layer-to-layer-pytorch

PyTorch implementation of L2L execution algorithm

Python · 106 stars · 10 forks · Updated Jan 16, 2023
