TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Transformer related optimization, including BERT, GPT
🛠 A lightweight C++ AI toolkit: 100+🎉 models (Stable-Diffusion, FaceFusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TensorRT.
fastllm is a high-performance LLM inference library implemented in C++ with no backend dependencies (it relies only on CUDA, with no dependency on PyTorch). It can run inference on the DeepSeek R1 671B INT4 model on a single 4090, reaching 20+ tps per stream.
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
A simple Transformer model implemented in C++. Attention Is All You Need.