TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
Transformer-related optimizations, including BERT and GPT
🛠 A lightweight C++ toolkit of 100+ awesome AI models, supporting ORT, MNN, NCNN, TNN, and TensorRT. 🎉🎉
A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models reach 10000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile devices
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
A simple Transformer model implemented in C++. Attention Is All You Need.
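The core of such a Transformer implementation is scaled dot-product attention, `softmax(QK^T / sqrt(d)) V`. A minimal single-head sketch in C++ is shown below; the names `Mat` and `attention` are illustrative, not taken from any of the repositories above:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Row-major matrix: each inner vector is one token's feature row.
using Mat = std::vector<std::vector<double>>;

// Single-head scaled dot-product attention:
//   out = softmax(Q * K^T / sqrt(d)) * V
Mat attention(const Mat& Q, const Mat& K, const Mat& V) {
    const std::size_t n = Q.size();     // number of query tokens
    const std::size_t m = K.size();     // number of key/value tokens
    const std::size_t d = Q[0].size();  // feature dimension
    Mat out(n, std::vector<double>(V[0].size(), 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        // Raw scores: dot(Q_i, K_j) scaled by 1/sqrt(d).
        std::vector<double> scores(m, 0.0);
        double max_score = -1e300;
        for (std::size_t j = 0; j < m; ++j) {
            for (std::size_t k = 0; k < d; ++k) scores[j] += Q[i][k] * K[j][k];
            scores[j] /= std::sqrt(static_cast<double>(d));
            max_score = std::max(max_score, scores[j]);
        }
        // Numerically stable softmax over the scores.
        double sum = 0.0;
        for (std::size_t j = 0; j < m; ++j) {
            scores[j] = std::exp(scores[j] - max_score);
            sum += scores[j];
        }
        for (std::size_t j = 0; j < m; ++j) scores[j] /= sum;
        // Output row: attention-weighted sum of the value rows.
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t k = 0; k < V[0].size(); ++k)
                out[i][k] += scores[j] * V[j][k];
    }
    return out;
}
```

With a single key/value pair the softmax weight is 1, so the output equals the value row; a full implementation would add multiple heads, learned projection matrices, and masking.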