7 starred repositories written in C++

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ · 10,537 stars · 1,444 forks · Updated May 23, 2025

Transformer-related optimizations, including BERT and GPT

C++ · 6,165 stars · 905 forks · Updated Mar 27, 2024

🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, FaceFusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TensorRT.

C++ · 4,098 stars · 739 forks · Updated Apr 28, 2025

fastllm is a high-performance large-model inference library implemented in C++ with no backend dependencies (it relies only on CUDA and does not require PyTorch). It can run the DeepSeek R1 671B INT4 model on a single RTX 4090, reaching 20+ tokens/s on a single request stream.

C++ · 3,572 stars · 365 forks · Updated May 20, 2025
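
For context, "INT4" above refers to 4-bit weight quantization. The sketch below is a minimal, hedged illustration of symmetric per-group INT4 quantization and dequantization in plain C++; the group size, the one-value-per-byte storage, and all names are assumptions chosen for the demo, not fastllm's actual implementation.

```cpp
// Illustrative symmetric per-group INT4 weight quantization (not fastllm's code).
// Each group of `group_size` floats shares one scale; values map to [-7, 7].
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

struct Int4Group {
    float scale;              // per-group dequantization scale
    std::vector<int8_t> q;    // quantized values, one int4 stored per int8 for clarity
};

std::vector<Int4Group> quantize_int4(const std::vector<float>& w, size_t group_size) {
    std::vector<Int4Group> groups;
    for (size_t start = 0; start < w.size(); start += group_size) {
        size_t end = std::min(start + group_size, w.size());
        float max_abs = 0.0f;
        for (size_t i = start; i < end; ++i) max_abs = std::max(max_abs, std::fabs(w[i]));
        Int4Group g;
        g.scale = max_abs > 0.0f ? max_abs / 7.0f : 1.0f;   // symmetric range [-7, 7]
        for (size_t i = start; i < end; ++i) {
            int v = static_cast<int>(std::lround(w[i] / g.scale));
            g.q.push_back(static_cast<int8_t>(std::max(-7, std::min(7, v))));
        }
        groups.push_back(std::move(g));
    }
    return groups;
}

std::vector<float> dequantize_int4(const std::vector<Int4Group>& groups) {
    std::vector<float> w;
    for (const auto& g : groups)
        for (int8_t v : g.q) w.push_back(g.scale * static_cast<float>(v));
    return w;
}

int main() {
    std::vector<float> w = {0.12f, -0.56f, 0.33f, 0.91f, -0.02f, 0.47f, -0.88f, 0.05f};
    auto groups = quantize_int4(w, 4);   // group size 4, picked only for the demo
    auto w_hat  = dequantize_int4(groups);
    for (size_t i = 0; i < w.size(); ++i)
        std::cout << w[i] << " -> " << w_hat[i] << "\n";
}
```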

Fast implementation of BERT inference directly on NVIDIA GPUs (CUDA, cuBLAS) and Intel MKL

C++ · 544 stars · 85 forks · Updated Nov 18, 2020
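
Running BERT "directly on CUDA and cuBLAS" amounts to expressing each dense layer as a GEMM. The sketch below is a generic, hedged cuBLAS example (not code from the repository) showing a single cublasSgemm call computing C = A * B for an illustrative tokens-by-hidden matrix; the shapes and fill values are made up.

```cpp
// Minimal cuBLAS SGEMM sketch: C = A * B, the core operation behind a dense
// (fully connected) layer in BERT inference. Illustrative only.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <iostream>
#include <vector>

int main() {
    const int m = 128, k = 768, n = 768;   // e.g. 128 tokens, hidden size 768 (assumed)
    std::vector<float> hA(m * k, 1.0f), hB(k * n, 0.5f), hC(m * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, hA.size() * sizeof(float));
    cudaMalloc((void**)&dB, hB.size() * sizeof(float));
    cudaMalloc((void**)&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // cuBLAS assumes column-major storage; treating the row-major C = A * B as the
    // column-major product B^T * A^T lets us call SGEMM without transposes by
    // swapping the operand order and dimensions.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, m, k,
                &alpha,
                dB, n,     // B viewed as an n x k column-major matrix
                dA, k,     // A viewed as a k x m column-major matrix
                &beta,
                dC, n);    // C viewed as n x m column-major, i.e. row-major m x n

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    std::cout << "C[0] = " << hC[0] << std::endl;   // expect k * 1.0 * 0.5 = 384

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}
```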

Implements a simple Transformer model in C++, following "Attention Is All You Need".

C++ · 49 stars · 8 forks · Updated Mar 11, 2021
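
As a companion to the entry above, the sketch below illustrates the scaled dot-product attention from "Attention Is All You Need", Attention(Q, K, V) = softmax(QK^T / sqrt(d)) V, in plain C++; the shapes and sample values are invented for the demo, and it is not code from that repository.

```cpp
// Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
// Plain C++ illustration; matrices are row-major vectors of rows.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

using Mat = std::vector<std::vector<float>>;

Mat attention(const Mat& Q, const Mat& K, const Mat& V) {
    const size_t n = Q.size();        // number of query positions
    const size_t m = K.size();        // number of key/value positions
    const size_t d = Q[0].size();     // head dimension
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));

    Mat out(n, std::vector<float>(V[0].size(), 0.0f));
    for (size_t i = 0; i < n; ++i) {
        // scores[j] = (Q_i . K_j) / sqrt(d)
        std::vector<float> scores(m, 0.0f);
        float max_score = -1e30f;
        for (size_t j = 0; j < m; ++j) {
            for (size_t t = 0; t < d; ++t) scores[j] += Q[i][t] * K[j][t];
            scores[j] *= scale;
            max_score = std::max(max_score, scores[j]);
        }
        // numerically stable softmax over the scores
        float denom = 0.0f;
        for (size_t j = 0; j < m; ++j) { scores[j] = std::exp(scores[j] - max_score); denom += scores[j]; }
        // weighted sum of the value rows
        for (size_t j = 0; j < m; ++j)
            for (size_t t = 0; t < V[0].size(); ++t)
                out[i][t] += (scores[j] / denom) * V[j][t];
    }
    return out;
}

int main() {
    Mat Q = {{1.0f, 0.0f}, {0.0f, 1.0f}};
    Mat K = {{1.0f, 0.0f}, {0.0f, 1.0f}};
    Mat V = {{1.0f, 2.0f}, {3.0f, 4.0f}};
    Mat O = attention(Q, K, V);
    for (const auto& row : O) { for (float x : row) std::cout << x << ' '; std::cout << '\n'; }
}
```
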
C++ · 2 stars · Updated Apr 23, 2020