Stars
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and LLM application deployment).
Implementation of popular deep learning networks with TensorRT network definition API
Simple samples for TensorRT programming
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
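The three TensorRT entries above all center on the network definition API. A minimal sketch of building a tiny engine with the Python flavor of that API might look like the following; the input shape and single ReLU layer are placeholders, not taken from any of the repositories.

```python
# Minimal sketch of the TensorRT Python network definition API (assumes a
# TensorRT 8.x+ install); the graph here is illustrative only.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Define a trivial graph: one input tensor passed through a ReLU activation.
x = network.add_input("x", trt.float32, (1, 3, 224, 224))
relu = network.add_activation(x, trt.ActivationType.RELU)
network.mark_output(relu.get_output(0))

# Build a serialized engine that can later be deserialized for inference.
config = builder.create_builder_config()
engine_bytes = builder.build_serialized_network(network, config)
with open("relu.engine", "wb") as f:
    f.write(engine_bytes)
```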
Simple, safe way to store and distribute tensors
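A short sketch of the safetensors round trip, assuming the safetensors and torch packages are installed; the file name and tensor contents are placeholders.

```python
# Save and load tensors with safetensors (PyTorch backend).
import torch
from safetensors.torch import save_file, load_file

tensors = {"embedding": torch.zeros(4, 8), "bias": torch.ones(8)}
save_file(tensors, "model.safetensors")

loaded = load_file("model.safetensors")
print(loaded["embedding"].shape)  # torch.Size([4, 8])
```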
SGLang is a fast serving framework for large language models and vision language models.
Fast and memory-efficient exact attention
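A sketch of calling FlashAttention directly, assuming the flash-attn package and a CUDA GPU; tensors must be fp16/bf16 with shape (batch, seqlen, nheads, headdim).

```python
# Exact attention computed without materializing the full attention matrix.
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (2, 1024, 8, 64)
```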
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
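A hedged sketch of the high-level LLM API available in recent TensorRT-LLM releases; the model name and sampling settings are placeholders.

```python
# High-level TensorRT-LLM LLM API: build/load an engine and generate text.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What is TensorRT-LLM?"], params):
    print(output.outputs[0].text)
```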
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
State-of-the-art 2D and 3D Face Analysis Project
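A sketch of insightface's FaceAnalysis app for face detection and recognition, assuming insightface and opencv-python are installed; the image path is a placeholder.

```python
import cv2
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")          # downloads the model pack on first use
app.prepare(ctx_id=0, det_size=(640, 640))    # ctx_id=0 -> first GPU, -1 -> CPU

img = cv2.imread("face.jpg")
faces = app.get(img)
for face in faces:
    print(face.bbox, face.det_score)          # bounding box and detection confidence
```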
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
OneDiff: An out-of-the-box acceleration library for diffusion models.
[EMNLP'23, ACL'24] To speed up LLM inference and help LLMs perceive key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
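A sketch of prompt compression with LLMLingua, assuming the llmlingua package; the prompt text and token budget are placeholders.

```python
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # loads the default small compression model
result = compressor.compress_prompt(
    "Very long context goes here ...",
    instruction="Summarize the context.",
    question="What are the key points?",
    target_token=200,
)
print(result["compressed_prompt"])
```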
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
Seamless operability between C++11 and Python
A template matching library based on OpenCV, supporting rotation matching and cross-platform use from both C++ and Python.
Open standard for machine learning interoperability
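A short sketch of loading and validating an ONNX model, assuming the onnx package; "model.onnx" is a placeholder file.

```python
import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)                   # structural validity check
print(onnx.helper.printable_graph(model.graph))   # human-readable graph dump
```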
Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
Serve, optimize and scale PyTorch models in production
A high-throughput and memory-efficient inference and serving engine for LLMs
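A sketch of offline batched inference with vLLM, assuming the vllm package and a CUDA GPU; the model name is a placeholder.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```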
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
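A sketch of the Transformers pipeline API, assuming the transformers package; the task's default checkpoint is downloaded automatically and may change between releases.

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("TensorRT made my model twice as fast!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```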
Chat concurrently with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, iFLYTEK Spark (讯飞星火), ERNIE Bot (文心一言), and more, and discover the best answers.