Stars
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
A high-throughput and memory-efficient inference and serving engine for LLMs
List of papers related to neural network quantization in recent AI conferences and journals.
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN and transformer.
🔥Highlighting the top ML papers every week.
Accessible large language models via k-bit quantization for PyTorch.
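The core idea behind k-bit quantization can be illustrated with a minimal absmax quantize/dequantize round trip — a generic sketch of the symmetric int8 scheme, not the actual bitsandbytes API:

```python
def absmax_quantize(values, bits=8):
    """Symmetric absmax quantization: scale floats so the largest
    magnitude maps to the top of the signed integer range."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]

vals = [0.1, -1.2, 0.5, 1.27]
q, scale = absmax_quantize(vals)
restored = dequantize(q, scale)
```

The round-trip error is bounded by half the scale step, which is why fewer bits (larger steps) trade accuracy for memory.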
Awesome-LLM: a curated list of Large Language Model resources
Ongoing research training transformer models at scale
An open-source tool-augmented conversational language model from Fudan University
This repository contains integer operators on GPUs for PyTorch.
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
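SmoothQuant's central trick is to migrate quantization difficulty from activations (which have outlier channels) into the weights, using a per-input-channel scale s_j = max|X_j|^α / max|W_j|^(1−α); the matrix product is mathematically unchanged. A minimal NumPy sketch of that smoothing step (names and shapes are illustrative, not the paper's code):

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """Rescale per input channel so activation outliers shrink
    while the product X @ W stays exactly the same.
    X: (tokens, channels), W: (channels, out_features)."""
    act_max = np.abs(X).max(axis=0)        # per-channel activation range
    w_max = np.abs(W).max(axis=1)          # per-channel weight range
    s = act_max ** alpha / w_max ** (1 - alpha)
    return X / s, W * s[:, None]           # (X/s) @ (s*W) == X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 0] *= 50                              # simulate an outlier channel
W = rng.normal(size=(8, 3))
Xs, Ws = smooth(X, W)
```

After smoothing, both Xs and Ws have moderate dynamic range, so both can be quantized to int8 with little accuracy loss.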
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
The internet still has a memory! A record of companies that have rescinded verbal offers, letters of intent, or tripartite agreements during campus recruiting. Our voices may be small, but we still want to do what little we can!
The globally designated official GitHub for "run-ology" (the study of emigrating from China), compiling its purpose, program, theory, and real-world examples; answering the three big questions of why to leave, where to go, and how to get there; and becoming the core religion and core belief of the new Chinese people.
Fast inference engine for Transformer models
[CVPR2022] Remember Intentions: Retrospective-Memory-based Trajectory Prediction
[CVPR22] GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction with Relational Reasoning
Transformer related optimization, including BERT, GPT
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
A programmer's guide to living longer
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
High-efficiency floating-point neural network inference operators for mobile, server, and Web
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
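The workhorse routine in any BLAS library is GEMM, with the semantics C ← αAB + βC. A small Python sketch of those semantics (NumPy is used only to illustrate the contract; many NumPy builds are in fact linked against OpenBLAS, so `@` dispatches to its dgemm):

```python
import numpy as np

def gemm(alpha, A, B, beta, C):
    """Semantics of BLAS *gemm: C <- alpha * A @ B + beta * C."""
    return alpha * (A @ B) + beta * C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = np.zeros((2, 2))
result = gemm(1.0, A, B, 0.0, C)   # alpha=1, beta=0: plain matrix product
```

The alpha/beta parameters let callers fuse scaling and accumulation into one pass, which is why so many higher-level kernels are expressed as a single GEMM call.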