TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,166 1,070 Updated Jan 16, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,962 5,216 Updated Jan 19, 2025

List of papers related to neural network quantization in recent AI conferences and journals.

508 40 Updated Dec 16, 2024

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…

Python 13,013 882 Updated Jan 10, 2025

🔥Highlighting the top ML papers every week.

10,710 645 Updated Jan 1, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 6,528 649 Updated Jan 14, 2025

Awesome-LLM: a curated list of Large Language Models

20,670 1,688 Updated Jan 13, 2025

Ongoing research training transformer models at scale

Python 11,122 2,488 Updated Jan 18, 2025

An open-source tool-augmented conversational language model from Fudan University

Python 12,019 1,147 Updated Jul 13, 2024

LLM inference in C/C++

C++ 70,889 10,259 Updated Jan 18, 2025

This repository contains integer operators on GPUs for PyTorch.

Python 189 50 Updated Sep 29, 2023

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,316 155 Updated Jul 12, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 36,278 4,201 Updated Jan 18, 2025

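The internet still remembers! A list of companies that reneged on verbal offers, letters of intent, or tripartite agreements during campus recruiting. Though my voice carries little weight, I still want to do my small part!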

3,322 162 Updated Oct 20, 2024

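The officially designated global GitHub of "Run-ology" (润学): collects its aims, program, theory, and real-world examples of "running" (leaving the country); addresses the three questions of why to run, where to run to, and how to run; and aspires to become the core religion and core belief of the new Chinese people.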

31,856 2,631 Updated Jul 31, 2024

Fast inference engine for Transformer models

C++ 3,530 313 Updated Dec 18, 2024

[CVPR2022] Remember Intentions: Retrospective-Memory-based Trajectory Prediction

Python 125 16 Updated Sep 11, 2022

[CVPR22] GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction with Relational Reasoning

Python 114 23 Updated Feb 11, 2023

A Python Graph Matching Toolkit.

Python 320 19 Updated Oct 21, 2024

Transformer related optimization, including BERT, GPT

C++ 5,983 894 Updated Mar 27, 2024

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

C++ 8,929 1,695 Updated Jan 13, 2025

Low-precision matrix multiplication

C++ 1,787 454 Updated Jan 29, 2024

symmetric int8 gemm

Assembly 66 12 Updated Jun 7, 2020

row-major matmul optimization

C++ 600 81 Updated Sep 9, 2023

A programmer's guide to living longer

30,849 2,146 Updated Jan 30, 2024

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

C++ 4,983 822 Updated Jun 17, 2024

ncnn is a high-performance neural network inference framework optimized for the mobile platform

C++ 20,827 4,197 Updated Jan 7, 2025

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 1,925 381 Updated Jan 17, 2025

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

C++ 2,899 785 Updated Dec 19, 2024

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

C 6,527 1,522 Updated Jan 18, 2025