Development repository for the Triton language and compiler

MLIR 15,645 1,990 Updated May 23, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 1,117 74 Updated May 22, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,368 598 Updated May 20, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,680 772 Updated May 23, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,564 835 Updated Apr 29, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,537 1,444 Updated May 23, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 47,887 7,558 Updated May 23, 2025

List of papers related to neural network quantization in recent AI conferences and journals.

630 49 Updated Mar 27, 2025

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…

Python 13,627 915 Updated May 22, 2025

🔥Highlighting the top ML papers every week.

11,271 686 Updated Apr 11, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 7,057 698 Updated May 22, 2025

Awesome-LLM: a curated list of Large Language Model

23,452 1,962 Updated May 9, 2025

Ongoing research training transformer models at scale

Python 12,403 2,782 Updated May 22, 2025

An open-source tool-augmented conversational language model from Fudan University

Python 12,049 1,147 Updated Jul 13, 2024

LLM inference in C/C++

C++ 80,719 11,873 Updated May 23, 2025

This repository contains integer operators on GPUs for PyTorch.

Python 205 54 Updated Sep 29, 2023

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,409 174 Updated Jul 12, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 38,510 4,383 Updated May 23, 2025

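The internet still remembers! A record of companies that reneged on verbal offers, letters of intent, or tripartite agreements during campus recruitment! Though my voice carries little weight, I want to do what little I can!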

3,334 160 Updated Oct 20, 2024

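The officially designated global GitHub for 润学 (Runxue), collecting its aims, principles, theory, and assorted real-world examples of 润 ("run"); it addresses three questions: why to run, where to run to, and how to run; and aims to become the core religion and core belief of the new Chinese people.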

32,045 2,615 Updated Jul 31, 2024

Fast inference engine for Transformer models

C++ 3,812 359 Updated Apr 8, 2025

[CVPR2022] Remember Intentions: Retrospective-Memory-based Trajectory Prediction

Python 129 16 Updated Sep 11, 2022

[CVPR22] GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction with Relational Reasoning

Python 122 26 Updated Feb 11, 2023

A Python Graph Matching Toolkit.

Python 332 19 Updated Oct 21, 2024

Transformer related optimization, including BERT, GPT

C++ 6,165 905 Updated Mar 27, 2024

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 11,040 1,861 Updated May 16, 2025

Low-precision matrix multiplication

C++ 1,803 458 Updated Jan 29, 2024

symmetric int8 gemm

Assembly 66 12 Updated Jun 7, 2020