DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,801 464 Updated Mar 5, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,033 600 Updated Mar 6, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,168 774 Updated Mar 1, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,626 1,131 Updated Mar 7, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40,523 6,099 Updated Mar 7, 2025

List of papers related to neural network quantization in recent AI conferences and journals.

548 45 Updated Dec 16, 2024

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…

Python 13,274 894 Updated Feb 27, 2025

🔥Highlighting the top ML papers every week.

10,917 666 Updated Feb 26, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 6,762 670 Updated Mar 4, 2025

Awesome-LLM: a curated list of Large Language Models

21,896 1,790 Updated Mar 4, 2025

Ongoing research training transformer models at scale

Python 11,662 2,617 Updated Mar 7, 2025

An open-source tool-augmented conversational language model from Fudan University

Python 12,035 1,146 Updated Jul 13, 2024

LLM inference in C/C++

C++ 75,993 10,990 Updated Mar 7, 2025

This repository contains integer operators on GPUs for PyTorch.

Python 192 50 Updated Sep 29, 2023

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,356 163 Updated Jul 12, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 37,240 4,281 Updated Mar 6, 2025

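The internet still remembers! A list of companies that have reneged on verbal offers, letters of intent, or tripartite agreements during campus recruitment. However small our voice, we want to do our modest part!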

3,328 163 Updated Oct 20, 2024

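The globally designated official GitHub of "Runology" (润学, the study of emigrating): collects its aims, program, theory, and various real-world cases of "running"; addresses three questions: why to leave, where to go, and how to leave; and aims to become the core religion and core belief of the new Chinese people.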

31,919 2,619 Updated Jul 31, 2024

Fast inference engine for Transformer models

C++ 3,640 326 Updated Feb 25, 2025

[CVPR2022] Remember Intentions: Retrospective-Memory-based Trajectory Prediction

Python 126 16 Updated Sep 11, 2022

[CVPR22] GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction with Relational Reasoning

Python 115 25 Updated Feb 11, 2023

A Python Graph Matching Toolkit.

Python 326 20 Updated Oct 21, 2024

Transformer related optimization, including BERT, GPT

C++ 6,068 899 Updated Mar 27, 2024

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 9,961 1,771 Updated Mar 7, 2025

Low-precision matrix multiplication

C++ 1,792 455 Updated Jan 29, 2024

symmetric int8 gemm

Assembly 66 12 Updated Jun 7, 2020

row-major matmul optimization

C++ 607 84 Updated Sep 9, 2023

A programmer's guide to living longer (程序员延寿指南)

31,006 2,158 Updated Jan 30, 2024