Awesome LLM compression research papers and tools to accelerate LLM training and inference.

- ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers, NeurIPS 2022 [Paper] [Code]
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale, NeurIPS 2022 [Paper] [Code]
- LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models, arXiv 2022 [Paper]
- Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling, arXiv 2023 [Paper]
- Quantized Distributed Training of Large Models with Convergence Guarantees, arXiv 2023 [Paper]
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models, ICML 2023 [Paper] [Code]
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, ICLR 2023 [Paper] [Code]
- RPTQ: Reorder-based Post-training Quantization for Large Language Models, arXiv 2023 [Paper] [Code]
- ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation, arXiv 2023 [Paper] [Code]
- QLoRA: Efficient Finetuning of Quantized LLMs, arXiv 2023 [Paper] [Code]
- Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models, arXiv 2023 [Paper]
- The Quantization Model of Neural Scaling, arXiv 2023 [Paper]
- Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization, arXiv 2023 [Paper]
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, arXiv 2023 [Paper] [Code]
- LLM-QAT: Data-Free Quantization Aware Training for Large Language Models, arXiv 2023 [Paper]
- SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression, arXiv 2023 [Paper] [Code]
- OWQ: Lessons learned from activation outliers for weight quantization in large language models, arXiv 2023 [Paper]
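
The post-training quantization papers above differ in how they handle outliers and calibration, but they share a common primitive: rounding weights to a low-bit grid with a per-channel (or per-group) scale. Below is a minimal NumPy sketch of plain symmetric round-to-nearest INT8 weight quantization as a point of reference; the function names, the 8-bit setting, and the toy usage are illustrative assumptions, not the method of any specific paper listed.

```python
import numpy as np

def quantize_per_channel(w: np.ndarray, n_bits: int = 8):
    """Symmetric round-to-nearest quantization with one scale per output channel.

    w: weight matrix of shape (out_features, in_features).
    Returns the integer weights and the per-channel scales.
    (Illustrative sketch; not taken from any paper above.)
    """
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 127 for INT8
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-8)                    # avoid division by zero
    w_int = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return w_int, scale

def dequantize(w_int: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map the integer weights back to floating point for comparison."""
    return w_int.astype(np.float32) * scale

# Toy usage: quantize a random layer and measure the reconstruction error.
w = np.random.randn(16, 64).astype(np.float32)
w_int, scale = quantize_per_channel(w)
err = np.abs(w - dequantize(w_int, scale)).mean()
print(f"mean absolute quantization error: {err:.5f}")
```
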
- The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers, ICLR 2023 [Paper]
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot, arXiv 2023 [Paper] [Code]
- LLM-Pruner: On the Structural Pruning of Large Language Models, arXiv 2023 [Paper] [Code]
- Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models, ICLR 2023 TinyPapers [Paper]
- Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering, arXiv 2023 [Paper] [Code]
- Learning to Compress Prompts with Gist Tokens, arXiv 2023 [Paper] [Code]
- Efficient Prompting via Dynamic In-Context Learning, arXiv 2023 [Paper]
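
The pruning papers above differ in how they choose what to remove (one-shot reconstruction, structural groups, self-information scores), but the baseline they are typically compared against is unstructured magnitude pruning. The NumPy sketch below shows that baseline only; the sparsity level and function name are illustrative choices, not from any listed paper.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights until roughly `sparsity` of them are zero.

    (Illustrative unstructured-pruning baseline, not any paper's specific method.)
    """
    k = int(w.size * sparsity)                  # number of weights to remove
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    mask = np.abs(w) > threshold                # keep only weights above the cutoff
    return w * mask

# Toy usage: prune half of a random weight matrix.
w = np.random.randn(16, 64).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.5)
print("achieved sparsity:", float((w_sparse == 0).mean()))
```
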
- Lifting the Curse of Capacity Gap in Distilling Language Models, ACL 2023 [Paper] [Code]
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, ACL 2023 [Paper]
- LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions, arXiv 2023 [Paper] [Code]
- Large Language Model Distillation Doesn't Need a Teacher, arXiv 2023 [Paper] [Code]
- The False Promise of Imitating Proprietary LLMs, arXiv 2023 [Paper]
- GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo, arXiv 2023 [Paper] [Code]
- PaD: Program-aided Distillation Specializes Large Models in Reasoning, arXiv 2023 [Paper]
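
The distillation papers above vary widely in data and recipe (instruction data, rationales, program-aided traces), but the classic objective they build on is training a student against the teacher's temperature-softened output distribution. The NumPy sketch below shows only that generic soft-label loss; the temperature, shapes, and function names are illustrative assumptions, not the setup of any listed paper.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    z = logits - logits.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """KL divergence between temperature-softened teacher and student distributions,
    scaled by T^2 as in classic soft-label knowledge distillation.
    (Illustrative sketch; not any listed paper's specific objective.)"""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    kl = (t * (np.log(t + 1e-12) - np.log(s + 1e-12))).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)

# Toy usage: a batch of 4 examples over a 10-token vocabulary.
teacher = np.random.randn(4, 10)
student = np.random.randn(4, 10)
print("soft-label distillation loss:", distillation_loss(student, teacher))
```
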
- BMCook: Model Compression for Big Models [Code]
- llama.cpp: Inference of the LLaMA model in pure C/C++ [Code]
- LangChain: Building applications with LLMs through composability [Code]
- GPTQ-for-LLaMA: 4-bit quantization of LLaMA using GPTQ [Code]
- Alpaca-CoT: An Instruction Fine-Tuning Platform with Instruction Data Collection and Unified Large Language Models Interface [Code]