Pruning
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot", with a LLaMA implementation.
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
Prune a model while fine-tuning or training.
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" (for the general one-shot pruning idea, see the sketch after this list).
A curated list of resources on efficient large language models.
Official implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks.
BlockDrop: Dynamic Inference Paths in Residual Networks
A collection of recent methods for (deep) neural network compression and acceleration.
GNN-RL Compression: Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning
RL-Pruner: Structured Pruning Using Reinforcement Learning for CNN Compression and Acceleration
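Several of the entries above (SparseGPT in particular) prune weights in one shot, without retraining. As a point of orientation only, here is a minimal sketch of one-shot global magnitude pruning using PyTorch's built-in torch.nn.utils.prune utilities; it illustrates the generic idea, not the algorithm of any specific repository listed, and the small nn.Sequential model is a placeholder assumption.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model standing in for a real network (assumption for illustration).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Collect the weight tensors of all Linear layers.
params_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
]

# One-shot global magnitude pruning: zero out the 50% of weights with
# the smallest absolute value, ranked jointly across all listed layers.
prune.global_unstructured(
    params_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)

# Make the pruning permanent by baking the masks into the weights.
for module, name in params_to_prune:
    prune.remove(module, name)

# Report the resulting overall sparsity.
total = sum(m.weight.nelement() for m, _ in params_to_prune)
zeros = sum(int((m.weight == 0).sum()) for m, _ in params_to_prune)
print(f"global sparsity: {zeros / total:.1%}")
```

Note that global_unstructured ranks weights across all layers jointly, so layers end up with different sparsity levels; methods like SparseGPT and LLM-Pruner go well beyond this baseline by using second-order or structural information to decide what to remove.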