AMD Inc., Beijing, China
LLM
Awesome LLM compression research papers and tools.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Code for Paper: “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
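The compressor-based method named above is simple enough to sketch directly: classify a text by its normalized compression distance (NCD) to labeled examples, using gzip as the compressor and a k-nearest-neighbor vote. The training sentences below are an illustrative toy dataset, not data from the paper.

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller means more similar."""
    cx = len(gzip.compress(x))
    cy = len(gzip.compress(y))
    cxy = len(gzip.compress(x + b" " + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(text: str, train, k: int = 3) -> str:
    """Parameter-free kNN: majority vote among the k nearest training texts."""
    dists = sorted((ncd(text.encode(), t.encode()), label) for t, label in train)
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Toy illustrative data (hypothetical, not from the paper)
train = [
    ("the team scored a goal in the match " * 2, "sports"),
    ("a late goal won the match for the team " * 2, "sports"),
    ("the team pressed hard and the goal came late " * 2, "sports"),
    ("the court ruled the statute was invalid " * 2, "law"),
    ("the judge dismissed the appeal in court " * 2, "law"),
    ("liability under the statute was disputed " * 2, "law"),
]
pred = classify("the team celebrated the goal after the match " * 2, train)
```

No model, no training, no hyperparameters beyond k: the compressor's shared-substring savings stand in for learned similarity.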
Making large AI models cheaper, faster and more accessible
A pure C++ LLM acceleration library for all platforms, callable from Python; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile devices.
LightLLM is a Python-based LLM inference and serving framework, notable for its lightweight design, easy scalability, and high performance.
Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream tuning.
4-bit quantization of LLaMA using GPTQ.
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
An Open-source Toolkit for LLM Development
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
The official repo of Qwen (通义千问), the chat and pretrained large language model series proposed by Alibaba Cloud.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Repository for CPU Kernel Generation for LLM Inference
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
Compress your input to ChatGPT or other LLMs to let them process 2x more content and save 40% of memory and GPU time.
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
General technology for enabling AI capabilities with LLMs and MLLMs.
Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
A model compression and acceleration toolbox based on PyTorch.
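Several entries above (GPTQ, AutoGPTQ, AWQ, QuIP) revolve around low-bit weight quantization. As a point of reference, the sketch below shows plain round-to-nearest (RTN) asymmetric quantization of one weight group to 4 bits — the simple baseline those methods improve on by correcting or reweighting the rounding error, not the papers' own algorithms.

```python
def quantize_rtn(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one weight group.

    Maps floats in [wmin, wmax] onto integer codes in [0, 2**bits - 1]
    using a per-group scale and zero point (here, wmin itself).
    """
    qmax = (1 << bits) - 1
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / qmax if wmax > wmin else 1.0
    codes = [round((w - wmin) / scale) for w in weights]   # ints in [0, qmax]
    dequant = [c * scale + wmin for c in codes]            # reconstruction
    return codes, dequant, scale

# Example weight group (made-up values for illustration)
w = [-0.62, -0.11, 0.03, 0.27, 0.58, 0.91]
codes, approx, scale = quantize_rtn(w, bits=4)
```

The reconstruction error of RTN is bounded by half the scale per weight; GPTQ and AWQ keep this storage format but choose the rounding (or the scales) to minimize the resulting layer output error.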