:octocat:
Busy with learning :>)
  • AMD Inc.
  • Beijing, China

Stars

LLM

60 repositories

C++ implementation for BLOOM

C 810 64 Updated May 13, 2023

Awesome LLM compression research papers and tools.

1,360 87 Updated Feb 13, 2025

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python 4,693 505 Updated Jan 21, 2025

Code for Paper: “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors

Python 1,765 157 Updated Aug 7, 2023
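The compressor-based classifier from the paper above can be sketched with only the standard library: compute a normalized compression distance (NCD) with gzip and take the label of the nearest training example. The tiny training set here is illustrative, not from the repo.

```python
import gzip

def clen(s: str) -> int:
    # Length of the gzip-compressed text, a proxy for Kolmogorov complexity
    return len(gzip.compress(s.encode()))

def ncd(a: str, b: str) -> float:
    # Normalized compression distance between two strings
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

# Toy labeled set (illustrative only)
train = [
    ("the stock market rallied on strong earnings", "business"),
    ("the central bank raised interest rates", "business"),
    ("the striker scored twice in the final", "sports"),
    ("the team won the championship game", "sports"),
]

def classify(text: str) -> str:
    # 1-nearest-neighbor under NCD: the parameter-free method of the paper
    return min(train, key=lambda ex: ncd(text, ex[0]))[1]

print(classify("earnings beat sent shares higher"))
```

No model, no training, no parameters: the compressor does all the work, which is the paper's point.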

Making large AI models cheaper, faster and more accessible

Python 39,071 4,366 Updated Feb 13, 2025

A pure C++ cross-platform LLM acceleration library with Python bindings; chatglm-6B-class models reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices.

C++ 3,376 347 Updated Feb 8, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,841 227 Updated Feb 13, 2025

Inference Llama 2 in one file of pure C

C 18,024 2,198 Updated Aug 6, 2024

Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream tuning.

HTML 984 99 Updated Apr 27, 2024

An open-source, commercially usable multimodal model supporting bilingual (Chinese-English) vision-text dialogue.

Python 362 32 Updated Sep 23, 2023

4 bits quantization of LLaMA using GPTQ

Python 3,033 460 Updated Jul 13, 2024
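Several entries in this list (GPTQ, AWQ, QuIP) revolve around low-bit weight quantization. A minimal round-to-nearest (RTN) 4-bit sketch in pure Python shows the basic idea; GPTQ itself goes further and uses second-order information to correct rounding error, so this is the naive baseline, not the repo's algorithm.

```python
def quantize_rtn_4bit(weights):
    """Symmetric round-to-nearest 4-bit quantization of one weight row.
    Returns integer codes in [-8, 7] plus the per-row scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # guard all-zero rows
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from codes and scale
    return [v * scale for v in q]

w = [0.31, -0.12, 0.07, -0.45, 0.02]
q, s = quantize_rtn_4bit(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half the quantization step
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(err, 4))
```

Per-row (or per-group) scales are what let 4-bit codes cover weights of very different magnitudes; the methods above mainly differ in how they choose the codes to minimize the resulting output error.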

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Python 2,822 221 Updated Sep 30, 2023

An Open-source Toolkit for LLM Development

Python 2,755 174 Updated Jan 13, 2025

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,022 164 Updated Mar 27, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 16,790 1,385 Updated Feb 1, 2025

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,727 229 Updated Feb 11, 2025
Python 539 44 Updated Dec 16, 2024

Repository for CPU Kernel Generation for LLM Inference

Python 25 2 Updated Jul 13, 2023

Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"

Python 360 32 Updated Feb 24, 2024
Python 314 39 Updated Apr 2, 2024
Python 49 6 Updated Oct 17, 2023

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

Python 947 111 Updated Oct 7, 2024

Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40% memory and GPU time.

Python 355 18 Updated Feb 12, 2024
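Prompt compression, as in the entry above, drops low-information tokens so the model sees a shorter context. The repo estimates token self-information with a small language model; the sketch below uses raw word frequency as a crude stand-in (frequent ≈ low information), so it illustrates the idea only, not the repo's method.

```python
from collections import Counter

def compress_prompt(text: str, keep_ratio: float = 0.6) -> str:
    """Crude prompt compression: drop the most frequent (lowest-information)
    words first, keeping roughly keep_ratio of the tokens in order."""
    words = text.split()
    freq = Counter(w.lower() for w in words)
    keep = int(len(words) * keep_ratio)
    # Rank token positions by rarity (rare ~ informative); sort is stable,
    # so ties keep their original order.
    ranked = sorted(range(len(words)), key=lambda i: freq[words[i].lower()])
    kept = set(ranked[:keep])
    return " ".join(w for i, w in enumerate(words) if i in kept)

s = "the model reads the prompt and the model answers the question"
print(compress_prompt(s, 0.5))  # → reads prompt and answers question
```

A real self-information estimate would also preserve rare-but-critical function words when they matter; the frequency proxy cannot tell those cases apart.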

A simple and effective LLM pruning approach.

Python 709 101 Updated Aug 9, 2024
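The pruning repos above (LLM-Pruner and the entry directly above) remove weights to shrink the model. A minimal unstructured magnitude-pruning sketch, the classic baseline these methods compare against (their actual criteria differ, e.g. weighting magnitudes by input activation norms):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured).
    Ties at the threshold may prune slightly more than the target count."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.12]
print(magnitude_prune(w, 0.5))  # → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Unstructured sparsity like this needs sparse kernels to yield real speedups, which is why structural pruning (removing whole rows, heads, or layers, as LLM-Pruner does) is often preferred for deployment.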

LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

817 49 Updated May 6, 2023

General technology for enabling AI capabilities w/ LLMs and MLLMs

Python 3,840 294 Updated Jan 11, 2025

Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467

Python 274 25 Updated Aug 5, 2023

[EMNLP 2023] Adapting Language Models to Compress Long Contexts

Python 293 22 Updated Sep 9, 2024

PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".

Python 81 16 Updated May 23, 2023

A model compression and acceleration toolbox based on pytorch.

Python 329 40 Updated Jan 12, 2024