AMD Inc., Beijing, China
LLM
Awesome LLM compression research papers and tools.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Code for Paper: “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
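The compressor-based method named above is simple enough to sketch directly: classify a text by its normalized compression distance (NCD) to labeled examples, using gzip as the compressor and a k-nearest-neighbor vote. The training sentences below are an illustrative toy dataset, not data from the paper.

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller means more similar."""
    cx = len(gzip.compress(x))
    cy = len(gzip.compress(y))
    cxy = len(gzip.compress(x + b" " + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(text: str, train, k: int = 3) -> str:
    """Parameter-free kNN: majority vote among the k nearest training texts."""
    dists = sorted((ncd(text.encode(), t.encode()), label) for t, label in train)
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Toy illustrative data (hypothetical, not from the paper)
train = [
    ("the team scored a goal in the match " * 2, "sports"),
    ("a late goal won the match for the team " * 2, "sports"),
    ("the team pressed hard and the goal came late " * 2, "sports"),
    ("the court ruled the statute was invalid " * 2, "law"),
    ("the judge dismissed the appeal in court " * 2, "law"),
    ("liability under the statute was disputed " * 2, "law"),
]
pred = classify("the team celebrated the goal after the match " * 2, train)
```

No model, no training, no hyperparameters beyond k: the compressor's shared-substring savings stand in for learned similarity.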
Making large AI models cheaper, faster and more accessible
A pure C++ LLM acceleration library for all platforms, callable from Python; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile devices.
LightLLM is a Python-based LLM inference and serving framework, notable for its lightweight design, easy scalability, and high performance.
Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream tuning.
4-bit quantization of LLaMA using GPTQ.
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
An Open-source Toolkit for LLM Development
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
The official repo of Qwen (通义千问), the chat and pretrained large language model series proposed by Alibaba Cloud.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Repository for CPU Kernel Generation for LLM Inference
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
Compress your input to ChatGPT or other LLMs to let them process 2x more content and save 40% of memory and GPU time.
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
General technology for enabling AI capabilities with LLMs and MLLMs.
Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
A model compression and acceleration toolbox based on PyTorch.
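Several entries above (GPTQ, AutoGPTQ, AWQ, QuIP) revolve around low-bit weight quantization. As a point of reference, the sketch below shows plain round-to-nearest (RTN) asymmetric quantization of one weight group to 4 bits — the simple baseline those methods improve on by correcting or reweighting the rounding error, not the papers' own algorithms.

```python
def quantize_rtn(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one weight group.

    Maps floats in [wmin, wmax] onto integer codes in [0, 2**bits - 1]
    using a per-group scale and zero point (here, wmin itself).
    """
    qmax = (1 << bits) - 1
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / qmax if wmax > wmin else 1.0
    codes = [round((w - wmin) / scale) for w in weights]   # ints in [0, qmax]
    dequant = [c * scale + wmin for c in codes]            # reconstruction
    return codes, dequant, scale

# Example weight group (made-up values for illustration)
w = [-0.62, -0.11, 0.03, 0.27, 0.58, 0.91]
codes, approx, scale = quantize_rtn(w, bits=4)
```

The reconstruction error of RTN is bounded by half the scale per weight; GPTQ and AWQ keep this storage format but choose the rounding (or the scales) to minimize the resulting layer output error.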