
Starred repositories
🔥 A free, public-benefit ChatGPT API / GPT-4 API; directly accessible with no proxy required; accepts the standard OpenAI API key format, and works with projects such as ChatGPT-next-web, ChatGPT-Midjourney, Lobe-chat, Botgem, FastGPT, and Immersive Translate (see the usage sketch after this list)
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Fully open reproduction of DeepSeek-R1
🚀🎬 ShortGPT - Experimental AI framework for YouTube Shorts / TikTok channel automation
Code and documentation to train Stanford's Alpaca models, and generate the data.
PyTorch library for cost-effective, fast and easy serving of MoE models.
Ongoing research on training Gaussian splatting at scale with a distributed system
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Kratos: An FPGA Benchmark for Unrolled Deep Neural Networks with Fine-Grained Sparsity and Mixed Precision
[TMLR 2024] Efficient Large Language Models: A Survey
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
An implementation of the OBC algorithm packaged as a module
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices
Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster"
A paper list of recent works on token compression for ViTs and VLMs
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
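
The free ChatGPT API entry above advertises access via the standard OpenAI API key format, so any OpenAI-compatible client should work against it. A minimal sketch using the official openai Python client; the base_url and model name below are placeholders, not the project's actual endpoint:

from openai import OpenAI

# Assumption: the service exposes an OpenAI-compatible /v1 endpoint.
# Both base_url and model are hypothetical placeholders.
client = OpenAI(
    api_key="sk-...",                            # standard OpenAI API key format
    base_url="https://your-free-api.example/v1",
)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)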