zjnyly

Starred repositories

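🔥 A free, public-interest ChatGPT API (Free ChatGPT API, GPT4 API): directly accessible without a proxy, accessed with the standard OpenAI APIKEY format, and usable with projects such as ChatGPT-next-web, ChatGPT-Midjourney, Lobe-chat, Botgem, FastGPT, and Immersive Translate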

3,750 395 Updated Nov 12, 2024

AMD OpenNIC Project Overview

Shell 243 41 Updated Dec 20, 2022
Python 24 2 Updated Feb 26, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,450 389 Updated Feb 28, 2025

Fully open reproduction of DeepSeek-R1

Python 21,799 1,931 Updated Mar 1, 2025

🚀🎬 ShortGPT - Experimental AI framework for YouTube Shorts / TikTok channel automation

Python 6,198 794 Updated Feb 10, 2025
C 23 4 Updated Dec 10, 2024

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 29,853 4,062 Updated Jul 17, 2024

PyTorch library for cost-effective, fast and easy serving of MoE models.

Python 138 12 Updated Feb 27, 2025

Ongoing research training gaussian splatting at scale by distributed system

Python 469 29 Updated Aug 9, 2024

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 5,723 502 Updated Feb 27, 2025

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 1,507 264 Updated Jan 16, 2024

[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation

Python 192 14 Updated Dec 16, 2024

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 530 39 Updated Feb 14, 2025

Kratos: An FPGA Benchmark for Unrolled Deep Neural Networks with Fine-Grained Sparsity and Mixed Precision

Python 9 2 Updated Jul 25, 2024

[TMLR 2024] Efficient Large Language Models: A Survey

1,104 95 Updated Feb 27, 2025

Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)

Python 51 8 Updated Feb 8, 2025
Python 311 40 Updated Apr 2, 2024

The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".

Python 18 1 Updated Nov 12, 2024
Python 174 9 Updated Feb 21, 2025

An implementation of OBC algorithm packed into a module

Python 8 Updated Dec 28, 2023

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Python 678 43 Updated Aug 13, 2024

[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

Python 32 Updated Dec 6, 2023

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)

Python 110 21 Updated Jul 10, 2024

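A pure C++, cross-platform LLM acceleration library, callable from Python; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile devices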

C++ 3,396 348 Updated Feb 27, 2025

Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.

Python 48 1 Updated Dec 14, 2024

A paper list of some recent works about Token Compress for Vit and VLM

338 17 Updated Feb 9, 2025

[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

Scala 82 8 Updated Aug 27, 2024