  • BUAA
  • China


SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python 2,185 252 Updated Oct 9, 2024

A dual-clock asynchronous FIFO written in Verilog, tested with Icarus Verilog

Verilog 251 75 Updated Apr 30, 2024

Fast inference from large language models via speculative decoding

Python 532 52 Updated Aug 22, 2024

Inference Llama 2 in one file of pure C

C 17,279 2,057 Updated Aug 6, 2024

Low Precision Arithmetic Simulation in PyTorch

Python 263 75 Updated May 20, 2024

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Python 8,787 1,974 Updated Apr 16, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

384 15 Updated Oct 9, 2024

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

61 1 Updated Oct 7, 2024

Verilog implementation of Softmax function

Verilog 47 16 Updated Jul 27, 2022

⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.

Python 10,875 1,069 Updated Sep 28, 2024

High-speed downloads from mirror sites using HuggingFace's official download tool.

Python 768 71 Updated Sep 5, 2024

This repository contains demos I made with the Transformers library by HuggingFace.

Jupyter Notebook 9,196 1,418 Updated Aug 8, 2024

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 2,624 413 Updated Oct 10, 2024

Run generative AI models on Sophgo BM1684X

Python 107 17 Updated Oct 9, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 28,080 4,149 Updated Oct 9, 2024

PyTorch Tutorial for Deep Learning Researchers

Python 29,957 8,104 Updated Aug 15, 2023

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,403 185 Updated Jul 16, 2024

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)

Python 9 1 Updated Jul 4, 2024
Python 77 14 Updated Nov 17, 2023

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 10,151 801 Updated Aug 20, 2024

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 10,310 1,021 Updated Oct 9, 2024

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python 5,970 516 Updated Sep 6, 2024

Inference code for Llama models

Python 55,925 9,518 Updated Aug 18, 2024

MLIR For Beginners tutorial

C++ 761 62 Updated Sep 30, 2024

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ 749 316 Updated Oct 9, 2024

A collection of pre-trained, state-of-the-art models in the ONNX format

Jupyter Notebook 7,802 1,394 Updated Apr 30, 2024

SpinalHDL Hardware Math Library

Scala 77 13 Updated Jul 12, 2024

Intermediate Language (IL) for Hardware Accelerator Generators

Rust 490 50 Updated Oct 9, 2024

Dive into Deep Learning Compiler

Python 640 98 Updated Jun 19, 2022