fantasysee 🍀
  • Nanjing University
  • Nanjing, China

Organizations

@KULeuven-MICAS

Starred repositories

Model LLM inference on single-core hardware architectures

Python 6 Updated Aug 23, 2024

Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches. EMNLP Findings 2024

Python 59 3 Updated Dec 31, 2024

A unified simulation platform that combines hardware and software, enabling pre-silicon, full-stack, closed-loop evaluation of your robotic system.

Python 36 4 Updated Sep 27, 2024

A heterogeneous accelerator-centric compute cluster

SystemVerilog 12 10 Updated Jan 15, 2025

Fast and accurate DRAM power and energy estimation tool

C++ 143 50 Updated Jan 15, 2025

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

Python 152 14 Updated Jul 12, 2024

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 262 25 Updated Oct 10, 2024

A Python package that uses task-based neurons to build neural networks.

Python 133 3 Updated Aug 22, 2024

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 753 58 Updated Oct 8, 2024

C 12 1 Updated Nov 11, 2024

Awesome-LLM: a curated list of Large Language Models

20,628 1,690 Updated Jan 13, 2025

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 8,050 418 Updated Sep 6, 2024

Algebraic enhancements for deep learning accelerator architectures

Python 264 15 Updated Mar 28, 2024

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

Python 341 23 Updated Mar 21, 2024

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 10,802 844 Updated Aug 20, 2024

Open-source artifacts and code for our MICRO'23 paper titled “Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads”.

Python 34 Updated Sep 18, 2023

Implementation of "NITI: Training Integer Neural Networks Using Integer-only Arithmetic" on arXiv

C++ 79 15 Updated Jul 26, 2022

Comparison of methods for "pruning at initialization prior to training" (Synflow/SNIP/GraSP) in PyTorch

Python 14 2 Updated May 12, 2024

Jupyter Notebook 125 8 Updated Oct 17, 2024

Universal LLM Deployment Engine with ML Compilation

Python 19,628 1,607 Updated Jan 14, 2025

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1,997 162 Updated Mar 27, 2024

Open-Source Posit RISC-V Core with Quire Capability

C++ 51 12 Updated Nov 19, 2024

A framework for fast exploration of the depth-first scheduling space for DNN accelerators

Python 35 10 Updated Feb 8, 2023

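A plugin that improves ChatGPT's data security and efficiency. It also shares many innovative features for free, such as: auto refresh, keep alive, data security, audit cancellation, conversation cloning, unrestricted replies, page cleanup, large-screen display, tracking interception, rapid updates, and fine-grained inspection. It aims to make the AI experience secure, smooth, efficient, and clean.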

JavaScript 14,754 733 Updated Oct 13, 2024

HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators

C++ 123 44 Updated Jan 14, 2025

A collection of research papers on efficient training of DNNs

69 8 Updated Jul 6, 2022

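VSCode extension: automatically generates and updates file header comments in VSCode, and automatically generates function comments with support for extracting function parameters. Supports all mainstream languages; well documented, easy to use, flexibly configurable, and maintained continuously for years.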

JavaScript 5,739 279 Updated Apr 19, 2023

[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers

Python 177 27 Updated Feb 28, 2023