Nanjing University
Nanjing, China
Highlights
- Pro
Starred repositories
Model LLM inference on single-core hardware architectures
Official implementation for Yuan, Liu, Zhong et al., "KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches" (EMNLP Findings 2024)
A unified simulation platform that combines hardware and software, enabling pre-silicon, full-stack, closed-loop evaluation of your robotic system.
A heterogeneous accelerator-centric compute cluster
Fast and accurate DRAM power and energy estimation tool
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
A Python package that uses task-based neurons to build neural networks.
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Awesome-LLM: a curated list of Large Language Model resources
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Algebraic enhancements for deep learning accelerator architectures
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
The official GitHub page for the survey paper "A Survey of Large Language Models".
Open-source artifacts and code for our MICRO'23 paper "Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads".
Implementation of the arXiv paper "NITI: Training Integer Neural Networks Using Integer-only Arithmetic"
Comparison of "pruning at initialization prior to training" methods (Synflow/SNIP/GraSP) in PyTorch
Universal LLM Deployment Engine with ML Compilation
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Open-Source Posit RISC-V Core with Quire Capability
A framework for fast exploration of the depth-first scheduling space for DNN accelerators
A plugin that improves ChatGPT's data security and efficiency, freely sharing many innovative features such as auto-refresh, keep-alive, data security, audit cancellation, conversation cloning, unabridged replies, page cleanup, large-screen display, tracker blocking, continuous updates, fine-grained inspection, and more, making the AI experience safe, smooth, and efficient.
HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators
A collection of research papers on efficient training of DNNs
A VSCode extension that automatically generates and updates file header comments, and generates function comments with function-parameter extraction. Supports all mainstream languages, is well documented, simple to use, flexible to configure, and has been actively maintained for years.
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
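Several of the starred projects above (KIVI, GEAR, GPTQ, OmniQuant) revolve around low-bit quantization. As a minimal illustration of the asymmetric uniform quantization that KIVI's description mentions, here is a sketch in plain Python; the function names are hypothetical and do not come from any of these repositories:

```python
def asymmetric_quantize(values, bits=2):
    """Asymmetric uniform quantization: map floats onto the integer
    grid [0, 2**bits - 1] via a scale and an integer zero-point."""
    qmax = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    # Round to the nearest grid point and clamp into the representable range.
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

# 2-bit round trip: reconstruction error stays within one scale step.
q, s, z = asymmetric_quantize([-1.0, -0.5, 0.0, 0.5, 1.0], bits=2)
x_hat = dequantize(q, s, z)
```

Real KV-cache quantizers operate per-channel or per-token on tensors and keep the scale and zero-point in higher precision; this scalar-list version only shows the core arithmetic.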