- Charlottesville, VA, USA
ml-retreat Public
Forked from hesamsheikh/ml-retreat
Machine Learning Journal for Intermediate to Advanced Topics.
Jupyter Notebook Updated Nov 5, 2024
siliwiz Public
Forked from TinyTapeout/siliwiz
Silicon Layout Wizard
JavaScript Other Updated Sep 14, 2024
gemmini Public
Forked from ucb-bar/gemmini
Berkeley's Spatial Array Generator
qserve Public
Forked from mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Python Apache License 2.0 Updated Aug 13, 2024
H2O Public
Forked from FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Python Updated Aug 1, 2024
KIVI Public
Forked from jy-yuan/KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Python MIT License Updated Jul 27, 2024
OmniQuant Public
Forked from OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Python MIT License Updated Jul 24, 2024
llm-awq Public
Forked from mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Python MIT License Updated Jul 16, 2024
ventus-gpgpu Public
Forked from THU-DSP-LAB/ventus-gpgpu
GPGPU processor supporting RISCV-V extension, developed with Chisel HDL
Scala Other Updated Jul 10, 2024
ventus-gpgpu-verilog Public
Forked from THU-DSP-LAB/ventus-gpgpu-verilog
GPGPU supporting RISCV-V, developed with verilog HDL
Verilog Updated Jul 8, 2024
TensorRT Public
Forked from pytorch/TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Python BSD 3-Clause "New" or "Revised" License Updated Jul 5, 2024
TinyChatEngine Public
Forked from mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
C++ MIT License Updated Jul 4, 2024
Zhulong-RISCV-CPU Public
CPU Design Based on RISCV ISA
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Python BSD 3-Clause "New" or "Revised" License Updated Jun 6, 2024
AISystem Public
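Forked from chenzomi12/AISystem
AISystem mainly covers AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks
Jupyter Notebook Apache License 2.0 Updated May 22, 2024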
hardware-accelerator-for-LLM Public
Forked from Soumya2754/hardware-accelerator-for-LLM
Major project - Kannada LLM for farmers
Verilog Updated May 20, 2024
AutoSmoothQuant Public
Forked from AniZpZ/AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
Python MIT License Updated May 18, 2024
basejump_stl Public
Forked from bespoke-silicon-group/basejump_stl
BaseJump STL: A Standard Template Library for SystemVerilog
SystemVerilog Other Updated May 14, 2024
spatten-llm Public
Forked from mit-han-lab/spatten
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Scala MIT License Updated May 3, 2024
smoothquant Public
Forked from mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Python MIT License Updated Apr 28, 2024
tiny-gpu Public
Forked from adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
metaseq Public
Forked from facebookresearch/metaseq
Repo for external large-scale work
Python MIT License Updated Apr 27, 2024
LLMsPracticalGuide Public
Forked from Mooler0410/LLMsPracticalGuide
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
Updated Apr 22, 2024
KVQuant Public
Forked from SqueezeAILab/KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Python Updated Apr 19, 2024
FlexGen Public
Forked from FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
Python Apache License 2.0 Updated Apr 19, 2024
llama3 Public
Forked from meta-llama/llama3
The official Meta Llama 3 GitHub site
Python Other Updated Apr 19, 2024
llama Public
Forked from meta-llama/llama
Inference code for Llama models
Python Other Updated Apr 10, 2024
gpgpu-sim_distribution Public
Forked from gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
C++ Other Updated Apr 8, 2024