InfiniGen Public
Forked from snu-comparch/InfiniGen. InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
Python, Apache License 2.0. Updated Oct 10, 2024
cold-compress Public
Forked from AnswerDotAI/cold-compress. Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods, built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Python, BSD 3-Clause "New" or "Revised" License. Updated Sep 25, 2024
flash-attention Public
Forked from Dao-AILab/flash-attention. Fast and memory-efficient exact attention
Python, BSD 3-Clause "New" or "Revised" License. Updated Aug 30, 2024
H2O Public
Forked from FMInference/H2O. [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Python. Updated Jul 11, 2024
Legion Public
Forked from RC4ML/Legion. GPU-initiated Large-scale GNN System
Cuda. Updated May 8, 2024
PuLP Public
Forked from HPCGraphAnalysis/PuLP
C++, BSD 3-Clause "New" or "Revised" License. Updated Oct 8, 2023
DUCATI_SIGMOD Public
Forked from initzhang/DUCATI_SIGMOD. Accepted paper at SIGMOD 2023: DUCATI, a Dual-Cache Training System for Graph Neural Networks on Giant Graphs with the GPU
Python. Updated Jun 4, 2023
USTC-CS-Resources Public
Forked from USTC-CS-Course-Resource/USTC-CS-Resources. Shared personal study materials for the USTC School of Computer Science
Python, MIT License. Updated Jul 4, 2021