-
myQuaRot Public
Forked from spcl/QuaRot
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
Python Apache License 2.0 Updated Jan 25, 2025 -
lmquant Public
Forked from mit-han-lab/deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
Python Apache License 2.0 Updated Nov 10, 2024 -
duo-attention Public
Forked from mit-han-lab/duo-attention
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Python MIT License Updated Oct 30, 2024 -
LVEval Public
Forked from infinigence/LVEval
Repository of LV-Eval Benchmark
Python MIT License Updated Aug 31, 2024 -
6.5930_final_project Public
Final project for MIT 6.5930 Course
-
LLaVA Public
Forked from haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Python Apache License 2.0 Updated Feb 21, 2024 -
my-vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python Apache License 2.0 Updated Dec 7, 2023 -
torchsparse Public
Forked from mit-han-lab/torchsparse
[MLSys'22] TorchSparse: Efficient Point Cloud Inference Engine
Cuda MIT License Updated Nov 7, 2023 -
MyFastChat Public
Forked from lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Python Apache License 2.0 Updated Jul 28, 2023 -
dgSPARSE-Library Public
Forked from dgSPARSE/dgSPARSE-Lib
Cuda Apache License 2.0 Updated Aug 14, 2022 -
dgSPARSE-Wrapper Public
Forked from dgSPARSE/dgSPARSE-Wrapper
C Apache License 2.0 Updated May 24, 2022