Stars
SGLang is a fast serving framework for large language models and vision language models.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deploym…
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
[IEEE T-PAMI 2023] Awesome BEV perception research and cookbook for all level audience in autonomous driving
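《代码随想录》 (Code Thoughts) LeetCode practice guide: a recommended order for working through 200 classic problems, detailed illustrated explanations totaling 600,000 characters, video walkthroughs of the difficult points, more than 50 mind maps, and versions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer bewildering! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀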
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
✨✨Latest Advances on Multimodal Large Language Models
Open-Sora: Democratizing Efficient Video Production for All
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
A PyTorch quantization backend for Optimum
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.