Stars
An app that brings language models directly to your phone.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
A high-throughput and memory-efficient inference and serving engine for LLMs
Port of OpenAI's Whisper model in C/C++
AlpinDale / sparsegpt-for-LLaMA
Forked from IST-DASLab/sparsegpt
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
NASLib is a Neural Architecture Search (NAS) library that facilitates NAS research by providing interfaces to several state-of-the-art NAS search spaces and optimizers.
Automatic architecture search and hyperparameter optimization for PyTorch
⚡ A Fast, Extensible Progress Bar for Python and CLI