Stars
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" mea…
A highly optimized LLM inference acceleration engine for Llama and its variants.
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A framework for few-shot evaluation of language models.
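A good project for campus recruitment (autumn and spring hiring) and internships: build, hands-on and from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.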
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
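PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)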
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
Set up and run a local LLM and chatbot using consumer-grade hardware.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
SGLang is a fast serving framework for large language models and vision language models.
Production-Grade Container Scheduling and Management
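A desktop floating-window application that displays the current network speed and CPU and memory usage; it also supports display in the taskbar and changeable skins.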
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Define and run multi-container applications with Docker
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Seamless operability between C++11 and Python
Fast and memory-efficient exact attention
The fundamental package for scientific computing with Python.