
🚢 Deploy
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
A Gradio web UI for Large Language Models with support for multiple inference backends.
Official community-driven Azure Machine Learning examples, tested with GitHub Actions.
A high-throughput and memory-efficient inference and serving engine for LLMs.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Production-Grade Container Scheduling and Management.
A process for automating Docker container base image updates.
OpenTelemetry Python API and SDK.
The open source Firebase alternative. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.