Making large AI models cheaper, faster and more accessible
Updated Sep 30, 2024 - Python
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Port of OpenAI's Whisper model in C/C++
A high-throughput and memory-efficient inference and serving engine for LLMs
Cross-platform, customizable ML solutions for live and streaming media.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
Faster Whisper transcription with CTranslate2
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Large Language Model Text Generation Inference
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
💎1MB lightweight face detection model
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Runtime type system for IO decoding/encoding
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference lets you run inference with any open-source language, speech-recognition, or multimodal model, whether in the cloud, on-premises, or on your laptop.
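The "single line of code" in the description above is typically the API base URL: Xinference serves an OpenAI-compatible endpoint, so an existing client can target a local server instead of api.openai.com. A minimal standard-library sketch of that idea, assuming a hypothetical local host/port and model name (neither is taken from the listing):

```python
import json
import urllib.request

# Assumed local Xinference endpoint; swapping this one value is the
# "single line" change relative to the hosted OpenAI default.
BASE_URL = "http://localhost:9997/v1"  # hypothetical host and port


def build_chat_request(prompt: str, model: str = "my-local-llm"):
    """Build an OpenAI-style chat-completion request aimed at a local server.

    The JSON body follows the familiar chat-completions shape; only the
    URL differs from a request to the hosted API.
    """
    payload = {
        "model": model,  # hypothetical model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Hello")
print(req.full_url)  # request targets the local server, not api.openai.com
```

The request is only constructed, not sent, so the sketch runs without a live server; in practice an OpenAI SDK client with its base URL pointed at the local endpoint plays the same role.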
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data. Includes streaming inference, scalable model hosting, training, and vector search.