Projects
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
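The "composable transformations" idea above can be illustrated without any library dependency: a minimal sketch in plain Python, where `grad` is a forward finite-difference (not real automatic differentiation) and `vmap` is an ordinary map — both are illustrative stand-ins, not the project's actual API.

```python
# Hedged sketch: function transformations as composable higher-order functions.
# `grad` approximates a derivative by finite differences; `vmap` maps over a batch.
def grad(f, eps=1e-6):
    return lambda x: (f(x + eps) - f(x)) / eps

def vmap(f):
    return lambda xs: [f(x) for x in xs]

square = lambda x: x * x

# Compose the transformations: a vectorized derivative of `square`.
dsquare_batch = vmap(grad(square))
```

The point of the composition is that each transformation takes a function and returns a function, so they stack in any order.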
Open deep learning compiler stack for CPUs, GPUs and specialized accelerators
An Open Source Machine Learning Framework for Everyone
Tensors and Dynamic neural networks in Python with strong GPU acceleration
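"Dynamic" here means define-by-run: the computation graph is built as operations execute, rather than declared up front. A toy sketch of that idea (inspired by the concept, not the framework's real API or internals):

```python
# Minimal define-by-run reverse-mode autograd: each op records how to
# propagate gradients to its inputs as it runs.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the recorded graph, then run gradients in reverse.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()
```

For example, `y = x * x + x` built from `x = Value(3.0)` yields `x.grad == 7.0` after `y.backward()`, matching d(x² + x)/dx = 2x + 1.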
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
A flexible framework of neural networks for deep learning
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Development repository for the Triton language and compiler
ncnn is a high-performance neural network inference framework optimized for the mobile platform
A machine learning compiler for GPUs, CPUs, and ML accelerators
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
SGLang is a fast serving framework for large language models and vision language models.
Hackable and optimized Transformers building blocks, supporting a composable construction.
[EMNLP'23, ACL'24] To speed up LLM inference and sharpen LLMs' perception of key information, compress the prompt and KV cache, achieving up to 20x compression with minimal performance loss.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.
A high-throughput and memory-efficient inference and serving engine for LLMs
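A key trick behind memory-efficient LLM serving is paging the KV cache: each sequence maps logical token positions to fixed-size physical blocks, so memory grows on demand instead of being reserved at maximum length per sequence. A hedged, illustrative sketch of that bookkeeping (names and block size are made up for illustration, not the engine's actual implementation):

```python
# Illustrative paged KV-cache bookkeeping: a free list of physical blocks
# plus a per-sequence block table from logical positions to blocks.
BLOCK = 4  # tokens per block (illustrative choice)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))      # unallocated physical blocks
        self.tables = {}                         # seq_id -> list of block ids

    def append(self, seq_id, n_tokens_so_far):
        """Return the physical block holding the next token of `seq_id`,
        allocating a fresh block only when the current one is full."""
        table = self.tables.setdefault(seq_id, [])
        if n_tokens_so_far % BLOCK == 0:         # current block is full
            table.append(self.free.pop())
        return table[-1]
```

Because blocks are allocated one at a time, sequences of very different lengths can share the same pool without fragmentation from worst-case preallocation.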
Fast and memory-efficient exact attention
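"Exact" attention stays memory-efficient by never materializing the full softmax: a running max and normalizer are updated online as scores stream in. A minimal single-query, scalar-value sketch of that online-softmax trick (the core idea only, assuming a non-empty score list — not the tiled GPU kernel itself):

```python
import math

def online_softmax_weighted_sum(scores, values):
    # One pass, numerically stable: rescale the running sums whenever a
    # new maximum score appears, so no intermediate softmax is stored.
    m = float("-inf")  # running max of scores seen so far
    s = 0.0            # running softmax normalizer
    acc = 0.0          # running softmax-weighted sum of values
    for x, v in zip(scores, values):
        m_new = max(m, x)
        scale = math.exp(m - m_new)          # rescale old terms to new max
        s = s * scale + math.exp(x - m_new)
        acc = acc * scale + math.exp(x - m_new) * v
        m = m_new
    return acc / s
```

The result equals the usual two-pass softmax-weighted sum, but only constant extra state is kept per query, which is what lets attention be computed block by block.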