ultralytics 8.0.195 NVIDIA Triton Inference Server support (ultralytics#5257)

Co-authored-by: TheConstant3 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
3 people authored Oct 7, 2023
1 parent 40e3923 commit c7aa83d
Showing 21 changed files with 351 additions and 100 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -241,7 +241,7 @@ jobs:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

Conda:
-if: github.repository == 'ultralytics/ultralytics' && (github.event_name == 'schedule-disabled' || github.event.inputs.conda == 'true')
+if: github.repository == 'ultralytics/ultralytics' && (github.event_name == 'schedule' || github.event.inputs.conda == 'true')
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
2 changes: 1 addition & 1 deletion docker/Dockerfile
@@ -3,7 +3,7 @@
# Image is CUDA-optimized for YOLOv8 single/multi-GPU training and inference

# Start FROM PyTorch image https://hub.docker.com/r/pytorch/pytorch or nvcr.io/nvidia/pytorch:23.03-py3
-FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
+FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
RUN pip install --no-cache nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com

# Downloads to user config dir
2 changes: 1 addition & 1 deletion docs/guides/azureml-quickstart.md
@@ -77,7 +77,7 @@ Train a detection model for 10 epochs with an initial learning_rate of 0.01:
yolo train data=coco128.yaml model=yolov8n.pt epochs=10 lr0=0.01
```

-You can find more [instructions to use the Ultralytics cli here](https://docs.ultralytics.com/quickstart/#use-ultralytics-with-cli).
+You can find more [instructions to use the Ultralytics CLI here](https://docs.ultralytics.com/quickstart/#use-ultralytics-with-cli).

## Quickstart from a Notebook

1 change: 1 addition & 0 deletions docs/guides/index.md
@@ -22,6 +22,7 @@ Here's a compilation of in-depth guides to help you master different aspects of
* [Conda Quickstart](conda-quickstart.md) 🚀 NEW: Step-by-step guide to setting up a [Conda](https://anaconda.org/conda-forge/ultralytics) environment for Ultralytics. Learn how to install and start using the Ultralytics package efficiently with Conda.
* [Docker Quickstart](docker-quickstart.md) 🚀 NEW: Complete guide to setting up and using Ultralytics YOLO models with [Docker](https://hub.docker.com/r/ultralytics/ultralytics). Learn how to install Docker, manage GPU support, and run YOLO models in isolated containers for consistent development and deployment.
* [Raspberry Pi](raspberry-pi.md) 🚀 NEW: Quickstart tutorial to run YOLO models on the latest Raspberry Pi hardware.
+* [Triton Inference Server Integration](triton-inference-server.md) 🚀 NEW: Dive into the integration of Ultralytics YOLOv8 with NVIDIA's Triton Inference Server for scalable and efficient deep learning inference deployments.

## Contribute to Our Guides

76 changes: 16 additions & 60 deletions docs/guides/raspberry-pi.md
@@ -37,47 +37,25 @@ You should see a video feed from your camera.

This guide offers you the flexibility to start with either [YOLOv5](https://github.com/ultralytics/yolov5) or [YOLOv8](https://github.com/ultralytics/ultralytics). Both versions have their unique advantages and use-cases. The choice is yours, but remember, the guide's aim is not just quick setup but also a robust foundation for your future work in object detection.

-## Hardware Specifics: Raspberry Pi 3 vs Raspberry Pi 4
-
-Raspberry Pi 3 and Raspberry Pi 4 have distinct hardware specifications, and the YOLO installation and configuration process can vary slightly depending on which model you're using.
-
-### Raspberry Pi 3
-
-- **CPU**: 1.2GHz Quad-Core ARM Cortex-A53
-- **RAM**: 1GB LPDDR2
-- **USB Ports**: 4 x USB 2.0
-- **Network**: Ethernet & Wi-Fi 802.11n
-- **Performance**: Generally slower, may require lighter YOLO models for real-time processing
-- **Power Requirement**: 2.5A power supply
-- **Official Documentation**: [Raspberry Pi 3 Documentation](https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2837/README.md)
-
-### Raspberry Pi 4
-
-- **CPU**: 1.5GHz Quad-core 64-bit ARM Cortex-A72 CPU
-- **RAM**: Options of 2GB, 4GB or 8GB LPDDR4
-- **USB Ports**: 2 x USB 2.0, 2 x USB 3.0
-- **Network**: Gigabit Ethernet & Wi-Fi 802.11ac
-- **Performance**: Faster, capable of running more complex YOLO models in real-time
-- **Power Requirement**: 3.0A USB-C power supply
-- **Official Documentation**: [Raspberry Pi 4 Documentation](https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/README.md)
-
-### Raspberry Pi 5
-
-- **CPU**: 2.4GHz Quad-core 64-bit Arm Cortex-A76 CPU
-- **GPU**: VideoCore VII, supporting OpenGL ES 3.1, Vulkan 1.2
-- **Display Output**: Dual 4Kp60 HDMI
-- **Decoder**: 4Kp60 HEVC
-- **Network**: Gigabit Ethernet with PoE+ support, Dual-band 802.11ac Wi-Fi®, Bluetooth 5.0 / BLE
-- **USB Ports**: 2 x USB 3.0, 2 x USB 2.0
-- **Other Features**: High-speed microSD card interface with SDR104 mode, 2 × 4-lane MIPI camera/display transceivers, PCIe 2.0 x1 interface, standard 40-pin GPIO header, real-time clock, power button
-- **Power Requirement**: Specifics not yet available, expected to require a higher amperage supply
-- **Official Documentation**: [Raspberry Pi 5 Documentation](https://www.raspberrypi.com/news/introducing-raspberry-pi-5/)
+## Hardware Specifics: At a Glance
+
+To assist you in making an informed hardware decision, we've summarized the key hardware specifics of Raspberry Pi 3, 4, and 5 in the table below:
+
+| Feature | Raspberry Pi 3 | Raspberry Pi 4 | Raspberry Pi 5 |
+|---------|----------------|----------------|----------------|
+| **CPU** | 1.2GHz Quad-Core ARM Cortex-A53 | 1.5GHz Quad-core 64-bit ARM Cortex-A72 | 2.4GHz Quad-core 64-bit Arm Cortex-A76 |
+| **RAM** | 1GB LPDDR2 | 2GB, 4GB or 8GB LPDDR4 | *Details not yet available* |
+| **USB Ports** | 4 x USB 2.0 | 2 x USB 2.0, 2 x USB 3.0 | 2 x USB 3.0, 2 x USB 2.0 |
+| **Network** | Ethernet & Wi-Fi 802.11n | Gigabit Ethernet & Wi-Fi 802.11ac | Gigabit Ethernet with PoE+ support, Dual-band 802.11ac Wi-Fi® |
+| **Performance** | Slower, may require lighter YOLO models | Faster, can run complex YOLO models | *Details not yet available* |
+| **Power Requirement** | 2.5A power supply | 3.0A USB-C power supply | *Details not yet available* |
+| **Official Documentation** | [Link](https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2837/README.md) | [Link](https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/README.md) | [Link](https://www.raspberrypi.com/news/introducing-raspberry-pi-5/) |

Please make sure to follow the instructions specific to your Raspberry Pi model to ensure a smooth setup process.

## Quick Start with YOLOv5

-This section outlines how to set up YOLOv5 on a Raspberry Pi 3 or 4 with a Pi Camera. These steps are designed to be compatible with the libcamera camera stack introduced in Raspberry Pi OS Bullseye.
+This section outlines how to set up YOLOv5 on a Raspberry Pi with a Pi Camera. These steps are designed to be compatible with the libcamera camera stack introduced in Raspberry Pi OS Bullseye.

### Install Necessary Packages

@@ -171,7 +149,7 @@ Follow this section if you are interested in setting up YOLOv8 instead. The step
sudo apt-get autoremove -y
```
-2. Install YOLOv8:
+2. Install the `ultralytics` Python package:
```bash
pip3 install ultralytics
@@ -183,28 +161,6 @@ Follow this section if you are interested in setting up YOLOv8 instead. The step
sudo reboot
```
-### Modify `build.py`
-Just like YOLOv5, YOLOv8 also needs minor modifications to accept TCP streams.
-1. Open `build.py` located in the Ultralytics package folder:
-```bash
-sudo nano /home/pi/.local/lib/pythonX.X/site-packages/ultralytics/build.py
-```
-2. Find and modify the `is_url` line to accept TCP streams:
-```python
-is_url = source.lower().startswith(('rtsp://', 'rtmp://', 'http://', 'https://', 'tcp://'))
-```
-3. Save and exit:
-```bash
-CTRL + O -> ENTER -> CTRL + X
-```
### Initiate TCP Stream with Libcamera
1. Start the TCP stream:
@@ -231,7 +187,7 @@ while True:
## Next Steps
-Congratulations on successfully setting up YOLO on your Raspberry Pi! For further learning and support, visit [Ultralytics](https://ultralytics.com/) and [KashmirWorldFoundation](https://www.kashmirworldfoundation.org/).
+Congratulations on successfully setting up YOLO on your Raspberry Pi! For further learning and support, visit [Ultralytics](https://ultralytics.com/) and [Kashmir World Foundation](https://www.kashmirworldfoundation.org/).
## Acknowledgements and Citations
137 changes: 137 additions & 0 deletions docs/guides/triton-inference-server.md
@@ -0,0 +1,137 @@
---
comments: true
description: A step-by-step guide on integrating Ultralytics YOLOv8 with Triton Inference Server for scalable and high-performance deep learning inference deployments.
keywords: YOLOv8, Triton Inference Server, ONNX, Deep Learning Deployment, Scalable Inference, Ultralytics, NVIDIA, Object Detection, Cloud Inferencing
---

# Triton Inference Server with Ultralytics YOLOv8

The [Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) (formerly known as TensorRT Inference Server) is an open-source software solution developed by NVIDIA. It provides a cloud inferencing solution optimized for NVIDIA GPUs. Triton simplifies the deployment of AI models at scale in production. Integrating Ultralytics YOLOv8 with Triton Inference Server allows you to deploy scalable, high-performance deep learning inference workloads. This guide provides steps to set up and test the integration.

<p align="center">
<br>
<iframe width="720" height="405" src="https://www.youtube.com/embed/NQDtfSi5QF4"
title="Getting Started with NVIDIA Triton Inference Server" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Getting Started with NVIDIA Triton Inference Server.
</p>

## What is Triton Inference Server?

Triton Inference Server is designed to deploy a variety of AI models in production. It supports a wide range of deep learning and machine learning frameworks, including TensorFlow, PyTorch, ONNX Runtime, and many others. Its primary use cases are:

- Serving multiple models from a single server instance.
- Dynamic model loading and unloading without server restart.
- Ensemble inferencing, allowing multiple models to be chained together to produce a single result.
- Model versioning for A/B testing and rolling updates.
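
To get a feel for these capabilities, here is a minimal `tritonclient` sketch. It assumes a server is already listening on `localhost:8000`; the load/unload calls additionally assume the server was started with `--model-control-mode=explicit`:

```python
from tritonclient.http import InferenceServerClient

# Connect to a Triton server assumed to be running on localhost:8000
client = InferenceServerClient(url='localhost:8000', verbose=False, ssl=False)

# Check liveness and list every model in the repository
print(client.is_server_live())
for model in client.get_model_repository_index():
    print(model['name'], model.get('state', 'unknown'))

# Dynamically load and unload a model without restarting the server
# (assumes the server was started with --model-control-mode=explicit)
client.load_model('yolo')
client.unload_model('yolo')
```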

## Prerequisites

Ensure you have the following prerequisites before proceeding:

- Docker installed on your machine.
- The `tritonclient` package installed:
```bash
pip install tritonclient[all]
```

## Exporting YOLOv8 to ONNX Format

Before deploying the model on Triton, it must be exported to the ONNX format. ONNX (Open Neural Network Exchange) is a format that allows models to be transferred between different deep learning frameworks. Use the `export` method of the `YOLO` class:

```python
from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n.pt') # load an official model

# Export the model
onnx_file = model.export(format='onnx', dynamic=True)
```
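
It can be worth sanity-checking the export before wiring it into Triton. A small sketch, assuming the `onnx` package is installed:

```python
import onnx

# Load the exported file and verify the graph is well-formed
onnx_model = onnx.load(onnx_file)
onnx.checker.check_model(onnx_model)

# Print the input/output tensor names that Triton will expose
print([i.name for i in onnx_model.graph.input])
print([o.name for o in onnx_model.graph.output])
```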

## Setting Up Triton Model Repository

The Triton Model Repository is a storage location where Triton can access and load models.

1. Create the necessary directory structure:

```python
from pathlib import Path

# Define paths
triton_repo_path = Path('tmp') / 'triton_repo'
triton_model_path = triton_repo_path / 'yolo'

# Create directories
(triton_model_path / '1').mkdir(parents=True, exist_ok=True)
```

2. Move the exported ONNX model to the Triton repository:

```python
from pathlib import Path

# Move ONNX model to Triton Model path
Path(onnx_file).rename(triton_model_path / '1' / 'model.onnx')

# Create config file
(triton_model_path / 'config.pbtxt').touch()
```
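
The `config.pbtxt` can be left empty here because recent Triton releases can auto-complete a model's configuration from the ONNX metadata. If you want explicit control (for example over batching), you could write a minimal configuration instead. A sketch, assuming the ONNX Runtime backend and the dynamic-axes export above:

```python
# Optional: an explicit configuration instead of the empty file above.
# Triton's auto-complete usually makes this unnecessary for ONNX models.
config_text = """
name: "yolo"
platform: "onnxruntime_onnx"
max_batch_size: 4
dynamic_batching { }
"""
(triton_model_path / 'config.pbtxt').write_text(config_text)
```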

## Running Triton Inference Server

Run the Triton Inference Server using Docker:

```python
import contextlib
import subprocess
import time

from tritonclient.http import InferenceServerClient

# Define image https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
tag = 'nvcr.io/nvidia/tritonserver:23.09-py3'  # 6.4 GB

# Pull the image
subprocess.call(f'docker pull {tag}', shell=True)

# Run the Triton server and capture the container ID
container_id = subprocess.check_output(
    f'docker run -d --rm -v {triton_repo_path}:/models -p 8000:8000 {tag} tritonserver --model-repository=/models',
    shell=True).decode('utf-8').strip()

# Create an HTTP client for the server
triton_client = InferenceServerClient(url='localhost:8000', verbose=False, ssl=False)

# Wait until the model is ready (give the server up to ~10 seconds)
model_name = 'yolo'
for _ in range(10):
    with contextlib.suppress(Exception):
        assert triton_client.is_model_ready(model_name)
        break
    time.sleep(1)
```

Then run inference using the Triton Server model:

```python
from ultralytics import YOLO

# Load the Triton Server model
model = YOLO('http://localhost:8000/yolo', task='detect')

# Run inference on the server
results = model('path/to/image.jpg')
```
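
The returned objects follow the standard Ultralytics `Results` API, so downstream code does not need to know the model was served remotely:

```python
# Results are standard Ultralytics Results objects
for r in results:
    print(r.boxes.xyxy)  # bounding boxes in xyxy format
    print(r.boxes.conf)  # confidence scores
    print(r.boxes.cls)   # class indices
```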

Finally, clean up the container:

```python
# Kill and remove the container at the end of the test
subprocess.call(f'docker kill {container_id}', shell=True)
```

---

By following the above steps, you can deploy and run Ultralytics YOLOv8 models efficiently on Triton Inference Server, providing a scalable and high-performance solution for deep learning inference tasks. If you face any issues or have further queries, refer to the [official Triton documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html) or reach out to the Ultralytics community for support.
2 changes: 1 addition & 1 deletion docs/modes/export.md
@@ -57,7 +57,7 @@ Export a YOLOv8n model to a different format like ONNX or TensorRT. See Argument

# Load a model
model = YOLO('yolov8n.pt') # load an official model
-model = YOLO('path/to/best.pt') # load a custom trained
+model = YOLO('path/to/best.pt') # load a custom trained model

# Export the model
model.export(format='onnx')
9 changes: 9 additions & 0 deletions docs/reference/utils/triton.md
@@ -0,0 +1,9 @@
# Reference for `ultralytics/utils/triton.py`

!!! note

Full source code for this file is available at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/utils/triton.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/utils/triton.py). Help us fix any issues you see by submitting a [Pull Request](https://docs.ultralytics.com/help/contributing/) 🛠️. Thank you 🙏!

---
## ::: ultralytics.utils.triton.TritonRemoteModel
<br><br>
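
For orientation, a rough usage sketch of this class follows; the constructor arguments shown are assumptions based on the linked source, so check the file itself for the exact signature:

```python
import numpy as np

from ultralytics.utils.triton import TritonRemoteModel

# Connect to a model served at http://localhost:8000/yolo (assumed running)
model = TritonRemoteModel(url='localhost:8000', endpoint='yolo', scheme='http')

# Call the model with a correctly shaped float32 input
outputs = model(np.random.rand(1, 3, 640, 640).astype(np.float32))
```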
2 changes: 1 addition & 1 deletion docs/tasks/classify.md
@@ -140,7 +140,7 @@ Export a YOLOv8n-cls model to a different format like ONNX, CoreML, etc.

# Load a model
model = YOLO('yolov8n-cls.pt') # load an official model
-model = YOLO('path/to/best.pt') # load a custom trained
+model = YOLO('path/to/best.pt') # load a custom trained model

# Export the model
model.export(format='onnx')
2 changes: 1 addition & 1 deletion docs/tasks/detect.md
@@ -152,7 +152,7 @@ Export a YOLOv8n model to a different format like ONNX, CoreML, etc.

# Load a model
model = YOLO('yolov8n.pt') # load an official model
-model = YOLO('path/to/best.pt') # load a custom trained
+model = YOLO('path/to/best.pt') # load a custom trained model

# Export the model
model.export(format='onnx')
2 changes: 1 addition & 1 deletion docs/tasks/pose.md
@@ -156,7 +156,7 @@ Export a YOLOv8n Pose model to a different format like ONNX, CoreML, etc.

# Load a model
model = YOLO('yolov8n-pose.pt') # load an official model
-model = YOLO('path/to/best.pt') # load a custom trained
+model = YOLO('path/to/best.pt') # load a custom trained model

# Export the model
model.export(format='onnx')
2 changes: 1 addition & 1 deletion docs/tasks/segment.md
@@ -157,7 +157,7 @@ Export a YOLOv8n-seg model to a different format like ONNX, CoreML, etc.

# Load a model
model = YOLO('yolov8n-seg.pt') # load an official model
-model = YOLO('path/to/best.pt') # load a custom trained
+model = YOLO('path/to/best.pt') # load a custom trained model

# Export the model
model.export(format='onnx')
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -223,6 +223,7 @@ nav:
- Conda Quickstart: guides/conda-quickstart.md
- Docker Quickstart: guides/docker-quickstart.md
- Raspberry Pi: guides/raspberry-pi.md
+- Triton Inference Server: guides/triton-inference-server.md
- Integrations:
- integrations/index.md
- OpenVINO: integrations/openvino.md
@@ -390,6 +391,7 @@ nav:
- plotting: reference/utils/plotting.md
- tal: reference/utils/tal.md
- torch_utils: reference/utils/torch_utils.md
+- triton: reference/utils/triton.md
- tuner: reference/utils/tuner.md

- Help: