By downloading these archives, you agree to the terms of the license agreements for NVIDIA software included in the archives.
To view the license for Triton Inference Server included in these archives, click here
- Triton Inference Server is a widely used software package for inference serving
- Triton supports almost all kinds of models generated by different DL frameworks or tools, such as TensorFlow, PyTorch, ONNX Runtime, TensorRT, OpenVINO...
- Triton supports both CPU and GPU
- Triton can be used both as an application and as a shared library. If you already have your own inference service framework but want to add more features, just try Triton as a shared library.
- Triton supports Java as a shared library through the JavaCPP Presets (a minimal sketch follows this list)
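Because the presets map the C API one to one, using Triton as a shared library from Java reads almost like the corresponding C code. The following is a minimal sketch, not the bundled Simple.java sample: the class name HelloTriton, the default /workspace/models repository path, and the /opt/tritonserver/backends backend directory are assumptions (the latter matches the NGC containers used below), it assumes the tritonserver-platform artifact is on the classpath, and error handling is reduced to a null check while the real sample also prints the Triton error message.

```java
import org.bytedeco.tritonserver.tritonserver.*;
import static org.bytedeco.tritonserver.global.tritonserver.*;

public class HelloTriton {
    // Exit on the first failed call; Simple.java additionally prints the error text.
    static void check(TRITONSERVER_Error err, String what) {
        if (err != null) {
            System.err.println("error: " + what);
            TRITONSERVER_ErrorDelete(err);
            System.exit(1);
        }
    }

    public static void main(String[] args) {
        // Placeholder default: point this at your own model repository.
        String repositoryPath = args.length > 0 ? args[0] : "/workspace/models";

        // Build the server options (TRITONSERVER_ServerOptions* in the C API).
        TRITONSERVER_ServerOptions options = new TRITONSERVER_ServerOptions(null);
        check(TRITONSERVER_ServerOptionsNew(options), "creating server options");
        check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(options, repositoryPath),
              "setting model repository path");
        check(TRITONSERVER_ServerOptionsSetBackendDirectory(options, "/opt/tritonserver/backends"),
              "setting backend directory");

        // Create the in-process server, then release the options.
        TRITONSERVER_Server server = new TRITONSERVER_Server(null);
        check(TRITONSERVER_ServerNew(server, options), "creating server");
        check(TRITONSERVER_ServerOptionsDelete(options), "deleting server options");

        // Equivalent of the HTTP /v2/health/live check.
        boolean[] live = {false};
        check(TRITONSERVER_ServerIsLive(server, live), "getting server liveness");
        System.out.println("Server is live: " + live[0]);

        // Shut down cleanly.
        check(TRITONSERVER_ServerStop(server), "stopping server");
        check(TRITONSERVER_ServerDelete(server), "deleting server");
    }
}
```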
This directory contains the JavaCPP Presets module for:
- Triton Inference Server 2.26.0 https://github.com/triton-inference-server/server
Please refer to the parent README.md file for more detailed information about the JavaCPP Presets.
Java API documentation is available here:
Here is a simple example of Triton Inference Server ported to Java from the simple.cc sample file available at:
We can use Maven 3 to automatically download and install all the class files as well as the native binaries. To run this sample code, after creating the pom.xml and Simple.java source files from the samples/simple subdirectory, simply execute on the command line:
$ mvn compile exec:java -Dexec.args="-r /path/to/models"
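Most of the work in that command is done by a single dependency on the tritonserver-platform artifact, which pulls in both the Java classes and the native bindings. A minimal pom.xml along the lines of the one in samples/simple might look like the sketch below; the 2.26-1.5.8 version string and the exec.mainClass property are assumptions here, and the checked-in samples/simple/pom.xml remains the authoritative reference.

```xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.bytedeco.tritonserver</groupId>
  <artifactId>simple</artifactId>
  <version>1.5.8</version>
  <properties>
    <!-- Class launched by "mvn compile exec:java"; Simple for this sample. -->
    <exec.mainClass>Simple</exec.mainClass>
  </properties>
  <dependencies>
    <dependency>
      <!-- Java API plus native tritonserver binaries for the selected platforms. -->
      <groupId>org.bytedeco</groupId>
      <artifactId>tritonserver-platform</artifactId>
      <version>2.26-1.5.8</version>
    </dependency>
  </dependencies>
  <build>
    <!-- Compile Simple.java straight from the sample directory. -->
    <sourceDirectory>.</sourceDirectory>
  </build>
</project>
```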
This sample shows how to call the Java-mapped C API of Triton to execute inference requests.
1. Get the source code of Triton Inference Server to prepare the model repository:
$ wget https://github.com/triton-inference-server/server/archive/refs/tags/v2.26.0.tar.gz
$ tar zxvf v2.26.0.tar.gz
$ cd server-2.26.0/docs/examples/model_repository
$ mkdir models
$ cd models; cp -a ../simple .
Now, this models directory will be our model repository.
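If the copy succeeded, the repository should look roughly like this; the file names come from the upstream docs/examples tree (the simple example model is a TensorFlow GraphDef) and are worth double-checking locally:

```
models/
└── simple/
    ├── config.pbtxt
    └── 1/
        └── model.graphdef
```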
2. Start the Docker container to run the sample (assuming we are under the models directory created above):
$ docker run -it --gpus=all -v $(pwd):/workspace nvcr.io/nvidia/tritonserver:22.09-py3 bash
$ apt update
$ apt install -y openjdk-11-jdk
$ wget https://archive.apache.org/dist/maven/maven-3/3.8.4/binaries/apache-maven-3.8.4-bin.tar.gz
$ tar zxvf apache-maven-3.8.4-bin.tar.gz
$ export PATH=/opt/tritonserver/apache-maven-3.8.4/bin:$PATH
$ git clone https://github.com/bytedeco/javacpp-presets.git
$ cd javacpp-presets
3. Compile the tritonserver and tritonserver/platform modules with Maven, which will generate the necessary bindings:
$ mvn clean install --projects .,tritonserver
$ mvn clean install -f platform --projects ../tritonserver/platform -Djavacpp.platform=linux-x86_64
4. Execute Simple.java:
$ cd tritonserver/samples/simple
$ mvn compile exec:java -Dexec.mainClass=Simple -Djavacpp.platform=linux-x86_64 -Dexec.args="-r /workspace/models"
This sample is the Java implementation of the simple example written for the C API.
To run your own code, you will need to:
- Create pom.xml and <your code>.java source files, and
- With a pom.xml similar to the one for Simple.java, execute:
$ mvn compile exec:java
Steps to run your *.java files with Triton Inference Server using the "uber JAR" inside an NGC container
After generating tritonserver/platform/target/tritonserver-platform-*-shaded.jar by following steps 1 to 3 above, you can execute the following to run your application directly (the openjdk-11-jdk installed in the container can launch a single .java source file without a separate compilation step):
$ cd tritonserver/samples/simple
$ java -cp ../platform/target/tritonserver-platform-*-shaded.jar Simple.java -r /workspace/models