This demo application ("demoDiffusion") showcases the acceleration of the Stable Diffusion pipeline using TensorRT.
git clone [email protected]:NVIDIA/TensorRT.git -b release/8.6 --single-branch
cd TensorRT
Install nvidia-docker using these instructions.
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.02-py3 /bin/bash
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade tensorrt
Minimum required version is TensorRT 8.6.0. Check your installed version using:
python3 -c 'import tensorrt;print(tensorrt.__version__)'
NOTE: Alternatively, you can download and install TensorRT packages from NVIDIA TensorRT Developer Zone.
export TRT_OSSPATH=/workspace
cd $TRT_OSSPATH/demo/Diffusion
pip3 install -r requirements.txt
# Create output directories
mkdir -p onnx engine output
NOTE: demoDiffusion has been tested on systems with NVIDIA A100, RTX3090, and RTX4090 GPUs, and the following software configuration.
diffusers 0.14.0
onnx 1.13.1
onnx-graphsurgeon 0.3.26
onnxruntime 1.14.1
polygraphy 0.47.1
tensorrt 8.6.1.6
tokenizers 0.13.2
torch 1.13.0
transformers 4.26.1
NOTE: Optionally, install the HuggingFace accelerate package for faster and less memory-intensive model loading.
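For example, it can be installed with pip alongside the other requirements:
pip3 install accelerate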
python3 demo_txt2img.py --help
python3 demo_img2img.py --help
python3 demo_inpaint.py --help
To download the model checkpoints for the Stable Diffusion pipeline, you will need a read
access token. See instructions.
export HF_TOKEN=<your access token>
python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN -v
python3 demo_img2img.py "photorealistic new zealand hills" --hf-token=$HF_TOKEN -v
Use --input-image=<path to image> to specify your image. Otherwise the example image will be downloaded from the Internet.
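For example, to run img2img on a local image (the file name below is only a placeholder):
python3 demo_img2img.py "photorealistic new zealand hills" --hf-token=$HF_TOKEN --input-image=my_photo.png -v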
# Create separate onnx/engine directories when switching versions
mkdir -p onnx-1.5 engine-1.5
python3 demo_inpaint.py "a mecha robot sitting on a bench" --hf-token=$HF_TOKEN --version=1.5 --onnx-dir=onnx-1.5 --engine-dir=engine-1.5 -v
Use --input-image=<path to image> and --mask-image=<path to mask> to specify your inputs. They must have the same dimensions. Otherwise the example image and mask will be downloaded from the Internet.
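For example, to run inpainting with your own image and mask (the image and mask file names below are only placeholders):
python3 demo_inpaint.py "a mecha robot sitting on a bench" --hf-token=$HF_TOKEN --version=1.5 --onnx-dir=onnx-1.5 --engine-dir=engine-1.5 --input-image=bench.png --mask-image=bench_mask.png -v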
- One can set the scheduler using --scheduler=EulerA. Note that some schedulers are not available for some pipelines or versions.
- To accelerate engine build time, one can use --timing-cache=<path to cache file>. This cache file will be created if it does not exist. Note that using a cache file created on different hardware may affect performance, so it is suggested to use this flag only during development. To achieve the best performance during deployment, build engines without a timing cache.
- To switch between versions or pipelines, one needs either to clear the onnx and engine directories, to specify --force-onnx-export --force-onnx-optimize --force-engine-build, or to create new directories and specify --onnx-dir=<new onnx dir> --engine-dir=<new engine dir>.
- Inference performance can be improved by enabling CUDA graphs using --use-cuda-graph. Enabling CUDA graphs requires fixed input shapes, so this flag must be combined with --build-static-batch and cannot be combined with --build-dynamic-shape. See the example command after this list.
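As a sketch of how these flags fit together (the prompt is reused from the earlier example and timing.cache is a placeholder file name):
python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN --scheduler=EulerA --timing-cache=timing.cache --build-static-batch --use-cuda-graph -v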