[PPDiffusers] Add FD ControlNet text2img, img2img, inpaint_legacy, hiresfix pipeline (PaddlePaddle#6234)

* Add FD ControlNet img2img, text2img, and inpainting pipelines, and a hiresfix FD pipeline
* update
* update controlnet
* update readme
* update rm
* update fd
* update gpu
* add lpw, fix preconfig bug in img2img, etc.
* Use LPW by default
* fix inpainting
* Use the old-version `_load_pretrained_model_old`
1xiaozhao2 · Jul 4, 2023 · aa4a62c
1 parent b69063d

Showing 29 changed files with 6,982 additions and 921 deletions.
diff --git a/ppdiffusers/deploy/README.md b/ppdiffusers/deploy/README.md
@@ -39,32 +39,48 @@ pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdep
 
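 The command below specifies the model directory, inference backend, hardware, and scheduler type, and runs the `infer.py` script to perform the text-to-image generation task.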
 
-```
+```sh
 python infer.py --model_dir stable-diffusion-v1-4/ --scheduler "pndm" --backend paddle --task_name text2img
 ```
 
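-The script's input prompt is **"a photo of an astronaut riding a horse on mars"**, and the resulting image file is fd_astronaut_rides_horse.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):
+The script's input prompt is **"a photo of an astronaut riding a horse on mars"**, and the resulting image file is text2img.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):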
 
-![fd_astronaut_rides_horse.png](https://user-images.githubusercontent.com/10826371/200261112-68e53389-e0a0-42d1-8c3a-f35faa6627d7.png)
+![text2img.png](https://user-images.githubusercontent.com/10826371/200261112-68e53389-e0a0-42d1-8c3a-f35faa6627d7.png)
 
 To use the stable-diffusion-v1-5 model, run the following command:
 
+```sh
+python infer.py --model_dir stable-diffusion-v1-5/ --scheduler "preconfig-euler-ancestral" --backend paddle_tensorrt --use_fp16 True --device gpu --task_name text2img
 ```
-python infer.py --model_dir stable-diffusion-v1-5/ --scheduler "euler_ancestral" --backend paddle_tensorrt --use_fp16 True --device gpu --task_name text2img
-```
 
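+We also provide a text-to-image example based on two-stage HiresFix. The command below specifies the model directory, sets the task name to `hiresfix`, and runs the `infer.py` script to perform the two-stage hiresfix task. The first stage generates a `512x512` image, and the second stage refines that result into a `768x768` image.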
+
+|       without hiresfix       |       with hiresfix       |
+|:-------------------:|:-------------------:|
+|![][without-hiresfix]|![][with-hiresfix]|
+
+[without-hiresfix]: https://github.com/PaddlePaddle/PaddleNLP/assets/50394665/38ab6032-b960-4b76-8d69-0e0f8b5e1f42
+[with-hiresfix]: https://github.com/PaddlePaddle/PaddleNLP/assets/50394665/a472cb31-d8a2-451d-bf80-cd84c9ef0d08
+
+On an 80 GB A100 with ppdiffusers==0.16.1, fastdeploy==1.0.7, the develop build of Paddle, and CUDA 11.7, we measured the following latencies:
+- Without hiresfix: Mean latency: 1.930896 s, p50 latency: 1.932413 s, p90 latency: 1.933565 s, p95 latency: 1.933630 s.
+- With hiresfix: Mean latency: 1.442178 s, p50 latency: 1.442885 s, p90 latency: 1.446133 s, p95 latency: 1.446285 s.
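The mean and percentile figures above summarize several timed runs. The run count is presumably set by `--benchmark_steps`. The sketch below shows how such a summary can be computed from per-run latencies. It assumes percentiles use linear interpolation between sorted samples, the same rule as NumPy's default `percentile`. `summarize_latency` and `format_summary` are hypothetical helpers, not code taken from `infer.py`, and the timings in the usage line are toy values.

```python
import statistics


def summarize_latency(latencies_s):
    """Return the mean and p50/p90/p95 of per-run latencies (in seconds).

    Hypothetical helper: percentiles use linear interpolation between sorted
    samples (statistics.quantiles with method="inclusive"), which matches
    numpy.percentile's default; infer.py may compute them differently.
    """
    data = sorted(float(x) for x in latencies_s)
    if len(data) < 2:
        raise ValueError("need at least two runs to estimate percentiles")
    # quantiles(n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(data),
        "p50": cuts[49],
        "p90": cuts[89],
        "p95": cuts[94],
    }


def format_summary(stats):
    """Render the summary in the same layout as the latency lines above."""
    return (
        "Mean latency: {mean:.6f} s, p50 latency: {p50:.6f} s, "
        "p90 latency: {p90:.6f} s, p95 latency: {p95:.6f} s.".format(**stats)
    )


if __name__ == "__main__":
    # Toy timings chosen only so the interpolation is easy to check by hand.
    print(format_summary(summarize_latency([1.0, 2.0, 3.0, 4.0])))
```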
+
+```sh
+python infer.py --model_dir stable-diffusion-v1-5/ --scheduler "euler-ancestral" --backend paddle_tensorrt --use_fp16 True --device gpu --task_name hiresfix
+```
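The sketch below works out the image and latent sizes for each hiresfix stage. The inputs are the `--width`/`--height` and `--hr_resize_width`/`--hr_resize_height` values from the parameter table. It assumes the 8x spatial downsampling of the Stable Diffusion v1.x VAE. It also assumes, as in most Stable Diffusion pipelines, that sizes must be multiples of 8. `hiresfix_plan` is a hypothetical helper, not part of `infer.py` or PPDiffusers.

```python
# Assumptions for this sketch (not read from infer.py):
VAE_SCALE_FACTOR = 8           # SD v1.x VAE downsamples each side by 8
MIN_SIDE, MAX_SIDE = 512, 768  # range stated for the size flags in the README


def _check_side(name, value):
    """Reject sizes outside the documented range or not divisible by 8."""
    if not MIN_SIDE <= value <= MAX_SIDE:
        raise ValueError(f"{name}={value} is outside {MIN_SIDE}~{MAX_SIDE}")
    if value % VAE_SCALE_FACTOR:
        raise ValueError(f"{name}={value} is not a multiple of {VAE_SCALE_FACTOR}")


def hiresfix_plan(width=512, height=512, hr_resize_width=768, hr_resize_height=768):
    """Return (height, width) of the image and latent for both hiresfix stages."""
    sizes = {
        "width": width,
        "height": height,
        "hr_resize_width": hr_resize_width,
        "hr_resize_height": hr_resize_height,
    }
    for name, value in sizes.items():
        _check_side(name, value)

    def stage(h, w):
        return {"image": (h, w), "latent": (h // VAE_SCALE_FACTOR, w // VAE_SCALE_FACTOR)}

    return {
        # Stage 1: ordinary text-to-image generation at the base resolution.
        "stage1": stage(height, width),
        # Stage 2: the stage-1 result is upscaled to the target size and refined.
        "stage2": stage(hr_resize_height, hr_resize_width),
        "scale": (hr_resize_height / height, hr_resize_width / width),
    }


if __name__ == "__main__":
    plan = hiresfix_plan()  # defaults mirror the 512x512 -> 768x768 example
    print(plan["stage1"], plan["stage2"], plan["scale"])
```

With the defaults, the latent grows from 64x64 to 96x96, a 1.5x upscale per side.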
 
 <a name="文本引导的图像变换"></a>
 
 ### Text-Guided Image-to-Image Generation
 
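 The command below specifies the model directory, inference backend, hardware, and scheduler type, and runs the `infer.py` script to perform the text-guided image-to-image task.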
 
-```
+```sh
 python infer.py --model_dir stable-diffusion-v1-4/ --scheduler "pndm" --backend paddle_tensorrt --use_fp16 True --device gpu --task_name img2img
 ```
 
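-The script's input prompt is **"A fantasy landscape, trending on artstation"**, and the resulting image file is fantasy_landscape.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):
+The script's input prompt is **"A fantasy landscape, trending on artstation"**, and the resulting image file is img2img.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):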
 
 |       input image       |       output image       |
 |:-------------------:|:-------------------:|
@@ -77,18 +93,18 @@ python infer.py --model_dir stable-diffusion-v1-4/ --scheduler "pndm" --backend
 
 To use the stable-diffusion-v1-5 model, run the following command:
 
-```
-python infer.py --model_dir stable-diffusion-v1-5/ --scheduler "euler_ancestral" --backend paddle_tensorrt --use_fp16 True --device gpu --task_name img2img
+```sh
+python infer.py --model_dir stable-diffusion-v1-5/ --scheduler "euler-ancestral" --backend paddle_tensorrt --use_fp16 True --device gpu --task_name img2img
 ```
 
 
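 We also provide a text-guided image-to-image example based on CycleDiffusion. The command below specifies the model directory and runs the `infer.py` script to perform the text-guided image-to-image task.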
 
-```
+```sh
 python infer.py --model_dir stable-diffusion-v1-4/ --backend paddle_tensorrt --use_fp16 True --device gpu --task_name cycle_diffusion
 ```
 
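-The script's source prompt is **"An astronaut riding a horse"** and its target prompt is **"An astronaut riding an elephant"**, and the resulting image file is horse_to_elephant.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):
+The script's source prompt is **"An astronaut riding a horse"** and its target prompt is **"An astronaut riding an elephant"**, and the resulting image file is cycle_diffusion.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):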
 
 |       input image       |       output image       |
 |:-------------------:|:-------------------:|
@@ -107,11 +123,11 @@ python infer.py --model_dir stable-diffusion-v1-4/ --backend paddle_tensorrt --u
 
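 The command below specifies the model directory, inference backend, hardware, and scheduler type, and runs the `infer.py` script to perform the text-guided image editing task.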
 
-```
-python infer.py --model_dir stable-diffusion-v1-4/ --scheduler euler_ancestral --backend paddle_tensorrt --use_fp16 True --device gpu --task_name inpaint_legacy
+```sh
+python infer.py --model_dir stable-diffusion-v1-4/ --scheduler euler-ancestral --backend paddle_tensorrt --use_fp16 True --device gpu --task_name inpaint_legacy
 ```
 
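-The script's input prompt is **"Face of a yellow cat, high resolution, sitting on a park bench"**, and the resulting image file is cat_on_bench_new.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):
+The script's input prompt is **"Face of a yellow cat, high resolution, sitting on a park bench"**, and the resulting image file is inpaint_legacy.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):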
 
 |       input image       |       mask image       |       output image
 |:-------------------:|:-------------------:|:-------------------:|
@@ -123,19 +139,19 @@ python infer.py --model_dir stable-diffusion-v1-4/ --scheduler euler_ancestral -
 
 To use the stable-diffusion-v1-5 model, run the following command:
 
-```
-python infer.py --model_dir stable-diffusion-v1-5/ --scheduler euler_ancestral --backend paddle_tensorrt --use_fp16 True --device gpu --task_name inpaint_legacy
+```sh
+python infer.py --model_dir stable-diffusion-v1-5/ --scheduler euler-ancestral --backend paddle_tensorrt --use_fp16 True --device gpu --task_name inpaint_legacy
 ```
 
 #### Official Version
 
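 The command below specifies the model directory, inference backend, hardware, and scheduler type, and runs the `infer.py` script to perform the text-guided image editing task.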
 
-```
-python infer.py --model_dir stable-diffusion-v1-5-inpainting/ --scheduler euler_ancestral --backend paddle_tensorrt --use_fp16 True --device gpu --task_name inpaint
+```sh
+python infer.py --model_dir stable-diffusion-v1-5-inpainting/ --scheduler euler-ancestral --backend paddle_tensorrt --use_fp16 True --device gpu --task_name inpaint
 ```
 
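-The script's input prompt is **"Face of a yellow cat, high resolution, sitting on a park bench"**, and the resulting image file is cat_on_bench_new.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):
+The script's input prompt is **"Face of a yellow cat, high resolution, sitting on a park bench"**, and the resulting image file is inpaint.png. An example of the generated image is shown below (the image differs on every run, so the example is for reference only):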
 
 |       input image       |       mask image       |       output image
 |:-------------------:|:-------------------:|:-------------------:|
@@ -153,13 +169,18 @@ python infer.py --model_dir stable-diffusion-v1-5-inpainting/ --scheduler euler_
 | Parameter | Description |
 |----------|--------------|
 | --model_dir | Directory of the exported model. Defaults to `runwayml/stable-diffusion-v1-5@fastdeploy`. |
-| --backend | Inference engine backend. Defaults to `paddle_tensorrt`. Options: `['onnx_runtime', 'paddle', 'paddlelite', 'paddle_tensorrt']`. |
-| --device | Device to run on. Defaults to `cpu`. Options: `['cpu', 'gpu', 'huawei_ascend_npu', 'kunlunxin_xpu']`. |
+| --backend | Inference engine backend. Defaults to `paddle_tensorrt`. Options: `['onnx_runtime', 'paddle', 'paddlelite', 'paddle_tensorrt', 'tensorrt']`. |
+| --device | Device to run on. Defaults to `gpu`. Options: `['cpu', 'gpu', 'huawei_ascend_npu', 'kunlunxin_xpu']`. |
 | --device_id | ID of the `gpu` device. If `device_id` is `-1`, inference runs on the `cpu`. |
 | --inference_steps | Number of times the `UNet` model is run. Defaults to `50`. |
 | --benchmark_steps | Number of `Benchmark` runs. Defaults to `1`. |
-| --use_fp16 | Whether to use `fp16` precision. Defaults to `False`. Can be set to `True` when using the `paddle_tensorrt` backend. |
-| --task_name | Task type. Defaults to `text2img`. Options: `['text2img', 'img2img', 'inpaint', 'inpaint_legacy', 'cycle_diffusion', 'all']`. Note: when `task_name` is `inpaint`, the `runwayml/stable-diffusion-inpainting@fastdeploy` weights are required for it to work correctly. |
-| --scheduler | Sampler type. Defaults to `'pndm'`. Options: `['pndm', 'lms', 'preconfig-lms', 'euler', 'euler-ancestral', 'preconfig-euler-ancestral', 'dpm-multi', 'dpm-single', 'unipc-multi', 'ddim', 'ddpm', 'deis-multi', 'heun', 'kdpm2-ancestral', 'kdpm2']`. |
-| --width | Width of the generated image, in the range 512~768. |
-| --height | Height of the generated image, in the range 512~768. |
+| --use_fp16 | Whether to use `fp16` precision. Defaults to `False`. Can be set to `True` when using the `paddle_tensorrt` backend or the `kunlunxin_xpu` device. |
+| --task_name | Task type. Defaults to `text2img`. Options: `['text2img', 'img2img', 'inpaint', 'inpaint_legacy', 'cycle_diffusion', 'hiresfix', 'all']`. Note: when `task_name` is `inpaint`, the `runwayml/stable-diffusion-inpainting@fastdeploy` weights are required for it to work correctly. |
+| --scheduler | Sampler type. Defaults to `'preconfig-euler-ancestral'`. Options: `['pndm', 'lms', 'euler', 'euler-ancestral', 'preconfig-euler-ancestral', 'dpm-multi', 'dpm-single', 'unipc-multi', 'ddim', 'ddpm', 'deis-multi', 'heun', 'kdpm2-ancestral', 'kdpm2']`. |
+| --infer_op | Op used for inference. Options: `['zero_copy_infer', 'raw', 'all']`. `zero_copy_infer` gives faster inference. Defaults to `zero_copy_infer`. |
+| --parse_prompt_type | Method used to process the prompt text. Options: `['raw', 'lpw']`. `lpw` lets you emphasize words in the prompt and supports longer text input. Defaults to `lpw`. |
+| --width | Width of the generated image, in the range 512~768. Defaults to 512. |
+| --height | Height of the generated image, in the range 512~768. Defaults to 512. |
+| --hr_resize_width | Width generated by hiresfix, in the range 512~768. Defaults to 768. |
+| --hr_resize_height | Height generated by hiresfix, in the range 512~768. Defaults to 768. |
+| --is_sd2_0 | Whether the model is an SD 2.0 model. Defaults to `False`. |
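The option table can also be read as an `argparse` definition, which is handy for checking that a command line uses only documented choices and defaults. The sketch below is a hypothetical reconstruction from the table alone, not the actual parser in `infer.py`. Two details are assumptions: the string-to-bool handling of `--use_fp16`/`--is_sd2_0` is inferred from commands such as `--use_fp16 True`, and `--device_id` gets no default because the table does not state one.

```python
import argparse

BACKENDS = ["onnx_runtime", "paddle", "paddlelite", "paddle_tensorrt", "tensorrt"]
DEVICES = ["cpu", "gpu", "huawei_ascend_npu", "kunlunxin_xpu"]
TASKS = ["text2img", "img2img", "inpaint", "inpaint_legacy", "cycle_diffusion", "hiresfix", "all"]
SCHEDULERS = ["pndm", "lms", "euler", "euler-ancestral", "preconfig-euler-ancestral",
              "dpm-multi", "dpm-single", "unipc-multi", "ddim", "ddpm", "deis-multi",
              "heun", "kdpm2-ancestral", "kdpm2"]


def str2bool(value):
    """Parse 'True'/'False' style command-line values (assumed convention)."""
    lowered = str(value).strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(description="Sketch of the infer.py option table")
    p.add_argument("--model_dir", default="runwayml/stable-diffusion-v1-5@fastdeploy")
    p.add_argument("--backend", default="paddle_tensorrt", choices=BACKENDS)
    p.add_argument("--device", default="gpu", choices=DEVICES)
    p.add_argument("--device_id", type=int, default=None)  # default not documented
    p.add_argument("--inference_steps", type=int, default=50)
    p.add_argument("--benchmark_steps", type=int, default=1)
    p.add_argument("--use_fp16", type=str2bool, default=False)
    p.add_argument("--task_name", default="text2img", choices=TASKS)
    p.add_argument("--scheduler", default="preconfig-euler-ancestral", choices=SCHEDULERS)
    p.add_argument("--infer_op", default="zero_copy_infer",
                   choices=["zero_copy_infer", "raw", "all"])
    p.add_argument("--parse_prompt_type", default="lpw", choices=["raw", "lpw"])
    p.add_argument("--width", type=int, default=512)
    p.add_argument("--height", type=int, default=512)
    p.add_argument("--hr_resize_width", type=int, default=768)
    p.add_argument("--hr_resize_height", type=int, default=768)
    p.add_argument("--is_sd2_0", type=str2bool, default=False)
    return p


if __name__ == "__main__":
    # The hiresfix command from this README, minus the leading "python infer.py".
    args = build_parser().parse_args(
        "--model_dir stable-diffusion-v1-5/ --scheduler euler-ancestral "
        "--backend paddle_tensorrt --use_fp16 True --device gpu --task_name hiresfix".split()
    )
    print(args.task_name, args.scheduler, args.use_fp16, args.hr_resize_width)
```

Passing an undocumented value, for example the older `--scheduler euler_ancestral` spelling, makes this parser exit with an "invalid choice" error.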