
Much lower frame rate has been observed on the 1080ti #16

Open · aiLover2 opened this issue May 27, 2022 · 6 comments

Comments
aiLover2 commented May 27, 2022

Hello, I've tested the repo on a 1080 Ti with the configuration below, and the elapsed time is about 600 ms per frame, which is extremely high. I'm wondering whether I'm doing something wrong.

#define STR1(x) #x      // stringification helpers (STR2 expands the macro, STR1 stringifies it)
#define STR2(x) STR1(x)
// #define USE_FP16     // uncomment to build the engine with FP16 precision
#define CONF_THRESH 0.5
#define BATCH_SIZE 1
#define BILINEAR true
// stuff we know about the network and the input/output blobs
static const int INPUT_H = 512;
static const int INPUT_W = 512;
static const int OUTPUT_SIZE = 512 * 512;   // one float per output pixel (1 x 512 x 512)

const char *INPUT_BLOB_NAME = "data";
const char *OUTPUT_BLOB_NAME = "prob";

I downloaded the weights from this link and converted them. I compared the md5sum output against yours and it matched. The inference log is attached:

678ms
1
662ms
1
658ms
1
658ms
1
665ms
1
664ms
1
664ms
1
657ms
1
661ms
1
663ms
1
716ms
1
654ms
1
657ms
1
667ms
1

Process finished with exit code 0
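For reference, the usual way to rule out measurement artifacts in numbers like these is to warm up first and then average over many runs. Below is a minimal sketch of that pattern; the doInference() stub is a hypothetical placeholder for this repo's actual inference call (tensorrtx-style code typically wraps context->enqueue() plus the host/device copies in such a helper, but the name and signature here are assumptions):

#include <chrono>
#include <iostream>

// Hypothetical stand-in for the real TensorRT inference call
// (enqueue + cudaMemcpyAsync + stream synchronize).
void doInference() { /* replace with the repo's actual call */ }

int main() {
    // Warm-up: the first runs pay one-time CUDA context / engine
    // initialization costs and should be excluded from timing.
    for (int i = 0; i < 10; ++i) doInference();

    constexpr int kRuns = 100;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kRuns; ++i) doInference();
    auto stop = std::chrono::steady_clock::now();

    double totalMs = std::chrono::duration<double, std::milli>(stop - start).count();
    std::cout << "average latency: " << totalMs / kRuns << " ms/frame" << std::endl;
    return 0;
}

The ~660 ms readings above are consistent across runs, so warm-up alone doesn't explain them, but averaging this way keeps the comparison with PyTorch below apples-to-apples.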


Test in PyTorch

The same configuration was tested in PyTorch; the logs are attached below.

input to net -> torch.Size([1, 3, 512, 512])
output from net -> {} torch.Size([1, 1, 512, 512])
Predicted in 40.77601432800293 milliseconds

input to net -> torch.Size([1, 3, 512, 512])
output from net -> {} torch.Size([1, 1, 512, 512])
Predicted in 39.133310317993164 milliseconds

input to net -> torch.Size([1, 3, 512, 512])
output from net -> {} torch.Size([1, 1, 512, 512])
Predicted in 38.28907012939453 milliseconds

Could you please let me know why it takes so long? Thank you in advance.

Saeed

aiLover2 changed the title from "Different elapsed time has been seen" to "Different elapsed times have been observed" on May 27, 2022
aiLover2 changed the title from "Different elapsed times have been observed" to "Much lower frame rate has been observed on the 1080ti" on May 28, 2022
aiLover2 (Author) commented Jun 1, 2022

According to the test below, the speed was fine with the standard ONNX-to-TensorRT conversion path:

Average on 10 runs - GPU latency: 7.42174 ms - Host latency: 7.56389 ms (end to end 13.8005 ms, enqueue 0.470346 ms)
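For context, the "GPU latency" figure trtexec prints is measured on the device with CUDA events, separately from host-side wall-clock time. A minimal sketch of that technique (the dummy kernel is a placeholder, not this repo's workload):

#include <cstdio>
#include <cuda_runtime.h>

// Dummy kernel standing in for the real inference workload.
__global__ void dummyKernel() {}

int main() {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);        // enqueue start marker on the stream
    dummyKernel<<<1, 1>>>();       // enqueue the work being timed
    cudaEventRecord(stop);         // enqueue stop marker
    cudaEventSynchronize(stop);    // wait until the stop marker completes

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("GPU time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}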

I would be glad to help you find the problem, or to help update the code as follows (these are probably simple changes for you, but you may not have the time):

  • Detect the SM architecture automatically (see the sketch after this list)
  • Create a CMake module for locating TensorRT
  • Update and refine the code for the latest TensorRT version
  • ...

Please let me know via [email protected] if that works for you.
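On the first item: one way to detect the SM architecture (a sketch for illustration, not code from this repo) is to query the device's compute capability at runtime with cudaGetDeviceProperties and feed the result into the -gencode flags; CMake 3.24+ can also do this at configure time via CMAKE_CUDA_ARCHITECTURES=native.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, /*device=*/0) != cudaSuccess) {
        std::fprintf(stderr, "failed to query CUDA device 0\n");
        return 1;
    }
    // A 1080 Ti (Pascal) reports 6.1, i.e. compute_61 / sm_61.
    std::printf("-gencode;arch=compute_%d%d;code=sm_%d%d\n",
                prop.major, prop.minor, prop.major, prop.minor);
    return 0;
}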

YuzhouPeng (Owner) commented Jun 2, 2022

I tested images on a 3090; the average inference speed is 160 ms per image (1918x1280, batch size 1). Some settings in CMakeLists.txt (the SM version) may influence inference performance. I suggest commenting out these lines in CMakeLists.txt:

#option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
#set(CMAKE_CXX_STANDARD 11)
#set(CMAKE_BUILD_TYPE Debug)
# note: -g;-G compile device code in debug mode with kernel optimization disabled,
# and "arch=compute_30;code=sm_85" mixes two different architectures
#set(CUDA_NVCC_PLAGS ${CUDA_NVCC_PLAGS};-std=c++11;-g;-G;-gencode;arch=compute_30;code=sm_85)
#set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -Ofast -Wfatal-errors -D_MWAITXINTRIN_H_INCLUDED")
#add_definitions(-O2 -pthread)

aiLover2 (Author) commented Jun 5, 2022

Please note that I had already changed the SM version to match the 1080 Ti, as follows:

set(CUDA_NVCC_PLAGS ${CUDA_NVCC_PLAGS};-std=c++11;-gencode;arch=compute_61;code=sm_61)

It got worse after I commented out these lines: inference time went from 600 ms to 1400 ms. (Plausibly because removing the CMAKE_CXX_FLAGS and add_definitions lines also drops -Ofast/-O2 for the host code.)

YuzhouPeng (Owner) commented Jun 6, 2022

I cannot test 1080 Ti performance myself, but I searched and found a similar issue: NVIDIA/TensorRT#1221

Maybe using a different cuDNN version can help.

YSUN-coder commented

> According to the test below, the speed was fine with the standard ONNX-to-TensorRT conversion path:
>
> Average on 10 runs - GPU latency: 7.42174 ms - Host latency: 7.56389 ms (end to end 13.8005 ms, enqueue 0.470346 ms)
> […]

Hi, could you let me know how you got that test result? I tried ./unet -d ../samples on a Jetson Nano, and the result is similar to yours: 767 ms per frame.

YuzhouPeng (Owner) commented

Please use the newest repo, https://github.com/wang-xinyu/tensorrtx/tree/master/unet, for testing; this old repo is no longer maintained.
