I am using NVBit 1.5.3 to collect a trace of a TensorFlow application. Without NVBit, the application runs fine. With NVBit, however, it gets stuck in autotuning.
The command I run is:
LD_PRELOAD=$TRACER_TOOL python run.py --batch_size 1024 --num_examples 2000
The log output ends as follows:
2022-09-14 03:50:07.595278: I tensorflow/core/common_runtime/executor.cc:813] Synchronous kernel done: 814 step -8366710006080086661 {{node decoder/basic_decoder/decoder/while/body/_10/decoder/basic_decoder/decoder/while/cond/output/_805}} = Merge[N=2, T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Func/decoder/basic_decoder/decoder/while/body/_10/decoder/basic_decoder/decoder/while/cond/then/_797/input/_855, decoder/basic_decoder/decoder/while/body/_10/decoder/basic_decoder/decoder/while/cond/else/_798/decoder/basic_decoder/decoder/while/cond/TensorArrayV2Read/TensorListGetItem) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-09-14 03:50:12.991962: I tensorflow/core/framework/model.cc:1513] Starting optimization of tunable parameters with HillClimb
2022-09-14 03:50:12.992094: I tensorflow/core/framework/model.cc:1563] Number of tunable parameters: 0
2022-09-14 03:50:12.992137: I tensorflow/core/kernels/data/model_dataset_op.cc:200] Waiting for 20480 ms.
2022-09-14 03:50:33.472384: I tensorflow/core/framework/model.cc:1513] Starting optimization of tunable parameters with HillClimb
2022-09-14 03:50:33.472532: I tensorflow/core/framework/model.cc:1563] Number of tunable parameters: 0
2022-09-14 03:50:33.472568: I tensorflow/core/kernels/data/model_dataset_op.cc:200] Waiting for 40960 ms.
2022-09-14 03:51:14.432780: I tensorflow/core/framework/model.cc:1513] Starting optimization of tunable parameters with HillClimb
2022-09-14 03:51:14.432929: I tensorflow/core/framework/model.cc:1563] Number of tunable parameters: 0
2022-09-14 03:51:14.432963: I tensorflow/core/kernels/data/model_dataset_op.cc:200] Waiting for 60000 ms.
...
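In the log above, the wait between optimization rounds grows from 20480 ms to 40960 ms and then 60000 ms, while the number of tunable parameters stays at 0. As a rough illustration only, the sketch below pulls those wait times out of a saved log so the pattern is easy to see. The helper name `wait_times_ms` is hypothetical, and the regular expression assumes the log line format shown above.

```python
import re

# Matches lines such as:
#   ... model_dataset_op.cc:200] Waiting for 20480 ms.
# The format is taken from the log above; other TensorFlow versions may differ.
WAIT_RE = re.compile(r"model_dataset_op\.cc:\d+\] Waiting for (\d+) ms\.")


def wait_times_ms(log_text):
    """Return the 'Waiting for N ms.' durations found in log_text, in order."""
    return [int(m.group(1)) for m in WAIT_RE.finditer(log_text)]


sample = """\
2022-09-14 03:50:12.992137: I tensorflow/core/kernels/data/model_dataset_op.cc:200] Waiting for 20480 ms.
2022-09-14 03:50:33.472568: I tensorflow/core/kernels/data/model_dataset_op.cc:200] Waiting for 40960 ms.
2022-09-14 03:51:14.432963: I tensorflow/core/kernels/data/model_dataset_op.cc:200] Waiting for 60000 ms.
"""

print(wait_times_ms(sample))  # [20480, 40960, 60000]
```

If a full log only ever shows these lines after the last "Synchronous kernel done" message, that suggests the autotuning thread is merely idling while the main computation makes no progress.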
The application appears to be stuck in parameter autotuning. Even with parameter autotuning disabled, the application still hangs and never finishes.
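To make the hang easier to reproduce and report, one option is to run the command under a timeout. The following is a minimal sketch using only the Python standard library. The function name `run_with_timeout` and the timeout value are assumptions for illustration, not part of NVBit or TensorFlow.

```python
import os
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, extra_env=None):
    """Run cmd and return its exit code, or None if it exceeds timeout_s seconds.

    extra_env is merged into the current environment, which is how a
    variable such as LD_PRELOAD can be set for the child process only.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    try:
        return subprocess.run(cmd, env=env, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None


# Hypothetical usage with the command from this report (not executed here):
#   code = run_with_timeout(
#       [sys.executable, "run.py", "--batch_size", "1024", "--num_examples", "2000"],
#       timeout_s=600,
#       extra_env={"LD_PRELOAD": os.environ["TRACER_TOOL"]},
#   )
#   print("hung" if code is None else f"exited with {code}")
```

A `None` result with the tool preloaded, compared with a normal exit code without it, would narrow the problem down to the instrumented run.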
I am using CUDA 10.1 with cuDNN 7.6.4, on an RTX 2080 Ti GPU with an Intel Xeon Gold CPU.
I am sorry for submitting an issue that is specific to TensorFlow. Thank you.