[BUG] When I used the ImageNet Training Script to train my model, an unknown error occurred. #2367
Unanswered
stone-cloud
asked this question in
Q&A
@stone-cloud the model is probably returning a tuple/list instead of a single prediction tensor. You need to modify the train script to work with models that return lists, tuples, dicts, etc.
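A minimal sketch of that kind of change, assuming the loss function expects one logits tensor and the model returns it as the first element of a tuple/list or under a dict key. The helper name `select_logits`, the default `index=0`, and the default key `"out"` are hypothetical; which element actually holds the classification output depends on the model.

```python
from collections.abc import Mapping


def select_logits(output, index=0, key="out"):
    """Reduce a model's return value to the single object the loss expects.

    Hypothetical helper: `index` and `key` are assumptions about where the
    classification logits live in a multi-output model.
    """
    if isinstance(output, Mapping):
        # Dict-style output: pick the entry assumed to hold the logits.
        return output[key]
    if isinstance(output, (tuple, list)):
        # Tuple/list output: pick the element assumed to hold the logits.
        return output[index]
    # Anything else (e.g. a plain tensor) is passed through unchanged.
    return output


# Sketch of where it would go in a training loop (names are illustrative):
#   output = select_logits(model(input))
#   loss = loss_fn(output, target)
```

If the model returns auxiliary outputs that should also contribute to the loss, selecting one element is not enough, and the loss computation itself would need to change.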
Describe the bug
When I used the ImageNet training script to train my model, the following error occurred. However, the same model file worked fine when I trained other models (segmentation models) with it. I have been stuck on this issue for a long time and haven't found a detailed solution.
To Reproduce
Steps to reproduce the behavior:
My process:
Expected behavior
This is strange, and I couldn't find a solution to the same problem anywhere. I suspect that distributed training has split the data across processes, but I don't understand why the outputs haven't been merged.
Screenshots
Desktop (please complete the following information):
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0