Training yolo.2.0.cfg returns NaN for count = 0, even though image is annotated #460

Open
clockworkkiwi opened this issue Feb 8, 2018 · 29 comments

Comments

@clockworkkiwi

I am training yolo.2.0.cfg on a custom dataset, and after about 100 iterations I get only NaN values, like:
Region Avg IOU: nan, Class: nan, Obj: nan, No Obj: nan, Avg Recall: 0.000000, count: 42

I tried to reproduce the error on my CPU with a batch size of 1, using only one image. The image is annotated with 11 objects, so I expected count to always be 11. However, it is sometimes 3, 1, or 0 (see the log below). When count is 0 I get NaN, probably because a division by zero occurs when the IoU is calculated.

My question: is my understanding of count wrong? And if not, why does it keep changing?
The cfg and annotation files are attached below.
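
To illustrate the suspected division by zero: a minimal sketch, assuming the logged averages are computed as a running sum divided by count (this has not been checked against the darknet source). The helper name `c_style_mean` is hypothetical. Python raises an exception on division by zero, so the C floating-point behaviour is emulated explicitly.

```python
import math


def c_style_mean(total: float, count: int) -> float:
    """Divide like C floating point does: 0.0 / 0 gives NaN instead of raising."""
    if count == 0:
        # Emulate IEEE 754: 0/0 is NaN, nonzero/0 is a signed infinity.
        return math.nan if total == 0 else math.copysign(math.inf, total)
    return total / count


# With 11 matched objects the average is an ordinary number.
print(c_style_mean(2.61, 11))
# With no matched objects, sum and count are both zero and the result is NaN.
print(c_style_mean(0.0, 0))
```

If this assumption holds, the NaN values on a `count: 0` line are only a reporting artifact of that batch; the open question is why no ground-truth boxes were counted.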

./darknet detector train Training/cars.data Training/yolo.2.0_cars.cfg Training/darknet19_448.conv.23
yolo
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32
1 max 2 x 2 / 2 608 x 608 x 32 -> 304 x 304 x 32
2 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64
3 max 2 x 2 / 2 304 x 304 x 64 -> 152 x 152 x 64
4 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
5 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64
6 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
7 max 2 x 2 / 2 152 x 152 x 128 -> 76 x 76 x 128
8 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
9 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128
10 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
11 max 2 x 2 / 2 76 x 76 x 256 -> 38 x 38 x 256
12 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
13 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
14 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
15 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
16 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
17 max 2 x 2 / 2 38 x 38 x 512 -> 19 x 19 x 512
18 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
19 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
20 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
21 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
22 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
23 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
24 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
25 route 16
26 reorg / 2 38 x 38 x 512 -> 19 x 19 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 19 x 19 x3072 -> 19 x 19 x1024
29 conv 30 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 30
30 detection
mask_scale: Using default '1.000000'
Loading weights from Training/darknet19_448.conv.23...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Loaded: 0.147676 seconds
Region Avg IOU: 0.237344, Class: 1.000000, Obj: 0.274930, No Obj: 0.443802, Avg Recall: 0.090909, count: 11
1: 576.071167, 576.071167 avg, 0.001000 rate, 71.343607 seconds, 1 images
Loaded: 0.000104 seconds
Region Avg IOU: 0.077130, Class: 1.000000, Obj: 0.292493, No Obj: 0.446449, Avg Recall: 0.000000, count: 11
2: 702.093445, 588.673401 avg, 0.001000 rate, 69.007693 seconds, 2 images
Loaded: 0.000073 seconds
Region Avg IOU: 0.130509, Class: 1.000000, Obj: 0.342454, No Obj: 0.444011, Avg Recall: 0.000000, count: 11
3: 576.471802, 587.453247 avg, 0.001000 rate, 69.223896 seconds, 3 images
Loaded: 0.000078 seconds
Region Avg IOU: 0.048404, Class: 1.000000, Obj: 0.240457, No Obj: 0.440917, Avg Recall: 0.000000, count: 3
4: 555.401550, 584.248047 avg, 0.001000 rate, 68.168291 seconds, 4 images
Loaded: 0.000082 seconds
Region Avg IOU: 0.062680, Class: 1.000000, Obj: 0.133529, No Obj: 0.450822, Avg Recall: 0.000000, count: 11
5: 647.937134, 590.616943 avg, 0.001000 rate, 67.656793 seconds, 5 images
Loaded: 0.000079 seconds
Region Avg IOU: 0.065679, Class: 1.000000, Obj: 0.326323, No Obj: 0.441488, Avg Recall: 0.000000, count: 3
6: 475.536743, 579.108948 avg, 0.001000 rate, 66.304383 seconds, 6 images
Loaded: 0.000087 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.444047, Avg Recall: -nan, count: 0
7: 438.538666, 565.051941 avg, 0.001000 rate, 67.867137 seconds, 7 images
Loaded: 0.000070 seconds
Region Avg IOU: 0.132911, Class: 1.000000, Obj: 0.190774, No Obj: 0.443059, Avg Recall: 0.000000, count: 11
8: 619.727173, 570.519470 avg, 0.001000 rate, 66.372526 seconds, 8 images
Loaded: 0.000081 seconds
Region Avg IOU: 0.252439, Class: 1.000000, Obj: 0.223981, No Obj: 0.443295, Avg Recall: 0.333333, count: 3
9: 460.461884, 559.513733 avg, 0.001000 rate, 68.189868 seconds, 9 images
Loaded: 0.000087 seconds
Region Avg IOU: 0.142704, Class: 1.000000, Obj: 0.221254, No Obj: 0.443132, Avg Recall: 0.000000, count: 11
10: 569.588257, 560.521179 avg, 0.001000 rate, 66.707857 seconds, 10 images
Loaded: 0.000085 seconds
Region Avg IOU: 0.024215, Class: 1.000000, Obj: 0.265335, No Obj: 0.443488, Avg Recall: 0.000000, count: 1
11: 446.488312, 549.117920 avg, 0.001000 rate, 65.911859 seconds, 11 images
Loaded: 0.000075 seconds
Region Avg IOU: 0.136938, Class: 1.000000, Obj: 0.298591, No Obj: 0.442529, Avg Recall: 0.000000, count: 11
12: 619.259888, 556.132141 avg, 0.001000 rate, 67.203997 seconds, 12 images
Loaded: 0.000087 seconds
Region Avg IOU: 0.128904, Class: 1.000000, Obj: 0.300296, No Obj: 0.449025, Avg Recall: 0.000000, count: 11
13: 537.905579, 554.309509 avg, 0.001000 rate, 75.953351 seconds, 13 images
Loaded: 0.000117 seconds
Region Avg IOU: 0.219828, Class: 1.000000, Obj: 0.144575, No Obj: 0.442554, Avg Recall: 0.181818, count: 11
14: 587.459167, 557.624451 avg, 0.001000 rate, 69.909060 seconds, 14 images
Loaded: 0.000088 seconds
Region Avg IOU: 0.118915, Class: 1.000000, Obj: 0.508260, No Obj: 0.442919, Avg Recall: 0.000000, count: 1
15: 449.169159, 546.778931 avg, 0.001000 rate, 66.654878 seconds, 15 images
Loaded: 0.000085 seconds
Region Avg IOU: 0.228912, Class: 1.000000, Obj: 0.125506, No Obj: 0.442766, Avg Recall: 0.000000, count: 1
16: 443.907257, 536.491760 avg, 0.001000 rate, 70.434208 seconds, 16 images
Loaded: 0.000077 seconds

1479502650254806942.txt

yolo.2.0_cars.cfg.txt

Help is highly appreciated!

@Arun-Trichy

I got a similar error, except that in my case the line with count: 0 appears repeatedly.

root@e9a01c03fd7a:/arun/darknet# ./darknet detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23
yolo-obj
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024
30 conv 35 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 35
31 detection
mask_scale: Using default '1.000000'
Loading weights from darknet19_448.conv.23...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
352
Loaded: 17.398113 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.461065, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462061, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.464741, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.464582, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462587, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.463322, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.464065, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.461787, Avg Recall: -nan, count: 0
1: 149.571701, 149.571701 avg, 0.000000 rate, 19.846876 seconds, 64 images
Loaded: 0.125245 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.465770, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.464636, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462528, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462455, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.464403, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.463434, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.464247, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.464753, Avg Recall: -nan, count: 0
2: 150.367554, 149.651291 avg, 0.000000 rate, 17.598082 seconds, 128 images
Loaded: 3.762255 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.461768, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462999, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462213, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462579, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.463145, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.461601, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.461762, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462720, Avg Recall: -nan, count: 0
3: 148.922562, 149.578415 avg, 0.000000 rate, 17.030993 seconds, 192 images

Could you make anything out of this?

@Arun-Trichy

Hey, I got it resolved. The annotations for the training dataset were simply not done properly.
In your case, some images are probably not annotated properly.

@evenctit

@Arun-Trichy could you give more details on how you resolved your issue? Many thanks.

@ahsan856jalal

I also had the same issue.
Just do one thing: load one of the annotated images and draw the bounding box using the coordinates given in the text file. If the box lies on the object, the annotation is fine. Otherwise, regenerate all the text files so that each row contains:
class_label(0 to N-1) mid_x mid_y w h

where:
class_label is an integer class index in the range 0 to N-1, where N is the number of classes
mid_x is the horizontal midpoint of the box = (x_initial + x_final) / (2 * img_width)
mid_y is the vertical midpoint of the box = (y_initial + y_final) / (2 * img_height)
w is the normalized width of the box = box_width / img_width
h is the normalized height of the box = box_height / img_height
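
The normalization described above can be sketched as follows. This is only an illustration, assuming the box is given as pixel corner coordinates; the function name `to_yolo_row` and the six-decimal output format are choices made here, not anything darknet requires.

```python
def to_yolo_row(class_label, x_initial, y_initial, x_final, y_final,
                img_width, img_height):
    """Convert a pixel-corner box into a 'class mid_x mid_y w h' label row."""
    # Box centre, normalized by the image size.
    mid_x = (x_initial + x_final) / (2 * img_width)
    mid_y = (y_initial + y_final) / (2 * img_height)
    # Box size, normalized by the image size.
    w = (x_final - x_initial) / img_width
    h = (y_final - y_initial) / img_height
    return f"{class_label} {mid_x:.6f} {mid_y:.6f} {w:.6f} {h:.6f}"


# A box from (100, 200) to (300, 400) in a 1000 x 800 image, class 0.
print(to_yolo_row(0, 100, 200, 300, 400, 1000, 800))
```

All four box values should end up between 0 and 1; values in the hundreds usually mean the pixel coordinates were written out without dividing by the image size.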

@meowzhang

@ahsan856jalal What does class_label(0-N-1) mean? Is it a negative class number? Can I use a positive number instead? Thanks.

@ahsan856jalal

ahsan856jalal commented Mar 14, 2018 via email

@DonCorle0ne

@Arun-Trichy I got the exact same error, how did you fix this?

@KelvinLin1016

I got the same problem, even though all my images are annotated properly.

@chinmay5

Is there some way to check if all my images have been annotated properly? Or is it just a shot in the dark?
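
One way to take some of the guesswork out of this is to check every label row mechanically. The sketch below assumes the five-value row format described earlier in this thread (class index followed by four values normalized to the image size); `check_label_line` is a hypothetical helper, not part of darknet, and it cannot tell whether a box actually sits on the object.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one 'class mid_x mid_y w h' row."""
    fields = line.split()
    if len(fields) != 5:
        return [f"expected 5 fields, found {len(fields)}"]
    problems = []
    try:
        class_id = int(fields[0])
    except ValueError:
        return [f"class label {fields[0]!r} is not an integer"]
    if not 0 <= class_id < num_classes:
        problems.append(f"class label {class_id} outside 0..{num_classes - 1}")
    try:
        mid_x, mid_y, w, h = (float(v) for v in fields[1:])
    except ValueError:
        return problems + ["box values are not all numbers"]
    # Centres must lie inside the image; sizes must be positive fractions.
    for name, value in (("mid_x", mid_x), ("mid_y", mid_y)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is not in [0, 1]")
    for name, value in (("w", w), ("h", h)):
        if not 0.0 < value <= 1.0:
            problems.append(f"{name}={value} is not in (0, 1]")
    return problems


print(check_label_line("0 0.2 0.375 0.2 0.25", num_classes=1))
print(check_label_line("0 412 300 50 60", num_classes=1))
```

Running this over every row of every label file catches rows with the wrong number of fields, out-of-range class indices, and un-normalized pixel values. Drawing a few boxes back onto their images is still needed to confirm the boxes are in the right place.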

@KelvinLin1016

KelvinLin1016 commented May 15, 2018 via email

@chinmay5

I am using this dataset. As you can see, it provides annotations for the images, which I converted to the expected format using this link. The results I obtain look acceptable
image
but I am not yet sure if I screwed up with the conversion or something else

@KelvinLin1016

KelvinLin1016 commented May 15, 2018 via email

@chinmay5

Actually there are only 5 numbers. The screenshot also included "line number" in the text file by mistake. However, I agree that the last two numbers are wrong. But then, how can I correct those? I used the steps mentioned in the link and ran the same script. Can you help me out by suggesting if there are some alternative means of getting this done?

@chinmay5

I have updated the annotations and am still facing the same issue. Here is an example of the values obtained after changing the annotations file:
image

Should I do something with the learning rate? Any help would be highly appreciated, as I am completely stuck now.

@KelvinLin1016

KelvinLin1016 commented May 16, 2018 via email

@chinmay5

This is the config file I am using. Since I am training on a custom dataset, I am using this.

yolo-obj.cfg.txt

Also, although I keep getting NaN, the loss seems to be decreasing with iterations
image

@KelvinLin1016

KelvinLin1016 commented May 16, 2018 via email

@chinmay5

chinmay5 commented May 16, 2018

I did that, but got the same result :(
The only difference is that the error seems to be decreasing, but it still can't figure out any of the classes. The count remains at 0 as well.

image

@KelvinLin1016

KelvinLin1016 commented May 16, 2018 via email

@chinmay5

./darknet detector train cfg/tsd.data cfg/yolo-obj.cfg darknet19_448.conv.23

Is what I use for training.
yolo-obj.cfg.txt
Also attaching the config file

@catalinolaru1

Any update on this? @chinmay5

@chinmay5

chinmay5 commented Jun 8, 2018

It works now. After a few hundred iterations, the log started showing real values. I did need to correct the config file, though.

@chinesh

chinesh commented Jun 14, 2018

What changes did you make to the config file, @chinmay5?

@danieltanasec

I also have a similar problem. It shows:

Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Region 16 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.004274, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 23 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.501661, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 16 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.004738, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 23 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.501672, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 16 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.004698, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 23 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.501650, .5R: -nan(ind), .75R: -nan(ind), count: 0

I tried everything... And I'm pretty sure the labels are correct. I used the COCO set from darknet website. I hope someone can help me with this

@renoldhuman

@Tzuya14 I am getting the exact same error, apart from the "Can't open label file" message. Have you found a way to fix this?

@chinmay5 What changes did you make to your config file that eventually solved your problem?

@danieltanasec

@renoldhuman No, unfortunately I haven't. It's a huge mystery to me. I tried the same settings and label format with the VOC dataset and it works perfectly.

@TiongSun

@Tzuya14 @renoldhuman the following link solved my "Can't open label file. (This can be normal only if you use MSCOCO)" error:
#1027

As for the "Region 16 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.004274, .5R: -nan(ind), .75R: -nan(ind), count: 0" problem, I am still searching for answers.

@TiongSun

@Tzuya14 @renoldhuman my problem is solved. I converted all images to .jpg and now everything runs well. Hope it helps!

@danieltanasec

danieltanasec commented Aug 20, 2018

@TiongSun @renoldhuman I solved my problem by putting the label files in the same directory as the images. I had truly believed that you need two separate folders, "images" and "labels" :)
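
A quick way to test for this situation is to list the images that have no label file next to them. This is a minimal sketch, assuming each image is expected to have a same-named .txt file in the same directory, as in the fix described above. The function name `images_missing_labels` and the set of image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_missing_labels(image_dir):
    """Return names of images with no same-named .txt file beside them."""
    missing = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # The label is expected at the same path with a .txt extension.
        if not image.with_suffix(".txt").is_file():
            missing.append(image.name)
    return missing
```

Calling `images_missing_labels("path/to/images")` on the training image directory returns an empty list when every image has a label file beside it; the path here is a placeholder.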
