
Training SEE on ICDAR Born Digital Dataset #25

Open
rohit12 opened this issue Apr 13, 2018 · 4 comments

rohit12 commented Apr 13, 2018

I have a few queries about training SEE on the Born Digital dataset. It basically consists of flyers and digitally created advertisements.

  1. How do I verify whether training is going correctly? What method did you use for this?
  2. Since there are multiple GTs in a single image, how do I ensure that the network is correctly associating each GT with what it is detecting?
  3. In the logs/ folder, the model is not generated. Do you have any idea why? I have 410 images for training and my batch size is 32.
  4. Related to the previous query, is a size of 410 too small considering the parameters of the network?

saq1410 commented Apr 16, 2018

@rohit12 The reason your model file is not generated is that you may not have set the --snapshot-interval argument. By default it is set to 20000, so a snapshot will only be generated once your script reaches a total of 20000 iterations.
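For example (the script name below is only a placeholder for whichever train_.. script from this repository you are running), passing a smaller value makes the first snapshot appear much earlier:

```
python3 train_text_recognition.py <your usual arguments> --snapshot-interval 500
```

With 410 training images and a batch size of 32, one epoch is only about 13 iterations, so the default of 20000 iterations corresponds to well over a thousand epochs before the first snapshot would be written.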

Bartzi (Owner) commented Apr 16, 2018

Hi,

  1. You can verify that training is going correctly by having a look at the images located in the bboxes folder in your log_dir. Those images show the predictions of the network on a test image; if those predictions improve over time, training seems to work.
  2. The only way to ensure this right now is to order the GT in a consistent way for each image; otherwise you will need to find a loss that can work with random alignment. (We always forced the GT to be ordered from left to right and top to bottom; a small sorting sketch follows this list.)
  3. @saq1410 you are right, that is the problem here
  4. I think 410 images is way too small. There are too many parameters that need to be optimized and the task is not easy at all, so it will be more than difficult for the network to learn. Your network will also most likely overfit heavily on this limited dataset. Maybe you can find a way to generate similar-looking data...
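A minimal sketch of one way to enforce the ordering mentioned in point 2, assuming axis-aligned (x, y, w, h) boxes and a pixel tolerance for grouping boxes into the same text line (the function name and tolerance value are just example choices):

```python
def sort_reading_order(boxes, line_tolerance=10):
    """Sort ground-truth boxes top-to-bottom, then left-to-right within a line.

    boxes: list of (x, y, w, h) tuples.  Boxes whose top edges lie within
    `line_tolerance` pixels of each other are treated as one text line.
    """
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[1])      # rough top-to-bottom pass
    lines, current = [], [boxes[0]]
    for box in boxes[1:]:
        if abs(box[1] - current[0][1]) <= line_tolerance:
            current.append(box)                    # same text line as before
        else:
            lines.append(current)                  # start a new text line
            current = [box]
    lines.append(current)
    # left-to-right inside each line, lines kept in vertical order
    return [box for line in lines for box in sorted(line, key=lambda b: b[0])]
```

For example, applied to [(120, 40, 60, 20), (10, 38, 50, 20), (15, 90, 80, 20)] this returns the two boxes of the first line from left to right, followed by the box on the second line.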

@saharudra

Hi Christian,

We are facing a few other problems with the Born Digital dataset.

  1. While creating the video, we are facing the following error:
/src/datasets/BornDigital/logs_new/2018-04-17T02:31:56.662657_training/boxes$ python3 ../../../../../see/utils/create_video.py ./ ./video.mp4
loading images
sort and cut images
creating temp file
convert -quality 100 @/tmp/tmpp5m2rc0n /tmp/tmp65e4ijjz/1000.mpeg
Killed
Traceback (most recent call last):
  File "../../../../../see/utils/create_video.py", line 109, in <module>
    make_video(args.image_dir, args.dest_file, batch_size=args.batch_size, start=args.start, end=args.end, pattern=args.pattern)
  File "../../../../../see/utils/create_video.py", line 56, in make_video
    temp_file = create_video(i, temp_file, video_dir)
  File "../../../../../see/utils/create_video.py", line 92, in create_video
    subprocess.run(' '.join(process_args), shell=True, check=True)
  File "/usr/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'convert -quality 100 @/tmp/tmpp5m2rc0n /tmp/tmp65e4ijjz/1000.mpeg' returned non-zero exit status 137
  2. How do we interpret the images stored in the boxes folder in the logs? For the Born Digital dataset, the following are a few examples from over the course of training.

[Attached bbox visualizations from different points during training: 1.png, 10.png, 100.png, 500.png, and 1250.png]

Are these images a visualization of which region of the input image the current layer is focusing on, with the first one being the focus of the output layer?

  3. Do you have any suggestions for some other ground-truth format? We want to look into the Google 1000 dataset, but converting it to the format that you have used in the code seems to be a rather time-consuming task.

Bartzi (Owner) commented Apr 18, 2018

alright:

  1. I think it's not working because the images could be too large (in width and/or height) to fit into a video container. You could set the keyword argument render_extracted_rois to False in the part of the code that creates the BBOXPlotter object (in the train_.. file you are using). This will create smaller images. See the next bullet point for an explanation of what I mean by that.
  2. The images have to be interpreted in the following way:
  • the top-left image shows the input image with the predicted bboxes on it
  • all the other images in the top row show each individual region crop that has been extracted from the original input image at the location of the predicted bbox (once you set render_extracted_rois to False, these images will not be rendered anymore).
  • the bottom row shows the output of visual backprop for this specific image on top.
  3. You can choose the groundtruth format any way you like! You will just need to create a new dataset object for that and use it instead of the ones I created. In this object you can parse your groundtruth and supply it to the network as a numpy array; a minimal sketch follows this list.
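As a minimal sketch of such a dataset object, assuming a tab-separated ground-truth file with one "<image path>\t<text>" entry per line (the class name, file layout, alphabet, and image size are just example choices, not the format used in this repository):

```python
import numpy as np
from PIL import Image
from chainer.dataset import DatasetMixin


class CustomTextDataset(DatasetMixin):
    """Hypothetical dataset that parses a custom ground-truth file."""

    def __init__(self, gt_file, image_size=(200, 64),
                 alphabet="abcdefghijklmnopqrstuvwxyz0123456789 "):
        self.image_size = image_size
        self.char_map = {char: idx for idx, char in enumerate(alphabet)}
        with open(gt_file) as handle:
            self.entries = [line.rstrip('\n').split('\t')
                            for line in handle if line.strip()]

    def __len__(self):
        return len(self.entries)

    def get_example(self, i):
        image_path, text = self.entries[i]
        # load the image and bring it into CHW float32 layout
        image = Image.open(image_path).convert('RGB').resize(self.image_size)
        image = np.asarray(image, dtype=np.float32).transpose(2, 0, 1) / 255.0
        # encode the label as a numpy array of character ids
        labels = np.array([self.char_map[c] for c in text.lower() if c in self.char_map],
                          dtype=np.int32)
        return image, labels
```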

The images you posted seem to show that your network is hardly learning anything right now. I'd advise you to take a curriculum approach and start with easy samples (samples with few words) first and then increase the difficulty; otherwise it might not converge.
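One simple way to realize such a curriculum, assuming the word count per image is known from the ground truth (the thresholds are arbitrary example values):

```python
def curriculum_stages(samples, thresholds=(1, 2, 4)):
    """Split samples into progressively harder training stages.

    samples: iterable of (image_path, list_of_groundtruth_words) pairs.
    Stage k keeps only images with at most thresholds[k] words, so the
    network first sees nearly single-word images before crowded ones.
    """
    return [[sample for sample in samples if len(sample[1]) <= max_words]
            for max_words in thresholds]
```

You would then train on stage 0 until the bbox visualizations look reasonable, continue with stage 1, and so on.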
