Merge branch 'master' of https://github.com/willxie/fast-rcnn
willxie committed Oct 24, 2016
2 parents f1a7805 + 84f082d commit 37808a7
Showing 2 changed files with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions report.txt
@@ -0,0 +1 @@
I used the 3-band processed data provided as part of the dataset. I only kept the image chips that contain at least one building annotation, resulting in a total of 4388 images of 439 × 406 pixels. The dataset was then divided into a 3:1:1 train, validation, and test split (num_train: 2631, num_val: 877, num_test: 880).

I used Fast R-CNN (https://github.com/rbgirshick/fast-rcnn) as the main algorithm, implemented in the Caffe framework with Python bindings. Fast R-CNN is an end-to-end, CNN-based object detector that does both classification and bounding box regression in a single pass for multiple object candidates in the same image. I used the dlib implementation of selective search in place of the original MATLAB version to generate candidates during both training and testing. To my knowledge the two methods are very similar, with the dlib version returning fewer bounding boxes and running much faster (a minimal sketch of this step is included further below).

For training, I used the CaffeNet architecture (AlexNet with minor changes that make it more optimized for the Caffe framework) with the additional Fast R-CNN layers. The network was initialized with PASCAL-trained weights and fine-tuned on the train set described above. The last classification and regression layers were set to a learning rate an order of magnitude higher than the rest. Mirrored training data augmentation was used. Bounding boxes were capped at 400 pixels in area (20 × 20). This number was a rough guesstimate based on an average small house in the train set from my observation; gathering the distribution of building sizes would probably yield a better estimate. Most other settings were standard. The final model used for the later experiments was trained for 90,000 iterations (~35 epochs).

The initial results showed significantly lower recall than precision. After examining the outputs, I noticed that the model had trouble detecting most buildings in high building-density areas. I identified two hyper-parameters that contribute to this: the non-max suppression (NMS) overlap threshold and the candidate bounding box (BB) minimum threshold. NMS was tuned by testing different values and observing the results, picking a value just before precision dropped significantly. For BB, I did a grid search over the confidence threshold and the BB threshold values and selected the parameters that yielded the highest F1 score. The final recall and precision on the test set are 0.22 and 0.33. The numbers of true positives, false positives, and false negatives are 13146, 26644, and 47266 respectively (please see the results.xlsx file for more data).

There is a lot of room for improvement. Firstly, the evaluation should be done by re-training the model with both the train and validation sets. Judging from the results at a low confidence threshold, the still-low recall suggests that the bottleneck could lie in selective search: I suspect the generated bounding boxes simply did not include all the buildings. Alternative object proposal algorithms should be tested. I opted for CaffeNet due to its simplicity and ease of training; more advanced architectures are available (e.g. GoogLeNet, ResNet) that should provide better accuracy. Similarly, datasets that are more diverse or more closely related to aerial images could work better than PASCAL as pre-training data. Lastly, plenty of domain-specific heuristics could be used to filter out poor candidates.
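For reference, a minimal sketch of the proposal-generation step described above (this is illustrative, not the actual tools/process_data.py; the parameter values, the example path, and the reading of the 400-pixel cap as a minimum proposal area are assumptions):

    # Minimal sketch of the proposal-generation step (not the actual
    # tools/process_data.py). Candidate boxes for one image chip come from
    # dlib's selective search; parameter values and the reading of the
    # 400-pixel cap as a minimum proposal area are assumptions.
    import dlib
    from skimage import io

    def get_proposals(chip_path, min_area=400):
        """Return [x1, y1, x2, y2] candidate boxes for a single image chip."""
        img = io.imread(chip_path)

        rects = []
        # dlib fills `rects` with dlib.rectangle candidates (selective search).
        dlib.find_candidate_object_locations(img, rects, min_size=min_area)

        boxes = []
        for r in rects:
            # Drop proposals smaller than ~20 x 20 px (an average small house).
            if r.area() >= min_area:
                boxes.append([r.left(), r.top(), r.right(), r.bottom()])
        return boxes

    # Hypothetical usage:
    # proposals = get_proposals("spacenet/data/some_chip.png")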
P.S. The dataset is partially labeled. This could make the false positive and precision numbers misleading. I think if additional annotation is allowed, for a human-in-the-loop pipeline that feeds lower-confidence detections out for manual labeling and correction to enlarge the train set, a set of parameters that minimizes false positives would be prudent to minimize noise in the long run (a rough sketch of this routing follows the file list below).

P.P.S. Most of the files should be in the ~/fast-rcnn directory on the AWS instance Nic lent me. Here are some key files and directories:
- process dataset: tools/process_data.py
- evaluation: tools/eval_spacenet.py
- processed data: spacenet/data
- test output images: spacenet/results/test/
- model file: caffenetspacenet_fast_rcnn_iter_90000.caffemodel
- data spreadsheet: results.xlsx
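For reference, a rough sketch of the confidence-based routing mentioned above (the thresholds and the (x1, y1, x2, y2, score) detection format are assumptions, not something already in the pipeline):

    # Rough sketch of the confidence-based routing for a human-in-the-loop
    # pipeline. The detection format (x1, y1, x2, y2, score) and both
    # thresholds are assumptions, not part of the current pipeline.
    def split_for_review(detections, accept_thresh=0.8, review_thresh=0.3):
        """Split detections into auto-accepted boxes and low-confidence
        candidates that go out for manual labeling and correction."""
        accepted, needs_review = [], []
        for (x1, y1, x2, y2, score) in detections:
            if score >= accept_thresh:
                accepted.append((x1, y1, x2, y2, score))
            elif score >= review_thresh:
                # Once corrected by a human, these can be fed back into the
                # train set to enlarge it over time.
                needs_review.append((x1, y1, x2, y2, score))
            # Below review_thresh the detection is dropped to keep label
            # noise low (favoring fewer false positives).
        return accepted, needs_review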
Binary file added results.xlsx
