This code allows a mini-batch size greater than 1, which the original py-faster-rcnn implementation does not provide. The code has been validated to perform as expected (ToConcoctPellucid @ ILSVRC16).
ImageNet images are scaled to 480x480, which allows a mini-batch of 4 with ResNet-50 on an Nvidia Titan X (12 GB).
The same cannot be done for COCO, because the COCO dataset contains very small objects and requires a higher resolution.
Check this commit for the necessary changes, and let me know about possible improvements.
.. ice-pice
I hope this tutorial helps you a lot! I would appreciate it if you star it!
└─── ILSVRC2013_DET_val
    │   *.JPEG (image files, e.g. ILSVRC2013_val_00000565.JPEG)
└─── data
    │   meta_det.mat
    └─── det_lists
        │   val1.txt, val2.txt
Load the meta_det.mat file by
classes = sio.loadmat(os.path.join(self._devkit_path, 'data', 'meta_det.mat'))
There are several files you need to modify.
This file is in the directory $FRCNN_ROOT/lib/datasets ($FRCNN_ROOT is where your Faster R-CNN is located) and is called by
It is the interface for loading the imdb file.
for split in ['train', 'val', 'val1', 'val2', 'test']:
    name = 'imagenet_{}'.format(split)
    devkit_path = '/media/VSlab2/imagenet/ILSVRC13'
    __sets[name] = (lambda split=split, devkit_path=devkit_path: datasets.imagenet.imagenet(split, devkit_path))
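The `split=split, devkit_path=devkit_path` defaults in the lambda are doing real work: Python closures look up loop variables when they are called, not when they are defined, so without the defaults every registered entry would build the last split. Below is a minimal sketch of that pattern. `make_imdb` is a hypothetical stand-in for `datasets.imagenet.imagenet`, and the devkit path is a placeholder.

```python
# Sketch of the factory-registration pattern. make_imdb is a hypothetical
# stand-in for datasets.imagenet.imagenet, not the real class.
def make_imdb(split, devkit_path):
    return {'name': 'imagenet_{}'.format(split), 'devkit_path': devkit_path}

devkit_path = '/path/to/ILSVRC13'  # placeholder, adjust to your setup
splits = ['train', 'val', 'val1', 'val2', 'test']

late_bound = {}   # buggy variant: every lambda sees the final value of `split`
early_bound = {}  # variant used in the tutorial: defaults freeze current values
for split in splits:
    name = 'imagenet_{}'.format(split)
    late_bound[name] = lambda: make_imdb(split, devkit_path)
    early_bound[name] = (lambda split=split, devkit_path=devkit_path:
                         make_imdb(split, devkit_path))

print(late_bound['imagenet_train']()['name'])   # imagenet_test
print(early_bound['imagenet_train']()['name'])  # imagenet_train
```

The second dictionary behaves the way the factory is expected to: asking for `imagenet_train` really yields the train split.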
We have to enlarge the number of categories from 20+1 to 200+1. Note that in the ImageNet dataset, an object category is identified by something like "n02691156" instead of "airplane".
self._data_path = os.path.join(self._devkit_path, 'ILSVRC2013_DET_' + self._image_set[:-1])
synsets = sio.loadmat(os.path.join(self._devkit_path, 'data', 'meta_det.mat'))
self._classes = ('__background__',)
self._wnid = (0,)
for i in xrange(200):
    self._classes = self._classes + (synsets['synsets'][0][i][2][0],)
    self._wnid = self._wnid + (synsets['synsets'][0][i][1][0],)
self._wnid_to_ind = dict(zip(self._wnid, xrange(self.num_classes)))
self._class_to_ind = dict(zip(self.classes, xrange(self.num_classes)))
self._classes denotes the class names
self._wnid denotes the WordNet IDs of the categories
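To make the two lookup tables concrete, here is a small sketch that builds them from an in-memory list instead of `meta_det.mat`. The `synsets` list is a hypothetical stand-in: only the airplane pairing comes from the text above, and the other entries are made-up placeholders. The real file holds 200 entries.

```python
# Hypothetical stand-in for the (wnid, name) pairs read from meta_det.mat.
synsets = [
    ('n02691156', 'airplane'),       # pairing taken from the text above
    ('n00000001', 'placeholder_a'),  # made-up entry, for illustration only
    ('n00000002', 'placeholder_b'),  # made-up entry, for illustration only
]

# Index 0 is reserved for the background class, as in the tutorial code.
classes = ('__background__',) + tuple(name for _, name in synsets)
wnids = (0,) + tuple(wnid for wnid, _ in synsets)

class_to_ind = dict(zip(classes, range(len(classes))))
wnid_to_ind = dict(zip(wnids, range(len(wnids))))

print(wnid_to_ind['n02691156'])           # 1
print(classes[wnid_to_ind['n02691156']])  # airplane
```

Because both tuples are built in the same order, an index obtained from a WordNet ID can be used directly to look up the readable class name.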
The original code subtracts 1 from each box coordinate because in the PASCAL VOC dataset all coordinates start from 1, so subtracting 1 makes them start from 0. This is not true for ImageNet, so we should not subtract 1.
So we need to modify these lines to:
for ix, obj in enumerate(objs):
    x1 = float(get_data_from_tag(obj, 'xmin'))
    y1 = float(get_data_from_tag(obj, 'ymin'))
    x2 = float(get_data_from_tag(obj, 'xmax'))
    y2 = float(get_data_from_tag(obj, 'ymax'))
    cls = self._wnid_to_ind[str(get_data_from_tag(obj, "name")).lower().strip()]
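The same idea can be tried outside the repository with a self-contained sketch. It uses the standard library's `xml.etree.ElementTree` rather than the `get_data_from_tag` helper above, and both the annotation string and the truncated `wnid_to_ind` mapping are made up for illustration. The coordinates are read as-is, with no subtraction.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in a PASCAL VOC-style XML layout, for illustration only.
ANNOTATION = """
<annotation>
  <object>
    <name>n02691156</name>
    <bndbox><xmin>10</xmin><xmax>200</xmax><ymin>20</ymin><ymax>150</ymax></bndbox>
  </object>
</annotation>
"""

# Truncated, hypothetical mapping; the real one covers 200 categories.
wnid_to_ind = {0: 0, 'n02691156': 1}

def load_boxes(xml_text, wnid_to_ind):
    """Return a list of (x1, y1, x2, y2, class_index) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall('object'):
        bb = obj.find('bndbox')
        # No "- 1" here: the coordinates are used unchanged.
        x1 = float(bb.find('xmin').text)
        y1 = float(bb.find('ymin').text)
        x2 = float(bb.find('xmax').text)
        y2 = float(bb.find('ymax').text)
        cls = wnid_to_ind[obj.find('name').text.lower().strip()]
        boxes.append((x1, y1, x2, y2, cls))
    return boxes

print(load_boxes(ANNOTATION, wnid_to_ind))  # [(10.0, 20.0, 200.0, 150.0, 1)]
```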
Note that in Faster R-CNN we don't need to run selective search, which is the main difference from Fast R-CNN.
Under the directory $FRCNN_ROOT/
Change the number of classes to 200+1
param_str: "'num_classes': 201"
In the layer "bbox_pred", change the number of outputs to (200+1)*4
num_output: 804
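The two numbers above follow from the class count, as this short check shows. The variable names are only illustrative. In the stock py-faster-rcnn prototxt the "cls_score" layer also has one output per class, so it would presumably need 201 as well; check your own prototxt.

```python
# Derive the prototxt values from the number of object categories.
num_object_classes = 200
num_classes = num_object_classes + 1  # +1 for the background class
bbox_pred_outputs = num_classes * 4   # 4 box coordinates per class

print(num_classes)        # 201
print(bbox_pred_outputs)  # 804
```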
You can modify the test.prototxt in the same way.
Under the directory $FRCNN_ROOT/experiments/scripts
You can specify which dataset to train/test on and which pre-trained model to use.
Run the $FRCNN/experiments/scripts/
The .sh file is used in the same way as in the original Faster R-CNN.
Just run the to visualize pictures!
Original video ""
How to train fast rcnn on imagenet
If you have any advanced questions, feel free to contact me at [email protected]