forked from open-mmlab/mmdetection
[Feature] Support QueryInst (open-mmlab#6050)
* impl queryinst
* bug free queryinst with crop and negative samples
* use detr hyperparameters
* pre-commit hooks
* modified dynamic_mask_head docstrings
* remove unused dropout in dynamic_mask_head
* add docstring for dice_loss
* add dice_loss unit test
* impl unit test for dynamic_mask_head
* update queryinst docstring and implementation
* stability update for dice_loss and dynamic_mask_head
* update for clarify
* bug free in case of num_proposals equal to zero
* detail docstrings
* fixed CI issues
* issues resolved
* add queryinst docs
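The commit adds a `dice_loss`, which the 1x config wires into `DynamicMaskHead` as `loss_mask=dict(type='DiceLoss', loss_weight=8.0, use_sigmoid=True, activate=False, eps=1e-5)`. The following is a minimal pure-Python sketch of a per-mask Dice loss. It is not mmdet's function: `dice_loss` here is a hypothetical standalone helper. It assumes the squared-denominator variant `1 - 2*sum(p*t) / ((sum(p^2) + eps) + (sum(t^2) + eps))`. It also assumes the predictions are already probabilities, because `activate=False` suggests the head applies the sigmoid itself.

```python
from typing import Sequence


def dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-5) -> float:
    """Dice loss for one flattened mask (hypothetical helper, squared-denominator form).

    `pred` holds per-pixel foreground probabilities (already sigmoid-activated),
    `target` holds the binary ground-truth mask, and `eps` keeps the ratio finite.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_sq = sum(p * p for p in pred) + eps
    target_sq = sum(t * t for t in target) + eps
    dice = 2 * intersection / (pred_sq + target_sq)
    return 1 - dice


# A perfect prediction gives a loss close to 0, a disjoint one gives 1.
print(dice_loss([1, 0, 1, 0], [1, 0, 1, 0]))
print(dice_loss([1, 0], [0, 1]))
```

Unlike per-pixel binary cross-entropy, the Dice ratio is normalized by mask size, so small and large instances contribute on a similar scale. The config then scales the result by `loss_weight=8.0`.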
Showing 23 changed files with 724 additions and 42 deletions.
@@ -0,0 +1,26 @@

````markdown
# Instances as Queries

## Introduction

<!-- [ALGORITHM] -->

```
@InProceedings{Fang_2021_ICCV,
    author    = {Fang, Yuxin and Yang, Shusheng and Wang, Xinggang and Li, Yu and Fang, Chen and Shan, Ying and Feng, Bin and Liu, Wenyu},
    title     = {Instances As Queries},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6910-6919}
}
```

## Results and Models

| Model | Backbone | Style | Lr schd | Number of Proposals | Multi-Scale | RandomCrop | box AP | mask AP | Config | Download |
|:-----:|:--------:|:-----:|:-------:|:-------------------:|:-----------:|:----------:|:------:|:-------:|:------:|:--------:|
| QueryInst | R-50-FPN | pytorch | 1x | 100 | False | False | 42.0 | 37.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/queryinst/queryinst_r50_fpn_1x_coco.py) | [model]() &#124; [log]() |
| QueryInst | R-50-FPN | pytorch | 3x | 100 | True | False | 44.8 | 39.8 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/queryinst/queryinst_r50_fpn_mstrain_480-800_3x_coco.py) | [model]() &#124; [log]() |
| QueryInst | R-50-FPN | pytorch | 3x | 300 | True | True | 47.5 | 41.7 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/queryinst/queryinst_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco.py) | [model]() &#124; [log]() |
| QueryInst | R-101-FPN | pytorch | 3x | 100 | True | False | 46.4 | 41.0 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/queryinst/queryinst_r101_fpn_mstrain_480-800_3x_coco.py) | [model]() &#124; [log]() |
| QueryInst | R-101-FPN | pytorch | 3x | 300 | True | True | 49.0 | 42.9 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/queryinst/queryinst_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco.py) | [model]() &#124; [log]() |
````
@@ -0,0 +1,100 @@

```yaml
Collections:
  - Name: QueryInst
    Metadata:
      Training Data: COCO
      Training Techniques:
        - AdamW
        - Weight Decay
      Training Resources: 8x V100 GPUs
      Architecture:
        - FPN
        - ResNet
        - QueryInst
    Paper:
      URL: https://openaccess.thecvf.com/content/ICCV2021/papers/Fang_Instances_As_Queries_ICCV_2021_paper.pdf
      Title: 'Instances as Queries'
    README: configs/queryinst/README.md
    Code:
      URL: https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/detectors/queryinst.py
      Version: v2.18.0

Models:
  - Name: queryinst_r50_fpn_1x_coco
    In Collection: QueryInst
    Config: configs/queryinst/queryinst_r50_fpn_1x_coco.py
    Metadata:
      Epochs: 12
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 42.0
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 37.5
    Weights:

  - Name: queryinst_r50_fpn_mstrain_480-800_3x_coco
    In Collection: QueryInst
    Config: configs/queryinst/queryinst_r50_fpn_mstrain_480-800_3x_coco.py
    Metadata:
      Epochs: 36
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 44.8
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 39.8
    Weights:

  - Name: queryinst_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco
    In Collection: QueryInst
    Config: configs/queryinst/queryinst_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco.py
    Metadata:
      Epochs: 36
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 47.5
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 41.7
    Weights:

  - Name: queryinst_r101_fpn_mstrain_480-800_3x_coco
    In Collection: QueryInst
    Config: configs/queryinst/queryinst_r101_fpn_mstrain_480-800_3x_coco.py
    Metadata:
      Epochs: 36
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 46.4
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 41.0
    Weights:

  - Name: queryinst_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco
    In Collection: QueryInst
    Config: configs/queryinst/queryinst_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco.py
    Metadata:
      Epochs: 36
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 49.0
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 42.9
    Weights:
```
configs/queryinst/queryinst_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco.py (7 additions, 0 deletions)
@@ -0,0 +1,7 @@

```python
_base_ = './queryinst_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco.py'

model = dict(
    backbone=dict(
        depth=101,
        init_cfg=dict(type='Pretrained',
                      checkpoint='torchvision://resnet101')))
```
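The R-101 configs override only the backbone and inherit everything else through `_base_`. The sketch below shows the recursive dict merge this relies on, including the `_delete_=True` escape hatch that the 300-proposal config uses for `test_cfg`. It is a simplified illustration, not mmcv's `Config` code. `merge_cfg` and the toy `base` dict are hypothetical, and the real logic also handles list-index keys and type checks that are omitted here.

```python
import copy
from typing import Any, Dict


def merge_cfg(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` merged in recursively.

    Nested dicts are merged key by key, any other value replaces the base
    value, and a nested dict containing `_delete_=True` replaces the base
    dict entirely instead of being merged into it.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so the caller's dict is untouched
            replace = value.pop('_delete_', False)
            if not replace and isinstance(merged.get(key), dict):
                merged[key] = merge_cfg(merged[key], value)
                continue
        merged[key] = copy.deepcopy(value)
    return merged


# Toy stand-in for the R-50 parent config (only the keys that matter here).
base = dict(model=dict(
    backbone=dict(type='ResNet', depth=50,
                  init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    rpn_head=dict(type='EmbeddingRPNHead', num_proposals=100)))
# The override written in the R-101 config file.
r101 = dict(model=dict(
    backbone=dict(depth=101,
                  init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'))))
cfg = merge_cfg(base, r101)
print(cfg['model']['backbone'])
```

`type='ResNet'` and the untouched `rpn_head` survive from the parent, while `depth` and the checkpoint are replaced.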
configs/queryinst/queryinst_r101_fpn_mstrain_480-800_3x_coco.py (7 additions, 0 deletions)
@@ -0,0 +1,7 @@

```python
_base_ = './queryinst_r50_fpn_mstrain_480-800_3x_coco.py'

model = dict(
    backbone=dict(
        depth=101,
        init_cfg=dict(type='Pretrained',
                      checkpoint='torchvision://resnet101')))
```
@@ -0,0 +1,138 @@

```python
_base_ = [
    '../_base_/datasets/coco_instance.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
num_stages = 6
num_proposals = 100
model = dict(
    type='QueryInst',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=0,
        add_extra_convs='on_input',
        num_outs=4),
    rpn_head=dict(
        type='EmbeddingRPNHead',
        num_proposals=num_proposals,
        proposal_feature_channel=256),
    roi_head=dict(
        type='SparseRoIHead',
        num_stages=num_stages,
        stage_loss_weights=[1] * num_stages,
        proposal_feature_channel=256,
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='DIIHead',
                num_classes=80,
                num_ffn_fcs=2,
                num_heads=8,
                num_cls_fcs=1,
                num_reg_fcs=3,
                feedforward_channels=2048,
                in_channels=256,
                dropout=0.0,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=7,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                loss_bbox=dict(type='L1Loss', loss_weight=5.0),
                loss_iou=dict(type='GIoULoss', loss_weight=2.0),
                loss_cls=dict(
                    type='FocalLoss',
                    use_sigmoid=True,
                    gamma=2.0,
                    alpha=0.25,
                    loss_weight=2.0),
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    clip_border=False,
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.5, 0.5, 1., 1.])) for _ in range(num_stages)
        ],
        mask_head=[
            dict(
                type='DynamicMaskHead',
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=14,
                    with_proj=False,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                num_convs=4,
                num_classes=80,
                roi_feat_size=14,
                in_channels=256,
                conv_kernel_size=3,
                conv_out_channels=256,
                class_agnostic=False,
                norm_cfg=dict(type='BN'),
                upsample_cfg=dict(type='deconv', scale_factor=2),
                loss_mask=dict(
                    type='DiceLoss',
                    loss_weight=8.0,
                    use_sigmoid=True,
                    activate=False,
                    eps=1e-5)) for _ in range(num_stages)
        ]),
    # training and testing settings
    train_cfg=dict(
        rpn=None,
        rcnn=[
            dict(
                assigner=dict(
                    type='HungarianAssigner',
                    cls_cost=dict(type='FocalLossCost', weight=2.0),
                    reg_cost=dict(type='BBoxL1Cost', weight=5.0),
                    iou_cost=dict(type='IoUCost', iou_mode='giou',
                                  weight=2.0)),
                sampler=dict(type='PseudoSampler'),
                pos_weight=1,
                mask_size=28,
            ) for _ in range(num_stages)
        ]),
    test_cfg=dict(
        rpn=None, rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5)))

# optimizer
optimizer = dict(
    _delete_=True,
    type='AdamW',
    lr=0.0001,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0)}))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=0.1, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[8, 11], warmup_iters=1000)
runner = dict(type='EpochBasedRunner', max_epochs=12)
```
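In `train_cfg`, every stage matches predictions to ground truth one-to-one with a `HungarianAssigner`. Its cost adds a classification term (weight 2.0), an L1 box term (5.0) and a GIoU term (2.0). The sketch below illustrates that matching on toy inputs, under several assumptions. The classification cost matrix is passed in precomputed, standing in for `FocalLossCost`. Boxes are `(x1, y1, x2, y2)` already normalized to [0, 1]. A brute-force search over permutations replaces the Hungarian algorithm, which is only viable for a handful of boxes; a real implementation would use something like `scipy.optimize.linear_sum_assignment`. `giou` and `match` are hypothetical names.

```python
import itertools
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to [0, 1]


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the share of the enclosing box outside the union."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Area of the smallest axis-aligned box enclosing both a and b.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


def match(cls_cost: Sequence[Sequence[float]], preds: Sequence[Box], gts: Sequence[Box],
          w_cls: float = 2.0, w_l1: float = 5.0, w_giou: float = 2.0) -> List[Tuple[int, int]]:
    """Return (pred_index, gt_index) pairs with the lowest total matching cost."""
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground-truth boxes")
    # cost[i][j]: weighted classification + L1 cost, minus weighted GIoU.
    cost = [[w_cls * cls_cost[i][j]
             + w_l1 * sum(abs(p - g) for p, g in zip(preds[i], gts[j]))
             - w_giou * giou(preds[i], gts[j])
             for j in range(len(gts))]
            for i in range(len(preds))]
    best: Tuple[int, ...] = ()
    best_cost = float('inf')
    # Brute force: give every GT a distinct prediction, keep the cheapest assignment.
    for chosen in itertools.permutations(range(len(preds)), len(gts)):
        total = sum(cost[p][g] for g, p in enumerate(chosen))
        if total < best_cost:
            best, best_cost = chosen, total
    # Predictions left unmatched would act as background (negative) samples.
    return sorted((p, g) for g, p in enumerate(best))


example_preds = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9), (0.0, 0.0, 1.0, 1.0)]
example_gts = [(0.5, 0.5, 0.9, 0.9), (0.1, 0.1, 0.4, 0.4)]
print(match([[0.0, 0.0]] * 3, example_preds, example_gts))  # [(0, 1), (1, 0)]
```

Because each ground-truth box gets exactly one prediction, no NMS is needed downstream. This is also why `sampler=dict(type='PseudoSampler')` is enough here: the assigner already decides positives and negatives.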
configs/queryinst/queryinst_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco.py (54 additions, 0 deletions)
@@ -0,0 +1,54 @@

```python
_base_ = './queryinst_r50_fpn_mstrain_480-800_3x_coco.py'
num_proposals = 300
model = dict(
    rpn_head=dict(num_proposals=num_proposals),
    test_cfg=dict(
        _delete_=True,
        rpn=None,
        rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5)))
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# augmentation strategy originates from DETR.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='AutoAugment',
        policies=[
            [
                dict(
                    type='Resize',
                    img_scale=[(480, 1333), (512, 1333), (544, 1333),
                               (576, 1333), (608, 1333), (640, 1333),
                               (672, 1333), (704, 1333), (736, 1333),
                               (768, 1333), (800, 1333)],
                    multiscale_mode='value',
                    keep_ratio=True)
            ],
            [
                dict(
                    type='Resize',
                    img_scale=[(400, 1333), (500, 1333), (600, 1333)],
                    multiscale_mode='value',
                    keep_ratio=True),
                dict(
                    type='RandomCrop',
                    crop_type='absolute_range',
                    crop_size=(384, 600),
                    allow_negative_crop=True),
                dict(
                    type='Resize',
                    img_scale=[(480, 1333), (512, 1333), (544, 1333),
                               (576, 1333), (608, 1333), (640, 1333),
                               (672, 1333), (704, 1333), (736, 1333),
                               (768, 1333), (800, 1333)],
                    multiscale_mode='value',
                    override=True,
                    keep_ratio=True)
            ]
        ]),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
data = dict(train=dict(pipeline=train_pipeline))
```
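The second `AutoAugment` policy resizes, crops, then resizes again, with `override=True` letting the last `Resize` pick a fresh scale after the crop. The sketch below traces only the image sizes (no pixels, no boxes), using hypothetical helper names. It assumes the keep-ratio rule `factor = min(long_max / long_edge, short_max / short_edge)` with rounding to the nearest pixel; that is how we understand mmcv's keep-ratio resize to treat a `(1333, 800)`-style scale. It also assumes that `crop_type='absolute_range'` draws crop height and width independently from `[min(side, 384), min(side, 600)]`.

```python
import random
from typing import Tuple

Size = Tuple[int, int]  # (height, width)


def keep_ratio_size(h: int, w: int, scale: Tuple[int, int]) -> Size:
    """Largest aspect-preserving size with long side <= max(scale), short side <= min(scale)."""
    long_max, short_max = max(scale), min(scale)
    factor = min(long_max / max(h, w), short_max / min(h, w))
    return int(h * factor + 0.5), int(w * factor + 0.5)


def absolute_range_crop(h: int, w: int, rng: random.Random,
                        lo: int = 384, hi: int = 600) -> Size:
    """Crop size, each side drawn from [min(side, lo), min(side, hi)] inclusive."""
    crop_h = rng.randint(min(h, lo), min(h, hi))
    crop_w = rng.randint(min(w, lo), min(w, hi))
    return crop_h, crop_w


def crop_policy_size(h: int, w: int, rng: random.Random) -> Size:
    """Output size of the resize -> crop -> resize policy for an (h, w) input."""
    h, w = keep_ratio_size(h, w, rng.choice([(400, 1333), (500, 1333), (600, 1333)]))
    h, w = absolute_range_crop(h, w, rng)
    # Second resize (override=True in the config): pick a new short-side target.
    short = rng.choice([480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800])
    return keep_ratio_size(h, w, (short, 1333))


print(keep_ratio_size(480, 640, (1333, 800)))  # (800, 1067)
print(crop_policy_size(480, 640, random.Random(0)))
```

Whichever policy is drawn, the final short side lands at or below 800 and the long side at or below 1333, so both branches feed the model images in the same size range. Box and mask cropping are left out. As we read it, `allow_negative_crop=True` keeps crops that contain no boxes instead of discarding the sample.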
configs/queryinst/queryinst_r50_fpn_mstrain_480-800_3x_coco.py (23 additions, 0 deletions)
@@ -0,0 +1,23 @@

```python
_base_ = './queryinst_r50_fpn_1x_coco.py'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
min_values = (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize',
        img_scale=[(1333, value) for value in min_values],
        multiscale_mode='value',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]

data = dict(train=dict(pipeline=train_pipeline))
lr_config = dict(policy='step', step=[27, 33])
runner = dict(type='EpochBasedRunner', max_epochs=36)
```
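The 3x config keeps the parent's AdamW `lr=0.0001`, moves the decay milestones to epochs 27 and 33, and trains for 36 epochs. Because `lr_config` is merged rather than replaced, `warmup_iters=1000` from the 1x config should carry over. The sketch below computes the resulting learning rate. It assumes step semantics of "multiply by `gamma=0.1` once per milestone already reached, counting epochs from 0". It also assumes a linear warmup with `warmup_ratio=0.001`. Both the gamma and the warmup type/ratio would come from defaults or from `../_base_/schedules/schedule_1x.py`, which this diff does not show, so treat them as assumptions. `step_lr` is a hypothetical helper.

```python
from typing import Sequence


def step_lr(epoch: int, iteration: int, base_lr: float = 1e-4,
            steps: Sequence[int] = (27, 33), gamma: float = 0.1,
            warmup_iters: int = 1000, warmup_ratio: float = 0.001) -> float:
    """Learning rate for a 0-based `epoch` and 0-based global `iteration`.

    Step decay multiplies `base_lr` by `gamma` once per milestone in `steps`
    that `epoch` has reached. While `iteration < warmup_iters`, the rate is
    ramped linearly from `warmup_ratio * lr` towards `lr`.
    """
    lr = base_lr * gamma ** sum(1 for s in steps if epoch >= s)
    if iteration < warmup_iters:
        k = (1 - iteration / warmup_iters) * (1 - warmup_ratio)
        lr *= 1 - k
    return lr


# 3x schedule: 1e-4 for epochs 0-26, 1e-5 for 27-32, 1e-6 for 33-35.
for epoch in (0, 26, 27, 33, 35):
    print(epoch, step_lr(epoch, iteration=10**6))
# The 1x config corresponds to steps=(8, 11) with 12 epochs.
```

Keeping the two decays in the last quarter of training mirrors the 1x recipe (decay at 8 and 11 of 12 epochs) stretched by a factor of three.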