Our detection code is developed on top of MMDetection v2.23.0.
For details, see [Vision Transformer Adapter for Dense Predictions](https://arxiv.org/abs/2205.08534).
If you use this code for a paper, please cite:

```
@article{chen2022vitadapter,
  title={Vision Transformer Adapter for Dense Predictions},
  author={Chen, Zhe and Duan, Yuchen and Wang, Wenhai and He, Junjun and Lu, Tong and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2205.08534},
  year={2022}
}
```
Install MMDetection v2.23.0 and its dependencies (PyTorch must be installed before `mmcv-full` and before compiling the ops):

```shell
# recommended environment: torch1.9 + cuda11.1
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install mmdet==2.23.0
pip install instaboostfast  # for HTC++
cd ops && sh make.sh  # compile deformable attention
```
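Before moving on, it can help to confirm that the environment resolved to the pinned versions (a minimal sanity check; the `+cu111` suffix depends on your CUDA install):

```shell
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# expected: 1.9.0+cu111 11.1 True
python -c "import mmcv, mmdet, timm; print(mmcv.__version__, mmdet.__version__, timm.__version__)"
# expected: 1.4.2 2.23.0 0.4.12
```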
Prepare the COCO dataset according to the guidelines in MMDetection v2.23.0; the expected layout is sketched below.
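With the default configs, MMDetection looks for the dataset under `data/coco` relative to the repository root. A sketch of the standard layout (`/path/to/coco` is a placeholder for wherever your COCO copy lives; alternatively, change `data_root` in the config instead of symlinking):

```shell
# expected layout:
#   data/coco/annotations/instances_train2017.json
#   data/coco/annotations/instances_val2017.json
#   data/coco/train2017/  # training images
#   data/coco/val2017/    # validation images
mkdir -p data
ln -s /path/to/coco data/coco  # hypothetical source path
```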
HTC++ (COCO test-dev)
Method | Backbone | Pre-train | Lr schd | box AP | mask AP | #Param | Config | Download |
---|---|---|---|---|---|---|---|---|
HTC++ | ViT-Adapter-L | BEiT-L | 3x | 58.5 | 50.8 | 401M | config | model |
HTC++ | ViT-Adapter-L (MS) | BEiT-L | 3x | 60.1 | 52.1 | 401M | TODO | - |
HTC++ (COCO val2017)
Method | Backbone | Pre-train | Lr schd | box AP | mask AP | #Param | Config | Download |
---|---|---|---|---|---|---|---|---|
HTC++ | ViT-Adapter-L | BEiT-L | 3x | 57.9 | 50.2 | 401M | config | model |
HTC++ | ViT-Adapter-L (MS) | BEiT-L | 3x | 59.8 | 51.7 | 401M | TODO | - |
Baseline Detectors
Method | Backbone | Pre-train | Lr schd | Aug | box AP | mask AP | #Param | Config | Download |
---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | ViT-Adapter-T | DeiT-T | 3x | Yes | 46.0 | 41.0 | 28M | config | model |
Mask R-CNN | ViT-Adapter-S | DeiT-S | 3x | Yes | 48.2 | 42.8 | 48M | config | model |
Mask R-CNN | ViT-Adapter-B | DeiT-B | 3x | Yes | 49.6 | 43.6 | 120M | config | model |
Mask R-CNN | ViT-Adapter-B | Uni-Perceiver | 3x | Yes | 50.7 | 44.9 | 120M | config | model |
Mask R-CNN | ViT-Adapter-L | AugReg | 3x | Yes | 50.9 | 44.8 | 348M | config | model |
Advanced Detectors
Backbone | Framework | Pre-train | Lr schd | Aug | box AP | mask AP | #Param | Config | Download |
---|---|---|---|---|---|---|---|---|---|
ViT-Adapter-S | Cascade Mask R-CNN | DeiT-S | 3x | Yes | 51.5 | 44.5 | 86M | config | model |
ViT-Adapter-S | ATSS | DeiT-S | 3x | Yes | 49.6 | - | 36M | config | model |
ViT-Adapter-S | GFL | DeiT-S | 3x | Yes | 50.0 | - | 36M | config | model |
ViT-Adapter-S | Sparse R-CNN | DeiT-S | 3x | Yes | 48.1 | - | 110M | config | model |
ViT-Adapter-B | Upgraded Mask R-CNN | MAE | 25ep | LSJ | 50.3 | 44.7 | 122M | config | model |
ViT-Adapter-B | Upgraded Mask R-CNN | MAE | 50ep | LSJ | 50.8 | 45.1 | 122M | config | model |
To evaluate ViT-Adapter-L + HTC++ on COCO val2017 on a single node with 8 GPUs, run:

```shell
sh dist_test.sh configs/htc++/htc++_beit_adapter_large_fpn_3x_coco.py /path/to/checkpoint_file 8 --eval bbox segm
```

This should give:
```
Evaluate annotation type *bbox*
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.579
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.766
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.635
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.436
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.616
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.726
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.736
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.736
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.736
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.608
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.768
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.863

Evaluate annotation type *segm*
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.502
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.744
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.549
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.328
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.533
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.683
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.638
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.638
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.638
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.499
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.669
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.776
```
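`dist_test.sh` is a thin wrapper around MMDetection's `tools/test.py`, so for a quick single-GPU check the underlying script can be called directly (a sketch, assuming the standard MMDetection v2.x entry point; `vis/` is an arbitrary output directory):

```shell
python tools/test.py \
    configs/htc++/htc++_beit_adapter_large_fpn_3x_coco.py \
    /path/to/checkpoint_file \
    --eval bbox segm \
    --show-dir vis/  # optional: save qualitative predictions
```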
To train ViT-Adapter-T + Mask R-CNN on COCO train2017 on a single node with 8 GPUs for 36 epochs, run:

```shell
sh dist_train.sh configs/mask_rcnn/mask_rcnn_deit_adapter_tiny_fpn_3x_coco.py 8
```
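`dist_train.sh` forwards any extra arguments to `tools/train.py`, so an interrupted run can be resumed with the standard MMDetection v2.x flag (a sketch; the `work_dirs/...` path is MMDetection's default output directory for this config):

```shell
sh dist_train.sh configs/mask_rcnn/mask_rcnn_deit_adapter_tiny_fpn_3x_coco.py 8 \
    --resume-from work_dirs/mask_rcnn_deit_adapter_tiny_fpn_3x_coco/latest.pth
```

Note that these schedules assume the full 8-GPU batch size; if you train on fewer GPUs, the usual remedy is MMDetection's linear scaling rule for the learning rate.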