This repo is the official implementation of the CVPR 2023 paper "Semi-DETR: Semi-Supervised Object Detection with Detection Transformers". Semi-DETR is the first work on semi-supervised object detection designed for detection transformers.
Our code is based on the awesome codebase provided by Soft-Teacher[1].
Ubuntu 18.04
Anaconda3 with python=3.8
Pytorch=1.9.0
mmdetection=2.16.0+fe46ffe
mmcv=1.3.16
cuda=10.2
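As a quick sanity check of the environment listed above, the following sketch compares installed package versions against the pinned ones. The helper names (`installed_version`, `check_versions`) and the `EXPECTED` table are illustrative and not part of this repo, and the distribution names are assumptions (mmcv, for instance, may be installed as `mmcv` or `mmcv-full`).

```python
from importlib import metadata

# Versions pinned in the environment list above. The distribution names
# are assumptions: adjust them to match how the packages were installed.
EXPECTED = {"torch": "1.9.0", "mmcv-full": "1.3.16", "mmdet": "2.16.0"}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each package to (expected, found, ok) using prefix matching.

    Prefix matching lets local build tags such as "1.9.0+cu102" pass.
    """
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        ok = found is not None and found.startswith(want)
        report[name] = (want, found, ok)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found} [{status}]")
```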
This project is developed on top of mmdetection. Please install mmdet in editable mode first:
cd thirdparty/mmdetection && python -m pip install -e .
Following mmdetection, we develop our detection transformer module and semi-supervised module in a similar way, and they also need to be installed (please change the module name, 'detr_od' or 'detr_ssod', in the 'setup.py' file alternately):
cd ../../ && python -m pip install -e .
These steps install 'mmdet', 'detr_od' and 'detr_ssod' in the environment. You also need to compile the CUDA ops for deformable attention:
cd detr_od/models/utils/ops
python setup.py build install
# Optional unit test (all checks should report True)
python test.py
cd ../../..
- Download the COCO dataset
- Execute the following command to generate data set splits:
# YOUR_DATA should be a directory that contains the COCO dataset.
# For example:
# YOUR_DATA/
#   coco/
#     train2017/
#     val2017/
#     unlabeled2017/
#     annotations/
ln -s ${YOUR_DATA} data
bash tools/dataset/prepare_coco_data.sh conduct
For concrete instructions on what should be downloaded, please refer to lines 11-24 of tools/dataset/prepare_coco_data.sh. You can also download our generated semi-supervised data set splits from semi-coco-splits.
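Before generating the splits, it can help to confirm that the directory layout matches the one described above. The sketch below is only an illustration: `missing_coco_dirs` is a hypothetical helper, not a script shipped with this repo, and it checks nothing beyond the presence of the four sub-directories.

```python
from pathlib import Path

# Sub-directories shown in the layout comment above.
COCO_SUBDIRS = ("train2017", "val2017", "unlabeled2017", "annotations")


def missing_coco_dirs(data_root):
    """Return the expected COCO sub-directories that are absent."""
    coco = Path(data_root) / "coco"
    return [name for name in COCO_SUBDIRS if not (coco / name).is_dir()]


if __name__ == "__main__":
    # "data" is the symlink created with `ln -s ${YOUR_DATA} data`.
    missing = missing_coco_dirs("data")
    print("missing:", missing if missing else "none")
```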
- Download the PASCAL VOC dataset
- Execute the following commands to download and extract the dataset:
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
tar -xf VOCtrainval_11-May-2012.tar
# resulting format
# YOUR_DATA/
#   - VOCdevkit
#     - VOC2007
#       - Annotations
#       - JPEGImages
#       - ...
#     - VOC2012
#       - Annotations
#       - JPEGImages
#       - ...
Following prior works, we convert the PASCAL VOC dataset into COCO format and evaluate model performance with COCO-style mAP. Execute the following command to convert the dataset format:
python scripts/voc_to_coco.py --devkit_path ${VOCdevkit-PATH} --out-dir ${VOCdevkit-PATH}
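To illustrate what the format conversion involves, here is a minimal sketch of the core step: VOC XML stores each box as corner coordinates (xmin, ymin, xmax, ymax), while COCO stores [x, y, width, height]. This is not the code of scripts/voc_to_coco.py; `voc_objects_to_coco` and the sample annotation are made up for the example, and whether the repo's script applies any pixel offset or category-id mapping is not covered here.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up VOC-style annotation used only for illustration.
SAMPLE_XML = """
<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""


def voc_objects_to_coco(xml_text):
    """Convert VOC corner boxes to COCO-style [x, y, width, height] boxes."""
    root = ET.fromstring(xml_text)
    annotations = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        width, height = xmax - xmin, ymax - ymin
        annotations.append(
            {
                "category": obj.findtext("name"),
                "bbox": [xmin, ymin, width, height],
                "area": width * height,
            }
        )
    return annotations


if __name__ == "__main__":
    for ann in voc_objects_to_coco(SAMPLE_XML):
        print(ann)
```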
- To train a model in the fully supervised setting (optional):
We implement DINO with mmdetection following the original official repo. If you want to train the fully supervised DINO model yourself and check our implementation, you can run:
sh tools/dist_train_detr_od.sh dino_detr 8
This trains DINO with batch size 16 for 12 epochs. We also provide the resulting checkpoint dino_sup_12e_ckpt and our training log dino_sup_12e_log for this fully supervised model.
- To train a model in the partially labeled data setting:
sh tools/dist_train_detr_ssod.sh dino_detr_ssod ${FOLD} ${PERCENT} ${GPUS}
For example, you can run the following script to train our model on 10% labeled data with 8 GPUs on the 1st split:
sh tools/dist_train_detr_ssod.sh dino_detr_ssod 1 10 8
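The positional argument order (config name, fold, percent, GPUs) is easy to mix up when scripting several runs. The sketch below only assembles the command line shown above; `ssod_train_command` is a hypothetical helper, and the range checks are assumptions, since the set of valid folds and percentages is defined by the generated splits rather than by this snippet.

```python
import shlex


def ssod_train_command(fold, percent, gpus, config="dino_detr_ssod"):
    """Build the argument list for the partially labeled training script."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return [
        "sh",
        "tools/dist_train_detr_ssod.sh",
        config,
        str(fold),
        str(percent),
        str(gpus),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(ssod_train_command(fold=1, percent=10, gpus=8)))
```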
- To train a model in the fully labeled data setting:
sh tools/dist_train_detr_ssod_coco_full.sh <NUM_GPUS>
For example, to train our R50 model with 8 GPUs:
sh tools/dist_train_detr_ssod_coco_full.sh 8
python tools/test.py <CONFIG_FILE_PATH> <CHECKPOINT_PATH> --eval bbox
We also provide some models trained by us below:
Setting | mAP | Weights |
---|---|---|
1% Data | 30.50 | ckpt |
5% Data | 40.10 | ckpt |
10% Data | 43.5 | ckpt |
Full Data | 50.5 | ckpt |
Setting | AP50 | mAP | Weights |
---|---|---|---|
Unlabel: VOC12 | 86.1 | 65.2 | ckpt |
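For readers less familiar with the two metric columns: in COCO-style evaluation, AP50 counts a detection as a match when its IoU with a ground-truth box is at least 0.5, while mAP averages AP over the IoU thresholds 0.50, 0.55, ..., 0.95. The sketch below only illustrates the IoU computation and the threshold set; the function names are made up for the example, and it is not the evaluation code used by tools/test.py.

```python
# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [i / 100 for i in range(50, 100, 5)]


def iou_xywh(box_a, box_b):
    """IoU of two boxes given as COCO-style [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(iou):
    """Thresholds at which a detection with this IoU counts as a match."""
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    iou = iou_xywh([0, 0, 10, 10], [2, 0, 10, 10])
    print(f"IoU = {iou:.3f}, matched at thresholds {thresholds_passed(iou)}")
```

A prediction shifted by two pixels on a 10x10 box, as in the usage lines, still matches at the AP50 threshold but fails the stricter ones, which is why mAP is typically lower than AP50.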
[1] End-to-End Semi-Supervised Object Detection with Soft Teacher
If you find our repo useful for your research, please cite us:
@inproceedings{zhang2023semi,
title={Semi-DETR: Semi-Supervised Object Detection With Detection Transformers},
author={Zhang, Jiacheng and Lin, Xiangru and Zhang, Wei and Wang, Kuo and Tan, Xiao and Han, Junyu and Ding, Errui and Wang, Jingdong and Li, Guanbin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={23809--23818},
year={2023}
}