This is the official implementation of the paper "DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model".
Motivated by the interesting phenomenon that object detection performance lags behind instance segmentation performance at the beginning decoder layer, DI-MaskDINO is proposed to alleviate this detection-segmentation imbalance. DI-MaskDINO is built by adding the proposed De-Imbalance (DI) module and Balance-Aware Tokens Optimization (BATO) module to MaskDINO. The DI module strengthens the detection task at the beginning decoder layer to balance the performance of the two tasks, and its core is a residual double-selection mechanism. DI-MaskDINO outperforms the existing SOTA joint object detection and instance segmentation model MaskDINO (+1.2 APbox and +0.9 APmask on COCO, using a ResNet50 backbone with 12 training epochs), the SOTA object detection model DINO (+1.0 APbox on COCO), and the SOTA segmentation model Mask2Former (+3.0 APmask on COCO).
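Purely as an illustration of the general idea behind a residual double-selection over tokens, a rough PyTorch sketch might look like the block below. This is not the author's DI module; the class, parameter names, and shapes are assumptions made for illustration only, and the real design lives in the paper and in `dimaskdino/`.

```python
# Illustrative sketch only: a generic "select top-k tokens twice, then add a residual
# refinement" block. This is NOT the actual DI module; see dimaskdino/ for the real code.
import torch
import torch.nn as nn


class ResidualDoubleSelection(nn.Module):
    def __init__(self, dim: int, k1: int, k2: int):
        super().__init__()
        self.score1 = nn.Linear(dim, 1)    # scores for the first selection
        self.score2 = nn.Linear(dim, 1)    # scores for the second selection
        self.refine = nn.Linear(dim, dim)  # residual refinement of the selected tokens
        self.k1, self.k2 = k1, k2

    @staticmethod
    def _topk(tokens: torch.Tensor, scores: torch.Tensor, k: int) -> torch.Tensor:
        # tokens: (B, N, C), scores: (B, N) -> the k highest-scoring tokens, (B, k, C)
        idx = scores.topk(k, dim=1).indices.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return torch.gather(tokens, 1, idx)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        first = self._topk(tokens, self.score1(tokens).squeeze(-1), self.k1)  # first selection
        second = self._topk(first, self.score2(first).squeeze(-1), self.k2)   # second selection
        return second + self.refine(second)                                   # residual connection
```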
[2024/12] Code for DI-MaskDINO is available here!
[2024/9] DI-MaskDINO has been accepted at NeurIPS 2024 as a poster!
We tested our code with Python=3.7.16, PyTorch=1.9.0, and CUDA=11.1. Please install PyTorch first according to the official instructions. Our code is based on detectron2; please refer to the detectron2 installation guide.
Example conda environment setup:
```bash
# Create a new virtual environment
conda create -n dimaskdino python=3.7
conda activate dimaskdino

# Install PyTorch
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia

# Install opencv
pip install opencv-python

# Install detectron2
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html

# Under your working directory
git clone https://github.com/CQU-ADHRI-Lab/DI-MaskDINO.git
cd DI-MaskDINO
pip install -r requirements.txt

# CUDA kernel for MSDeformAttn
cd dimaskdino/modeling/pixel_decoder/ops
sh make.sh
```
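After the build completes, a quick sanity check of the environment can help. The snippet below is only a minimal sketch (not part of the repository), and it assumes the kernel is built into the usual `MultiScaleDeformableAttention` extension, as in Deformable-DETR:

```python
# check_env.py -- minimal environment sanity check (not part of the repository)
import torch
import detectron2

print("PyTorch:", torch.__version__)             # expected: 1.9.0
print("CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)

# make.sh typically builds an extension named MultiScaleDeformableAttention
# (as in Deformable-DETR); this import fails if the CUDA kernel did not compile.
import MultiScaleDeformableAttention  # noqa: F401
print("MSDeformAttn CUDA kernel: OK")
```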
Name | Backbone | Epochs | APbox | APmask | Download |
---|---|---|---|---|---|
DI-MaskDINO | ResNet50 | 12 | 46.9 | 42.3 | model |
DI-MaskDINO | ResNet50 | 24 | 49.6 | 44.8 | model |
DI-MaskDINO | ResNet50 | 50 | 51.9 | 46.7 | model |
Train DI-MaskDINO with 8 GPUs:
```bash
python train_net.py --num-gpus 8 --config-file configs/dimaskdino_r50_4scale_bs16_12ep.yaml OUTPUT_DIR /path/to/output
```
You can download our pretrained models and evaluate them with the following commands.
```bash
python train_net.py --eval-only --num-gpus 8 --config-file /path/to/config_file MODEL.WEIGHTS /path/to/checkpoint_file
```
For example, to reproduce our results, copy the config path from the model table above, download the pretrained checkpoint to /path/to/checkpoint_file, and run
```bash
python train_net.py --eval-only --num-gpus 8 --config-file configs/dimaskdino_r50_4scale_bs16_12ep.yaml MODEL.WEIGHTS /path/to/checkpoint_file
```
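For single-image inference with a downloaded checkpoint, detectron2's `DefaultPredictor` can be used along the lines below. This is only a sketch: `add_dimaskdino_config` is a hypothetical name for the repo's config-registration helper (check train_net.py for how the config is actually built), and the image and weight paths are placeholders:

```python
# demo_inference.py -- illustrative sketch; the config-registration helper name is assumed
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Hypothetical import: see train_net.py for the actual helper that adds
# DI-MaskDINO-specific config keys before the YAML is merged.
from dimaskdino import add_dimaskdino_config

cfg = get_cfg()
add_dimaskdino_config(cfg)  # register DI-MaskDINO config keys (assumed helper)
cfg.merge_from_file("configs/dimaskdino_r50_4scale_bs16_12ep.yaml")
cfg.MODEL.WEIGHTS = "/path/to/checkpoint_file"

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input.jpg"))  # dict with an "instances" field
print(outputs["instances"].pred_boxes)        # predicted boxes; masks are in pred_masks
```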
If you find our work helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@inproceedings{nan2024di,
  title={DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model},
  author={Zhixiong Nan and Xianghong Li and Tao Xiang and Jifeng Dai},
  booktitle={Advances in Neural Information Processing Systems},
  year={2024}
}
```
Many thanks to these excellent open-source projects: