
[NeurIPS 2024 🔥] DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model

Zhixiong Nan, Xianghong Li, Tao Xiang*, Jifeng Dai

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.


DI-MaskDINO

This is the official implementation of the paper "DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model".


Motivated by the interesting phenomenon that object detection performance lags behind instance segmentation performance at the beginning decoder layer, DI-MaskDINO is proposed to alleviate this detection-segmentation imbalance. DI-MaskDINO is implemented by plugging the proposed De-Imbalance (DI) and Balance-Aware Tokens Optimization (BATO) modules into MaskDINO. The DI module strengthens the detection task at the beginning decoder layer to balance the performance of the two tasks, and its core is a residual double-selection mechanism. DI-MaskDINO outperforms the existing SOTA joint object detection and instance segmentation model MaskDINO (+1.2 APbox and +0.9 APmask on COCO with a ResNet50 backbone and 12 training epochs), the SOTA object detection model DINO (+1.0 APbox on COCO), and the SOTA segmentation model Mask2Former (+3.0 APmask on COCO).
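
For intuition only, below is a minimal, hypothetical PyTorch sketch of a token-selection step with residual refinement, in the general spirit of a "select, refine with a residual, select again" pattern. It is not the actual DI module: every name in it (ToyResidualDoubleSelection, score_head, refine, topk_ratio) is an illustrative assumption rather than code from this repository; please refer to the paper and the code under dimaskdino/ for the real mechanism.

import torch
import torch.nn as nn

class ToyResidualDoubleSelection(nn.Module):
    # Illustration only: score tokens, keep the top-k, refine them with a
    # residual MLP, then score and select again. NOT the DI module itself.
    def __init__(self, dim, topk_ratio=0.5):
        super().__init__()
        self.score_head = nn.Linear(dim, 1)    # scores for the first selection
        self.refine = nn.Linear(dim, dim)      # residual refinement of kept tokens
        self.rescore_head = nn.Linear(dim, 1)  # scores for the second selection
        self.topk_ratio = topk_ratio

    def _select(self, tokens, scores, k):
        # keep the k highest-scoring tokens along the sequence dimension
        idx = scores.squeeze(-1).topk(k, dim=1).indices
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return torch.gather(tokens, 1, idx)

    def forward(self, tokens):                 # tokens: (batch, num_tokens, dim)
        k1 = max(1, int(tokens.size(1) * self.topk_ratio))
        kept = self._select(tokens, self.score_head(tokens), k1)
        refined = kept + self.refine(kept)     # residual connection
        k2 = max(1, int(k1 * self.topk_ratio))
        return self._select(refined, self.rescore_head(refined), k2)

x = torch.randn(2, 300, 256)                   # e.g. 300 query tokens of dim 256
print(ToyResidualDoubleSelection(256)(x).shape)  # torch.Size([2, 75, 256])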

Update

[2024/12] Code for DI-MaskDINO is available here!

[2024/9] DI-MaskDINO has been accepted at NeurIPS 2024 as a poster!

Installation

We tested our code with Python 3.7.16, PyTorch 1.9.0, and CUDA 11.1. Please install PyTorch first following the official instructions. Our code is built on detectron2; please refer to the detectron2 installation instructions.

Example conda environment setup:

# Create a new virtual environment
conda create -n dimaskdino python=3.7
conda activate dimaskdino

# Install PyTorch
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia

# Install opencv
pip install opencv-python

# Install detectron2
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html

# Under your working directory
git clone https://github.com/CQU-ADHRI-Lab/DI-MaskDINO.git
cd DI-MaskDINO
pip install -r requirements.txt

# CUDA kernel for MSDeformAttn
cd dimaskdino/modeling/pixel_decoder/ops
sh make.sh
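
After the build finishes, you can sanity-check the environment with a short Python snippet (a minimal sketch, not a script shipped with this repository); it only prints the installed versions and CUDA visibility:

# verify_env.py -- quick sanity check (not part of this repository)
import torch
import torchvision
import detectron2

print("torch:", torch.__version__)              # expect 1.9.0
print("torchvision:", torchvision.__version__)  # expect 0.10.0
print("detectron2:", detectron2.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)        # expect 11.1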

Models

Name Backbone Epochs APbox APmask Download
DI-MaskDINO ResNet50 12 46.9 42.3 model
DI-MaskDINO ResNet50 24 49.6 44.8 model
DI-MaskDINO ResNet50 50 51.9 46.7 model

Run

Training

Train DI-MaskDINO with 8 GPUs:

python train_net.py --num-gpus 8 --config-file configs/dimaskdino_r50_4scale_bs16_12ep.yaml OUTPUT_DIR /path/to/output

Evaluation

You can download our pretrained models and evaluate them with the following command:

python train_net.py --eval-only --num-gpus 8 --config-file /path/to/config_file MODEL.WEIGHTS /path/to/checkpoint_file

For example, to reproduce our results, copy the config path from the model table, download the corresponding pretrained checkpoint to /path/to/checkpoint_file, and run:

python train_net.py --eval-only --num-gpus 8 --config-file configs/dimaskdino_r50_4scale_bs16_12ep.yaml MODEL.WEIGHTS /path/to/checkpoint_file
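
For quick single-image inference outside of train_net.py, a detectron2 DefaultPredictor can be wired up roughly as below. This is a sketch rather than a script shipped with this repository: the config-registration helper add_dimaskdino_config is an assumed name (analogous to MaskDINO's add_maskdino_config); check the imports in train_net.py for the actual helper.

# demo_infer.py -- rough sketch; adapt the config helper import to this repository
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from dimaskdino import add_dimaskdino_config   # ASSUMED helper name; see train_net.py

cfg = get_cfg()
add_dimaskdino_config(cfg)                     # register DI-MaskDINO config keys
cfg.merge_from_file("configs/dimaskdino_r50_4scale_bs16_12ep.yaml")
cfg.MODEL.WEIGHTS = "/path/to/checkpoint_file"

predictor = DefaultPredictor(cfg)
image = cv2.imread("input.jpg")                # BGR image, as detectron2 expects
outputs = predictor(image)
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes)                    # predicted boxes
print(instances.pred_masks.shape)              # predicted instance masks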

Citing DI-MaskDINO

If you find our work helpful for your research, please consider citing it with the following BibTeX entry.

@inproceedings{nan2024di,
  title={DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model}, 
  author={Zhixiong Nan and Xianghong Li and Tao Xiang and Jifeng Dai},
  booktitle={Advances in Neural Information Processing Systems},
  year={2024}
}

Acknowledgement

Many thanks to these excellent open-source projects: