This is the official repository for our CVPR 2024 paper RoDLA: Benchmarking the Robustness of Document Layout Analysis Models. For more results and benchmarking details, please visit our project homepage.
We introduce RoDLA, a large-scale benchmark for evaluating the robustness of Document Layout Analysis (DLA) models. RoDLA contains 450,000+ documents with diverse layouts and contents, together with a set of evaluation metrics that make different DLA models directly comparable. We hope RoDLA can serve as a standard benchmark for the robustness evaluation of DLA models.
- Perturbation Benchmark Dataset
- PubLayNet-P
- DocLayNet-P
- M6Doc-P
- Perturbation Generation and Evaluation Code
- RoDLA Model Checkpoints
- RoDLA Model Training Code
- RoDLA Model Evaluation Code
1. Clone the repository
git clone https://github.com/yufanchen96/RoDLA.git
cd RoDLA
2. Create a conda virtual environment
# create virtual environment
conda create -n RoDLA python=3.7 -y
conda activate RoDLA
3. Install benchmark dependencies
- Install Basic Dependencies
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
pip install -U openmim
mim install mmcv-full==1.5.0
pip install timm==0.6.11 mmdet==2.28.1
pip install Pillow==9.5.0
pip install opencv-python termcolor yacs pyyaml scipy
- Install ocrodeg Dependencies
git clone https://github.com/NVlabs/ocrodeg.git
cd ./ocrodeg
pip install -e .
- Compile CUDA operators
cd ./model/ops_dcnv3
sh ./make.sh
python test.py
- Alternatively, install the operators from pre-built `.whl` files
Download the RoDLA dataset from Google Drive to the desired root directory.
Alternatively, prepare the perturbed dataset yourself as follows:
cd ./perturbation
python apply_perturbation.py \
--dataset_dir ./publaynet/val \
--json_dir ./publaynet/val.json \
--dataset_name PubLayNet-P \
--output_dir ./PubLayNet-P \
--pert_method all \
--background_folder ./background \
--metric all
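Conceptually, the script applies a perturbation to each page image and then scores its severity with image-quality metrics such as PSNR. The sketch below is a minimal illustration of that idea with a synthetic speckle perturbation; it is not the repository's actual implementation:

```python
import numpy as np

def speckle(img: np.ndarray, severity: float = 0.2, seed: int = 0) -> np.ndarray:
    """Multiplicative speckle noise: img * (1 + n), with n ~ N(0, severity)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, severity, img.shape)
    return np.clip(img * (1.0 + noise), 0, 255).astype(np.uint8)

def psnr(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Peak signal-to-noise ratio between the clean and perturbed image."""
    mse = np.mean((clean.astype(np.float64) - noisy.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

page = np.full((64, 64), 255, dtype=np.uint8)  # blank white "page"
perturbed = speckle(page, severity=0.2)
print(round(psnr(page, perturbed), 1))  # lower PSNR = stronger perturbation
```

Higher severity levels in the benchmark correspond to stronger distortions and hence lower image-quality scores.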
After dataset preparation, the perturbed dataset structure will be:
.desired_root
└── PubLayNet-P
├── Background
│ ├── Background_1
│ │ ├── psnr.json
│ │ ├── ms_ssim.json
│ │ ├── cw_ssim.json
│ │ ├── val.json
│ │ ├── val
│ │ │ ├── PMC538274_00004.jpg
...
│ ├── Background_2
...
├── Rotation
...
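A quick sanity check that each severity folder contains the expected entries can be sketched as follows (illustrative only; the file names are taken from the tree above):

```python
import os
import tempfile

# Entries expected in every <Perturbation>_<level> folder, per the tree above
EXPECTED = ["psnr.json", "ms_ssim.json", "cw_ssim.json", "val.json", "val"]

def missing_entries(severity_dir: str) -> list:
    """Return the expected entries that are absent from a severity folder."""
    return [n for n in EXPECTED if not os.path.exists(os.path.join(severity_dir, n))]

# Demo on a throwaway directory mimicking PubLayNet-P/Background/Background_1
with tempfile.TemporaryDirectory() as root:
    sev = os.path.join(root, "Background", "Background_1")
    os.makedirs(os.path.join(sev, "val"))
    for name in ["psnr.json", "ms_ssim.json", "val.json"]:
        open(os.path.join(sev, name), "w").close()
    print(missing_entries(sev))  # → ['cw_ssim.json']
```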
cd ./model
python -u test.py configs/publaynet/rodla_internimage_xl_publaynet.py \
checkpoint_dir/rodla_internimage_xl_publaynet.pth \
--work-dir result/rodla_internimage_publaynet/Speckle_1 \
--eval bbox \
--cfg-options data.test.ann_file='PubLayNet-P/Speckle/Speckle_1/val.json' \
data.test.img_prefix='PubLayNet-P/Speckle/Speckle_1/val/'
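A full robustness evaluation repeats this command for every perturbation and severity level; only `ann_file` and `img_prefix` change. A small helper can enumerate the targets (illustrative; the perturbation names and the 1–3 severity levels below are assumptions based on the dataset tree):

```python
from itertools import product

PERTURBATIONS = ["Background", "Rotation", "Speckle"]  # subset, for illustration
LEVELS = [1, 2, 3]  # assumed severity levels

def eval_targets(root: str = "PubLayNet-P"):
    """Yield (ann_file, img_prefix) pairs for every perturbation/severity pair."""
    for pert, lvl in product(PERTURBATIONS, LEVELS):
        sub = f"{root}/{pert}/{pert}_{lvl}"
        yield f"{sub}/val.json", f"{sub}/val/"

for ann, img in eval_targets():
    print(ann, img)
```

Each yielded pair can then be passed to `--cfg-options` exactly as in the command above.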
- Modify the configuration file under configs/_base_/datasets to specify the dataset path
- Run the following command to train the model with 4 GPUs
sh dist_train.sh configs/publaynet/rodla_internimage_xl_2x_publaynet.py 4
If you find this code useful for your research, please consider citing:
@inproceedings{chen2024rodla,
title={RoDLA: Benchmarking the Robustness of Document Layout Analysis Models},
author={Yufan Chen and Jiaming Zhang and Kunyu Peng and Junwei Zheng and Ruiping Liu and Philip Torr and Rainer Stiefelhagen},
booktitle={CVPR},
year={2024}
}