PSENet

PSENet: Shape Robust Text Detection With Progressive Scale Expansion Network

1. Introduction

PSENet

PSENet is a text detection algorithm based on semantic segmentation. It can precisely locate text instances of arbitrary shape, which most anchor-based algorithms cannot do. In addition, two text instances that are close to each other may cause a model to make wrong predictions. To solve both problems, PSENet proposes the Progressive Scale Expansion (PSE) algorithm, which can successfully identify adjacent text instances [1].

Figure 1. Overall PSENet architecture

The overall architecture of PSENet is presented in Figure 1. It consists of multiple stages:

  1. Feature extraction: a ResNet backbone extracts features at different scales from stages 2, 3, 4 and 5.
  2. Feature fusion: the FPN network uses the extracted features to produce new features at different scales and concatenates them.
  3. Prediction: the features of the second stage are used to generate the final segmentation result with the PSE algorithm, from which the text bounding boxes are generated.
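The expansion step in the last stage can be sketched in plain Python. This is a minimal illustration, assuming the predicted kernels are already binarized and ordered from smallest to largest; `label_components` and `progressive_scale_expansion` are hypothetical names, and the code is not the compiled implementation shipped in mindocr/postprocess/pse.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_components(mask):
    """Label 4-connected foreground regions of a binary grid with 1, 2, ..."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel into each larger one.

    kernels: list of binary grids (lists of 0/1 rows), smallest kernel first.
    """
    labels, _ = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Seed the breadth-first search with every pixel that already has a label.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    # A pixel keeps the first label that reaches it.
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels


# Two seeds that touch in the larger kernel stay separate instances.
small = [[0, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 0]]
large = [[0, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 1, 0],
         [0, 0, 0, 0, 0, 0]]
result = progressive_scale_expansion([small, large])
print(result[1])  # [0, 1, 1, 2, 2, 0]
```

Because each instance starts from its own seed in the smallest kernel, the two regions keep distinct labels even though they form a single connected blob in the larger kernel.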

2. Results

ICDAR2015

| Model | Context | Backbone | Pretrained | Recall | Precision | F-score | Train time | ms/step | Throughput | Recipe | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PSENet | D910x8-MS2.0-G | ResNet-152 | ImageNet | 79.39% | 84.91% | 82.06% | 11.544 s/epoch | 769.6 | 83.16 img/s | yaml | ckpt / mindir |
| PSENet | D910x8-MS2.0-G | ResNet-50 | ImageNet | 76.75% | 86.58% | 81.37% | 4.562 s/epoch | 304.138 | 210.43 img/s | yaml | ckpt / mindir |
| PSENet | D910x8-MS2.0-G | MobileNetV3 | ImageNet | 73.52% | 67.84% | 70.56% | 2.604 s/epoch | 173.604 | 368.66 img/s | yaml | ckpt / mindir |
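As a quick consistency check on the table above, the sketch below recomputes the F-score from recall and precision, assuming it is their harmonic mean (the usual convention for text detection benchmarks). `f_score` is a helper written for this illustration; small differences in the last digit are expected because the inputs are already rounded.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision, both given in percent."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported F-score) copied from the ICDAR2015 table.
rows = {
    "ResNet-152": (79.39, 84.91, 82.06),
    "ResNet-50": (76.75, 86.58, 81.37),
    "MobileNetV3": (73.52, 67.84, 70.56),
}

for backbone, (recall, precision, reported) in rows.items():
    computed = f_score(recall, precision)
    print(f"{backbone}: computed {computed:.2f}, reported {reported:.2f}")
```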

SCUT-CTW1500

| Model | Context | Backbone | Pretrained | Recall | Precision | F-score | Train time | ms/step | Throughput | Recipe | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PSENet | D910x8-MS2.0-G | ResNet-152 | ImageNet | 73.69% | 74.38% | 74.04% | 67 s/epoch | 4466.67 | 14.33 img/s | yaml | ckpt / mindir |
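The ms/step and Throughput columns appear to be linked through the number of images processed per step. The sketch below multiplies the two columns for each row; the resulting figure of roughly 64 images per step is inferred from the tables, not stated in this document, and `images_per_step` is a helper written for this illustration.

```python
def images_per_step(throughput_img_s, ms_per_step):
    """Images processed in one step, from img/s and ms/step."""
    return throughput_img_s * ms_per_step / 1000.0


# (throughput in img/s, ms/step) copied from the two result tables.
rows = {
    "ICDAR2015 ResNet-152": (83.16, 769.6),
    "ICDAR2015 ResNet-50": (210.43, 304.138),
    "ICDAR2015 MobileNetV3": (368.66, 173.604),
    "SCUT-CTW1500 ResNet-152": (14.33, 4466.67),
}

for name, (throughput, ms_per_step) in rows.items():
    print(f"{name}: {images_per_step(throughput, ms_per_step):.1f} images per step")
```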

Notes:

  • Context: the training context, denoted as {device}x{pieces}-{MS version}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms function). For example, D910x8-MS2.0-G means training on 8 pieces of Ascend 910 NPU with MindSpore 2.0 in graph mode.
  • The training time of PSENet is highly affected by data processing and varies on different machines.
  • The input_shapes to the exported MindIR models trained on ICDAR2015 are (1,3,1472,2624) for ResNet-152 backbone and (1,3,736,1312) for ResNet-50 or MobileNetV3 backbone.
  • On the SCUT-CTW1500 dataset, the input_shape for exported MindIR in the link is (1,3,1024,1024).
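The Context notation from the first note can be unpacked mechanically. The following is a small sketch, assuming the context strings follow the form used in the tables (for example D910x8-MS2.0-G); `parse_context` and its regular expression are written for this illustration and are not part of MindOCR.

```python
import re

# Device, number of pieces, MindSpore version, and mode (G or F).
CONTEXT_PATTERN = re.compile(
    r"^(?P<device>[A-Za-z]+\d+)x(?P<pieces>\d+)"
    r"-MS(?P<ms_version>[\d.]+)-(?P<mode>[GF])$"
)

MODES = {"G": "graph mode", "F": "pynative mode with ms function"}


def parse_context(context):
    """Split a context string such as 'D910x8-MS2.0-G' into its parts."""
    match = CONTEXT_PATTERN.match(context)
    if match is None:
        raise ValueError(f"unrecognized context string: {context!r}")
    return {
        "device": match["device"],
        "pieces": int(match["pieces"]),
        "ms_version": match["ms_version"],
        "mode": MODES[match["mode"]],
    }


print(parse_context("D910x8-MS2.0-G"))
```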

3. Quick Start

3.1 Installation

Please refer to the installation instructions in MindOCR.

3.2 Dataset preparation

3.2.1 ICDAR2015 dataset

Please download ICDAR2015 dataset, and convert the labels to the desired format referring to dataset_converters.

The prepared dataset file structure should be:

.
├── test
│   ├── images
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   └── test_det_gt.txt
└── train
    ├── images
    │   ├── img_1.jpg
    │   ├── img_2.jpg
    │   └── ...
    └── train_det_gt.txt
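Before training, it can help to confirm that the prepared ICDAR2015 directory matches the layout above. The sketch below only checks that the expected folders and label files exist; `missing_entries` is a hypothetical helper, and it does not validate the contents of the label files.

```python
from pathlib import Path

# Expected sub-directory and label file for each split, as in the tree above.
EXPECTED = {
    "train": ("images", "train_det_gt.txt"),
    "test": ("images", "test_det_gt.txt"),
}


def missing_entries(dataset_root):
    """Return the relative paths that are missing under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for split, (image_dir, label_file) in EXPECTED.items():
        if not (root / split / image_dir).is_dir():
            missing.append(f"{split}/{image_dir}")
        if not (root / split / label_file).is_file():
            missing.append(f"{split}/{label_file}")
    return missing


if __name__ == "__main__":
    # "dir/to/dataset" is a placeholder; replace it with the real dataset root.
    problems = missing_entries("dir/to/dataset")
    print("layout OK" if not problems else f"missing: {problems}")
```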

3.2.2 SCUT-CTW1500 dataset

Please download SCUT-CTW1500 dataset and convert the labels to the desired format referring to dataset_converters.

The prepared dataset file structure should be:

ctw1500
 ├── test_images
 │   ├── 1001.jpg
 │   ├── 1002.jpg
 │   └── ...
 ├── train_images
 │   ├── 0001.jpg
 │   ├── 0002.jpg
 │   └── ...
 ├── test_det_gt.txt
 └── train_det_gt.txt

3.3 Update yaml config file

Update the data paths in the configs/det/psenet/pse_r152_icdar15.yaml configuration file, specifically the following parts. The dataset_root is concatenated with data_dir and with label_file to form the complete dataset directory and label file path, respectively.

...
train:
  ckpt_save_dir: './tmp_det'
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: dir/to/dataset          # <--- Update
    data_dir: train/images                # <--- Update
    label_file: train/train_det_gt.txt    # <--- Update
...
eval:
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: dir/to/dataset          # <--- Update
    data_dir: test/images                 # <--- Update
    label_file: test/test_det_gt.txt      # <--- Update
...
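The path concatenation described above can be illustrated with a short sketch. It assumes a plain join of dataset_root with data_dir and label_file, as the text states; `resolve_dataset_paths` is a hypothetical helper, and MindOCR's own loader may handle details differently.

```python
import os


def resolve_dataset_paths(dataset_cfg):
    """Join dataset_root with data_dir and label_file from a dataset config."""
    root = dataset_cfg["dataset_root"]
    image_dir = os.path.join(root, dataset_cfg["data_dir"])
    label_path = os.path.join(root, dataset_cfg["label_file"])
    return image_dir, label_path


# Values mirror the train section of the yaml excerpt above.
train_dataset = {
    "dataset_root": "dir/to/dataset",
    "data_dir": "train/images",
    "label_file": "train/train_det_gt.txt",
}

image_dir, label_path = resolve_dataset_paths(train_dataset)
print(image_dir)
print(label_path)
```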

Optionally, change num_workers according to the number of CPU cores.

PSENet consists of 3 parts: backbone, neck, and head. Specifically:

model:
  type: det
  transform: null
  backbone:
    name: det_resnet152
    pretrained: True    # Whether to use weights pretrained on ImageNet
  neck:
    name: PSEFPN         # FPN part of the PSENet
    out_channels: 128
  head:
    name: PSEHead
    hidden_size: 256
    out_channels: 7     # number of kernels

3.4 Training

  • Postprocess

Before training, please make sure to compile the postprocessing code in the /mindocr/postprocess/pse directory as follows:

python3 setup.py build_ext --inplace

  • Standalone training

Please set distribute to False in the yaml config file.

# train psenet on ic15 dataset
python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml

  • Distributed training

Please set distribute to True in the yaml config file.

# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml

The training results (including checkpoints, per-epoch performance and curves) will be saved in the directory specified by ckpt_save_dir in the yaml config file. The default directory is ./tmp_det.
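The standalone and distributed launches above differ only in the mpirun prefix. As a small sketch, the hypothetical helper `build_train_command` below assembles either command from the same arguments; it uses only the flags shown in this section, and the distribute option must still be set in the yaml config file separately.

```python
import shlex


def build_train_command(config_path, num_devices=1):
    """Return the training command as an argument list.

    With more than one device, the command is wrapped in mpirun
    exactly as in the distributed training example above.
    """
    command = ["python", "tools/train.py", "--config", config_path]
    if num_devices > 1:
        prefix = ["mpirun", "--allow-run-as-root", "-n", str(num_devices)]
        return prefix + command
    return command


config = "configs/det/psenet/pse_r152_icdar15.yaml"
print(shlex.join(build_train_command(config)))
print(shlex.join(build_train_command(config, num_devices=8)))
```

The argument list can be passed to `subprocess.run` directly, which avoids shell quoting issues when paths contain spaces.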

3.5 Evaluation

To evaluate the accuracy of the trained model, use eval.py. Set ckpt_load_path in the eval section of the yaml config file to the checkpoint path, set distribute to False, and then run:

python tools/eval.py --config configs/det/psenet/pse_r152_icdar15.yaml

3.6 MindSpore Lite Inference

Please refer to the tutorial MindOCR Inference for model inference based on MindSpore Lite on Ascend 310, including the following steps:

  • Model Export

Please download the exported MindIR file first, or refer to the Model Export tutorial and use the following command to export the trained ckpt model to MindIR file:

python tools/export.py --model_name_or_config psenet_resnet152 --data_shape 1472 2624 --local_ckpt_path /path/to/local_ckpt.ckpt
# or
python tools/export.py --model_name_or_config configs/det/psenet/pse_r152_icdar15.yaml --data_shape 1472 2624 --local_ckpt_path /path/to/local_ckpt.ckpt

The data_shape is the input height and width of the model in the MindIR file. The input shapes of the MindIR files in the download links are listed in the Notes of the Results section.
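To relate data_shape to the input shapes listed in the Notes, the sketch below expands a height and width pair into a full 4-D shape. The batch size of 1 and the 3 channels are taken from the shapes given in the Notes; `mindir_input_shape` is a helper written for this illustration, not a MindOCR function.

```python
def mindir_input_shape(data_shape, batch_size=1, channels=3):
    """Expand (height, width) into (batch, channels, height, width)."""
    height, width = data_shape
    return (batch_size, channels, height, width)


# --data_shape 1472 2624, as in the export command above.
print(mindir_input_shape([1472, 2624]))  # (1, 3, 1472, 2624)
```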

  • Environment Installation

Please refer to Environment Installation tutorial to configure the MindSpore Lite inference environment.

  • Model Conversion

Please refer to Model Conversion, and use the converter_lite tool for offline conversion of the MindIR file.

  • Inference

Before inference, please ensure that the post-processing part of PSENet has been compiled (refer to the post-processing part of the Training chapter).

Assuming that you obtain output.mindir after model conversion, go to the deploy/py_infer directory, and use the following command for inference:

python infer.py \
    --input_images_dir=/your_path_to/test_images \
    --det_model_path=/your_path_to/output.mindir \
    --det_model_name_or_config=../../configs/det/psenet/pse_r152_icdar15.yaml \
    --res_save_dir=results_dir

References

[1] Wang, Wenhai, et al. "Shape robust text detection with progressive scale expansion network." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.