# Inference - Third-party Models

## 1. Third-Party Model Support List

MindOCR supports inference with third-party models (PaddleOCR, MMOCR, etc.), and this document lists the adapted models. Performance was tested on Ascend 310P; some models do not yet have a test dataset.

### 1.1 Text Detection

| name | model | backbone | dataset | F-score(%) | FPS | source | config | download | reference |
|------|-------|----------|---------|------------|-----|--------|--------|----------|-----------|
| ch_pp_server_det_v2.0 | DBNet | ResNet18_vd | MLT17 | 46.22 | 21.65 | PaddleOCR | yaml | weight | ch_ppocr_server_v2.0_det |
| ch_pp_det_OCRv3 | DBNet | MobileNetV3 | MLT17 | 33.89 | 22.40 | PaddleOCR | yaml | weight | ch_PP-OCRv3_det |
| ch_pp_det_OCRv2 | DBNet | MobileNetV3 | MLT17 | 42.99 | 21.90 | PaddleOCR | yaml | weight | ch_PP-OCRv2_det |
| ch_pp_mobile_det_v2.0_slim | DBNet | MobileNetV3 | MLT17 | 31.66 | 19.88 | PaddleOCR | yaml | weight | ch_ppocr_mobile_slim_v2.0_det |
| ch_pp_mobile_det_v2.0 | DBNet | MobileNetV3 | MLT17 | 31.56 | 21.96 | PaddleOCR | yaml | weight | ch_ppocr_mobile_v2.0_det |
| en_pp_det_OCRv3 | DBNet | MobileNetV3 | IC15 | 42.14 | 55.55 | PaddleOCR | yaml | weight | en_PP-OCRv3_det |
| ml_pp_det_OCRv3 | DBNet | MobileNetV3 | MLT17 | 66.01 | 22.48 | PaddleOCR | yaml | weight | ml_PP-OCRv3_det |
| en_pp_det_dbnet_resnet50vd | DBNet | ResNet50_vd | IC15 | 79.89 | 21.17 | PaddleOCR | yaml | weight | DBNet |
| en_pp_det_psenet_resnet50vd | PSE | ResNet50_vd | IC15 | 80.44 | 7.75 | PaddleOCR | yaml | weight | PSE |
| en_pp_det_east_resnet50vd | EAST | ResNet50_vd | IC15 | 85.58 | 20.70 | PaddleOCR | yaml | weight | EAST |
| en_pp_det_sast_resnet50vd | SAST | ResNet50_vd | IC15 | 81.77 | 22.14 | PaddleOCR | yaml | weight | SAST |
| en_mm_det_dbnetpp_resnet50 | DBNet++ | ResNet50 | IC15 | 81.36 | 10.66 | MMOCR | yaml | weight | DBNetpp |
| en_mm_det_fcenet_resnet50 | FCENet | ResNet50 | IC15 | 83.67 | 3.34 | MMOCR | yaml | weight | FCENet |

Notice: When using the en_pp_det_psenet_resnet50vd model for inference, you need to modify the ONNX file with the following command:

```shell
python deploy/models_utils/onnx_optim/insert_pse_postprocess.py \
    --model_path=./pse_r50vd.onnx \
    --binary_thresh=0.0 \
    --scale=1.0
```
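If you want to confirm that the modified file is still a valid graph, it can be validated with the onnx package. A minimal sketch (a sanity check only, not part of the official workflow; the file name matches the command above):

```python
# Validate the modified ONNX graph and list its outputs
# (requires `pip install onnx`).
import onnx

model = onnx.load("pse_r50vd.onnx")
onnx.checker.check_model(model)  # raises ValidationError if the graph is malformed
print([out.name for out in model.graph.output])  # inspect the graph outputs
```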

### 1.2 Text Recognition

| name | model | backbone | dataset | Acc(%) | FPS | source | dict file | config | download | reference |
|------|-------|----------|---------|--------|-----|--------|-----------|--------|----------|-----------|
| ch_pp_server_rec_v2.0 | CRNN | ResNet34 | MLT17 (ch) | 49.91 | 154.16 | PaddleOCR | ppocr_keys_v1.txt | yaml | weight | ch_ppocr_server_v2.0_rec |
| ch_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | MLT17 (ch) | 49.91 | 408.38 | PaddleOCR | ppocr_keys_v1.txt | yaml | weight | ch_PP-OCRv3_rec |
| ch_pp_rec_OCRv2 | CRNN | MobileNetV1Enhance | MLT17 (ch) | 44.59 | 203.34 | PaddleOCR | ppocr_keys_v1.txt | yaml | weight | ch_PP-OCRv2_rec |
| ch_pp_mobile_rec_v2.0 | CRNN | MobileNetV3 | MLT17 (ch) | 24.59 | 167.67 | PaddleOCR | ppocr_keys_v1.txt | yaml | weight | ch_ppocr_mobile_v2.0_rec |
| en_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | MLT17 (en) | 79.79 | 917.01 | PaddleOCR | en_dict.txt | yaml | weight | en_PP-OCRv3_rec |
| en_pp_mobile_rec_number_v2.0_slim | CRNN | MobileNetV3 | / | / | / | PaddleOCR | en_dict.txt | yaml | weight | en_number_mobile_slim_v2.0_rec |
| en_pp_mobile_rec_number_v2.0 | CRNN | MobileNetV3 | / | / | / | PaddleOCR | en_dict.txt | yaml | weight | en_number_mobile_v2.0_rec |
| korean_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | korean_dict.txt | yaml | weight | korean_PP-OCRv3_rec |
| japan_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | japan_dict.txt | yaml | weight | japan_PP-OCRv3_rec |
| chinese_cht_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | chinese_cht_dict.txt | yaml | weight | chinese_cht_PP-OCRv3_rec |
| te_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | te_dict.txt | yaml | weight | te_PP-OCRv3_rec |
| ka_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | ka_dict.txt | yaml | weight | ka_PP-OCRv3_rec |
| ta_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | ta_dict.txt | yaml | weight | ta_PP-OCRv3_rec |
| latin_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | latin_dict.txt | yaml | weight | latin_PP-OCRv3_rec |
| arabic_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | arabic_dict.txt | yaml | weight | arabic_PP-OCRv3_rec |
| cyrillic_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | cyrillic_dict.txt | yaml | weight | cyrillic_PP-OCRv3_rec |
| devanagari_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | devanagari_dict.txt | yaml | weight | devanagari_PP-OCRv3_rec |
| en_pp_rec_crnn_resnet34vd | CRNN | ResNet34_vd | IC15 | 66.35 | 420.80 | PaddleOCR | ic15_dict.txt | yaml | weight | CRNN |
| en_pp_rec_rosetta_resnet34vd | Rosetta | ResNet34_vd | IC15 | 64.28 | 552.40 | PaddleOCR | ic15_dict.txt | yaml | weight | Rosetta |
| en_pp_rec_vitstr_vitstr | ViTSTR | ViTSTR | IC15 | 68.42 | 364.67 | PaddleOCR | EN_symbol_dict.txt | yaml | weight | ViTSTR |
| en_mm_rec_nrtr_resnet31 | NRTR | ResNet31 | IC15 | 67.26 | 32.63 | MMOCR | english_digits_symbols.txt | yaml | weight | NRTR |
| en_mm_rec_satrn_shallowcnn | SATRN | ShallowCNN | IC15 | 73.52 | 32.14 | MMOCR | english_digits_symbols.txt | yaml | weight | SATRN |

### 1.3 Text Angle Classification

| name | model | dataset | Acc(%) | FPS | source | config | download | reference |
|------|-------|---------|--------|-----|--------|--------|----------|-----------|
| ch_pp_mobile_cls_v2.0 | MobileNetV3 | / | / | / | PaddleOCR | yaml | weight | ch_ppocr_mobile_v2.0_cls |

## 2. Overview of Third-Party Inference

```mermaid
graph LR;
    A[ThirdParty models] -- xx2onnx --> B[ONNX] -- converter_lite --> C[MindIR];
    C -- input --> D[infer.py] -- outputs --> E[eval_rec.py/eval_det.py];
    H[images] -- input --> D[infer.py];
```

## 3. Third-Party Model Inference Methods

### 3.1 Text Detection

Let's take en_pp_det_dbnet_resnet50vd from the Third-Party Model Support List as an example to introduce the inference method:

- Download the weight file from the Third-Party Model Support List and decompress it;

- Since this model is a Paddle training model, it first needs to be converted to an inference model (skip this step if it is already an inference model):

```shell
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
python tools/export_model.py \
    -c configs/det/det_r50_vd_db.yml \
    -o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy \
       Global.save_inference_dir=./det_db
```

After execution, the following will be generated:

```text
det_db/
├── inference.pdmodel
├── inference.pdiparams
├── inference.pdiparams.info
```
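Optionally, you can sanity-check that the exported inference model loads before converting it further. A minimal sketch using the Paddle inference Python API (an extra check, not a required step; assumes paddlepaddle is installed):

```python
# Load the exported inference model and print its input names
# (expected to include 'x', matching --input_shape_dict below).
from paddle.inference import Config, create_predictor

config = Config("det_db/inference.pdmodel", "det_db/inference.pdiparams")
predictor = create_predictor(config)
print(predictor.get_input_names())
```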
- Download and use the paddle2onnx tool (`pip install paddle2onnx`) to convert the inference model into an ONNX file:

```shell
paddle2onnx \
    --model_dir det_db \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --save_file det_db.onnx \
    --opset_version 11 \
    --input_shape_dict="{'x':[-1,3,-1,-1]}" \
    --enable_onnx_checker True
```

A brief explanation of parameters for paddle2onnx is as follows:

| Parameter | Description |
|-----------|-------------|
| --model_dir | The directory containing the Paddle model. |
| --model_filename | [Optional] The name of the file under --model_dir that stores the network structure. |
| --params_filename | [Optional] The name of the file under --model_dir that stores the model parameters. |
| --save_file | The file path for saving the converted model. |
| --opset_version | [Optional] The OpSet version used for the ONNX export. Versions 7~16 are currently supported; the default is 9. |
| --input_shape_dict | The shape of the input tensor, used to generate a dynamic-shape ONNX model. The format is "{'x': [N, C, H, W]}", where -1 denotes a dynamic dimension. |
| --enable_onnx_checker | [Optional] Whether to check the correctness of the exported ONNX model. Enabling this check is recommended; the default is False. |

The value of --input_shape_dict can be determined by opening the inference model with the Netron tool.

Learn more about paddle2onnx

The det_db.onnx file will be generated after the above command is executed.
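To double-check that the exported file has the expected dynamic input shape, the input names and shapes can also be read programmatically with the onnx package (a minimal sketch):

```python
# Print each graph input and its shape; dimensions without a fixed
# value (dim_param) are dynamic, i.e. -1 in --input_shape_dict.
import onnx

model = onnx.load("det_db.onnx")
for inp in model.graph.input:
    dims = [d.dim_value if d.HasField("dim_value") else -1
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # e.g. x [-1, 3, -1, -1]
```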

- Use the converter_lite tool on Ascend310/310P to convert the ONNX file to MindIR:

Create config.txt and specify the model input shape. An example is as follows:

```text
[ascend_context]
input_format=NCHW
input_shape=x:[1,3,736,1280]
```

A brief explanation of the configuration file parameters is as follows:

| Parameter | Attribute | Function Description | Data Type | Value Description |
|-----------|-----------|-----------------------|-----------|-------------------|
| input_format | Optional | Specifies the format of the model input | String | Optional values: "NCHW", "NHWC", "ND" |
| input_shape | Optional | Specifies the shape of the model input. The input_name must be an input name in the original model; inputs are listed in order and separated by ";" | String | e.g. "input1:[1,64,64,3];input2:[1,256,256,3]" |
| dynamic_dims | Optional | Specifies dynamic batch size and dynamic resolution parameters | String | e.g. "dynamic_dims=[48,520],[48,320],[48,384]" |

Learn more about Configuration File Parameters
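Because the shape above is static (1,3,736,1280), every input image has to be resized and padded to exactly that size before inference. infer.py handles this internally; the sketch below only illustrates the idea, assuming a common keep-ratio resize plus bottom/right zero-padding scheme (an assumption for illustration, not necessarily MindOCR's exact preprocessing):

```python
import cv2  # assumption: opencv-python is installed
import numpy as np

def pad_to_static(img: np.ndarray, target_h: int = 736, target_w: int = 1280):
    """Keep-ratio resize, then zero-pad bottom/right to the static model shape."""
    h, w = img.shape[:2]
    scale = min(target_h / h, target_w / w)
    resized = cv2.resize(img, (int(w * scale), int(h * scale)))
    canvas = np.zeros((target_h, target_w, 3), dtype=img.dtype)
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    return canvas, scale  # scale is needed to map predicted boxes back
```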

Run the following command:

```shell
converter_lite \
    --saveType=MINDIR \
    --fmk=ONNX \
    --optimize=ascend_oriented \
    --modelFile=det_db.onnx \
    --outputFile=det_db_output \
    --configFile=config.txt
```

After the above command is executed, the det_db_output.mindir file will be generated.

A brief explanation of the converter_lite parameters is as follows:

| Parameter | Required | Description | Value Range | Default | Remarks |
|-----------|----------|-------------|-------------|---------|---------|
| fmk | Yes | Input model format | MINDIR, CAFFE, TFLITE, TF, ONNX | - | - |
| saveType | No | Set the exported model format to MINDIR or MINDIR_LITE | MINDIR, MINDIR_LITE | MINDIR | The cloud-side inference version can only run models converted to the MINDIR format |
| modelFile | Yes | Input model path | - | - | - |
| outputFile | Yes | Output model path. Do not add a suffix; the ".mindir" suffix is appended automatically | - | - | - |
| configFile | No | 1) Path to the post-training quantization configuration file; 2) path to the configuration file for extended functions | - | - | - |
| optimize | No | Set the model optimization type for the target device | none, general, gpu_oriented, ascend_oriented | none | - |

Learn more about converter_lite

Learn more about Model Conversion Tutorial
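Before running the full pipeline, the converted file can be smoke-tested with the MindSpore Lite Python API (a minimal sketch, assuming mindspore_lite 2.x on an Ascend device; infer.py does not require this step):

```python
import numpy as np
import mindspore_lite as mslite

context = mslite.Context()
context.target = ["ascend"]  # run on the Ascend backend

model = mslite.Model()
model.build_from_file("det_db_output.mindir", mslite.ModelType.MINDIR, context)

inputs = model.get_inputs()
# Dummy input matching the static shape declared in config.txt.
inputs[0].set_data_from_numpy(np.random.rand(1, 3, 736, 1280).astype(np.float32))
outputs = model.predict(inputs)
print(outputs[0].shape)
```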

- Perform inference using deploy/py_infer/infer.py and the det_db_output.mindir model file:

```shell
python infer.py \
    --input_images_dir=/path/to/ic15/ch4_test_images \
    --det_model_path=/path/to/mindir/det_db_output.mindir \
    --det_model_name_or_config=en_pp_det_dbnet_resnet50vd \
    --res_save_dir=/path/to/dbnet_resnet50vd_results
```

After execution completes, the prediction file det_results.txt will be generated in the directory specified by --res_save_dir.

When doing inference, you can use the --vis_det_save_dir parameter to visualize the results:

*Visualization of text detection results*

Learn more about infer.py inference parameters

- Evaluate the results using the following command:

```shell
python deploy/eval_utils/eval_det.py \
    --gt_path=/path/to/ic15/test_det_gt.txt \
    --pred_path=/path/to/dbnet_resnet50vd_results/det_results.txt
```

The result is: `{'recall': 0.8281174771304767, 'precision': 0.7716464782413638, 'f-score': 0.7988852763585693}`
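As a quick sanity check, the reported f-score is simply the harmonic mean of precision and recall:

```python
# Verify f-score = 2*P*R / (P+R) from the numbers above.
precision, recall = 0.7716464782413638, 0.8281174771304767
print(2 * precision * recall / (precision + recall))  # -> 0.79888..., as reported
```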

### 3.2 Text Recognition

Let's take en_pp_rec_OCRv3 from the Third-Party Model Support List as an example to introduce the inference method:

- Download the weight file from the Third-Party Model Support List and decompress it;

- Since this model is already a Paddle inference model, proceed directly to the ONNX conversion step (otherwise, the training model first needs to be converted into an inference model; see the text detection example above);

- Download and use the paddle2onnx tool (`pip install paddle2onnx`) to convert the inference model into an ONNX file:

```shell
paddle2onnx \
    --model_dir en_PP-OCRv3_rec_infer \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --save_file en_PP-OCRv3_rec_infer.onnx \
    --opset_version 11 \
    --input_shape_dict="{'x':[-1,3,48,-1]}" \
    --enable_onnx_checker True
```

For a brief description of paddle2onnx parameters, see the text detection example above.

The value of --input_shape_dict can be determined by opening the inference model with the Netron tool.

Learn more about paddle2onnx

The en_PP-OCRv3_rec_infer.onnx file will be generated after the above command is executed.

- Use the converter_lite tool on Ascend310/310P to convert the ONNX file to MindIR:

Create config.txt and specify the model input shape. An example is as follows:

```text
[ascend_context]
input_format=NCHW
input_shape=x:[1,3,-1,-1]
dynamic_dims=[48,520],[48,320],[48,384],[48,360],[48,394],[48,321],[48,336],[48,368],[48,328],[48,685],[48,347]
```

For a brief description of the configuration parameters, see the text detection example above.

Learn more about Configuration File Parameters
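With the height/width dimensions of input_shape set to -1, dynamic_dims enumerates the exact (H,W) combinations compiled into the MindIR, so each text crop must be resized to one of them at runtime. infer.py performs this matching internally; the sketch below shows one plausible bucketing strategy (an illustrative assumption, not MindOCR's exact logic):

```python
# Choose the smallest compiled width that fits a keep-ratio resize to H=48.
CANDIDATE_WIDTHS = sorted([520, 320, 384, 360, 394, 321, 336, 368, 328, 685, 347])
TARGET_H = 48

def pick_bucket(h: int, w: int):
    resized_w = round(w * TARGET_H / h)  # keep aspect ratio at height 48
    for cand in CANDIDATE_WIDTHS:
        if cand >= resized_w:
            return TARGET_H, cand  # pad the width up to the chosen bucket
    return TARGET_H, CANDIDATE_WIDTHS[-1]  # very wide crops are squeezed down

print(pick_bucket(32, 240))  # -> (48, 360)
```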

Run the following command:

```shell
converter_lite \
    --saveType=MINDIR \
    --fmk=ONNX \
    --optimize=ascend_oriented \
    --modelFile=en_PP-OCRv3_rec_infer.onnx \
    --outputFile=en_PP-OCRv3_rec_infer \
    --configFile=config.txt
```

After the above command is executed, the en_PP-OCRv3_rec_infer.mindir file will be generated.

For a brief description of the converter_lite parameters, see the text detection example above.

Learn more about converter_lite

Learn more about Model Conversion Tutorial

- Download the dictionary file en_dict.txt corresponding to the model, then use deploy/py_infer/infer.py and the en_PP-OCRv3_rec_infer.mindir file to perform inference:
```shell
python infer.py \
    --input_images_dir=/path/to/mlt17_en \
    --device_id=1 \
    --parallel_num=2 \
    --rec_model_path=/path/to/mindir/en_PP-OCRv3_rec_infer.mindir \
    --rec_model_name_or_config=en_pp_rec_OCRv3 \
    --character_dict_path=/path/to/en_dict.txt \
    --res_save_dir=/path/to/en_rec_infer_results \
    --save_log_dir=/path/to/en_rec_infer_logs
```

After execution completes, the prediction file rec_results.txt will be generated in the directory specified by --res_save_dir.

Learn more about infer.py inference parameters

- Evaluate the results using the following command:

```shell
python deploy/eval_utils/eval_rec.py \
    --gt_path=/path/to/mlt17_en/english_gt.txt \
    --pred_path=/path/to/en_rec_infer_results/rec_results.txt
```

The result is: `{'acc': 0.7979344129562378, 'norm_edit_distance': 0.8859519958496094}`
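For reference, acc is exact-match accuracy and norm_edit_distance is one minus the length-normalized Levenshtein distance, averaged over samples. A minimal sketch of both metrics (an illustration of the definitions, not the exact eval_rec.py implementation):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def rec_metrics(pairs):
    # pairs: list of (prediction, ground_truth) strings.
    accs, neds = [], []
    for pred, gt in pairs:
        accs.append(pred == gt)
        neds.append(1 - levenshtein(pred, gt) / max(len(pred), len(gt), 1))
    return sum(accs) / len(accs), sum(neds) / len(neds)

print(rec_metrics([("hello", "hello"), ("wor1d", "world")]))  # -> (0.5, 0.9)
```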