MindOCR supports inference with third-party models (PaddleOCR, MMOCR, etc.); this document lists the adapted models. Performance was measured on Ascend 310P, and some models do not yet have an evaluation dataset.
name | model | backbone | dataset | F-score(%) | FPS | source | config | download | reference |
---|---|---|---|---|---|---|---|---|---|
ch_pp_server_det_v2.0 | DBNet | ResNet18_vd | MLT17 | 46.22 | 21.65 | PaddleOCR | yaml | weight | ch_ppocr_server_v2.0_det |
ch_pp_det_OCRv3 | DBNet | MobileNetV3 | MLT17 | 33.89 | 22.40 | PaddleOCR | yaml | weight | ch_PP-OCRv3_det |
ch_pp_det_OCRv2 | DBNet | MobileNetV3 | MLT17 | 42.99 | 21.90 | PaddleOCR | yaml | weight | ch_PP-OCRv2_det |
ch_pp_mobile_det_v2.0_slim | DBNet | MobileNetV3 | MLT17 | 31.66 | 19.88 | PaddleOCR | yaml | weight | ch_ppocr_mobile_slim_v2.0_det |
ch_pp_mobile_det_v2.0 | DBNet | MobileNetV3 | MLT17 | 31.56 | 21.96 | PaddleOCR | yaml | weight | ch_ppocr_mobile_v2.0_det |
en_pp_det_OCRv3 | DBNet | MobileNetV3 | IC15 | 42.14 | 55.55 | PaddleOCR | yaml | weight | en_PP-OCRv3_det |
ml_pp_det_OCRv3 | DBNet | MobileNetV3 | MLT17 | 66.01 | 22.48 | PaddleOCR | yaml | weight | ml_PP-OCRv3_det |
en_pp_det_dbnet_resnet50vd | DBNet | ResNet50_vd | IC15 | 79.89 | 21.17 | PaddleOCR | yaml | weight | DBNet |
en_pp_det_psenet_resnet50vd | PSE | ResNet50_vd | IC15 | 80.44 | 7.75 | PaddleOCR | yaml | weight | PSE |
en_pp_det_east_resnet50vd | EAST | ResNet50_vd | IC15 | 85.58 | 20.70 | PaddleOCR | yaml | weight | EAST |
en_pp_det_sast_resnet50vd | SAST | ResNet50_vd | IC15 | 81.77 | 22.14 | PaddleOCR | yaml | weight | SAST |
en_mm_det_dbnetpp_resnet50 | DBNet++ | ResNet50 | IC15 | 81.36 | 10.66 | MMOCR | yaml | weight | DBNetpp |
en_mm_det_fcenet_resnet50 | FCENet | ResNet50 | IC15 | 83.67 | 3.34 | MMOCR | yaml | weight | FCENet |
Notice: When using the en_pp_det_psenet_resnet50vd model for inference, you need to modify the ONNX file with the following command:
```shell
python deploy/models_utils/onnx_optim/insert_pse_postprocess.py \
    --model_path=./pse_r50vd.onnx \
    --binary_thresh=0.0 \
    --scale=1.0
```
name | model | backbone | dataset | Acc(%) | FPS | source | dict file | config | download | reference |
---|---|---|---|---|---|---|---|---|---|---|
ch_pp_server_rec_v2.0 | CRNN | ResNet34 | MLT17 (ch) | 49.91 | 154.16 | PaddleOCR | ppocr_keys_v1.txt | yaml | weight | ch_ppocr_server_v2.0_rec |
ch_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | MLT17 (ch) | 49.91 | 408.38 | PaddleOCR | ppocr_keys_v1.txt | yaml | weight | ch_PP-OCRv3_rec |
ch_pp_rec_OCRv2 | CRNN | MobileNetV1Enhance | MLT17 (ch) | 44.59 | 203.34 | PaddleOCR | ppocr_keys_v1.txt | yaml | weight | ch_PP-OCRv2_rec |
ch_pp_mobile_rec_v2.0 | CRNN | MobileNetV3 | MLT17 (ch) | 24.59 | 167.67 | PaddleOCR | ppocr_keys_v1.txt | yaml | weight | ch_ppocr_mobile_v2.0_rec |
en_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | MLT17 (en) | 79.79 | 917.01 | PaddleOCR | en_dict.txt | yaml | weight | en_PP-OCRv3_rec |
en_pp_mobile_rec_number_v2.0_slim | CRNN | MobileNetV3 | / | / | / | PaddleOCR | en_dict.txt | yaml | weight | en_number_mobile_slim_v2.0_rec |
en_pp_mobile_rec_number_v2.0 | CRNN | MobileNetV3 | / | / | / | PaddleOCR | en_dict.txt | yaml | weight | en_number_mobile_v2.0_rec |
korean_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | korean_dict.txt | yaml | weight | korean_PP-OCRv3_rec |
japan_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | japan_dict.txt | yaml | weight | japan_PP-OCRv3_rec |
chinese_cht_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | chinese_cht_dict.txt | yaml | weight | chinese_cht_PP-OCRv3_rec |
te_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | te_dict.txt | yaml | weight | te_PP-OCRv3_rec |
ka_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | ka_dict.txt | yaml | weight | ka_PP-OCRv3_rec |
ta_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | ta_dict.txt | yaml | weight | ta_PP-OCRv3_rec |
latin_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | latin_dict.txt | yaml | weight | latin_PP-OCRv3_rec |
arabic_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | arabic_dict.txt | yaml | weight | arabic_PP-OCRv3_rec |
cyrillic_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | cyrillic_dict.txt | yaml | weight | cyrillic_PP-OCRv3_rec |
devanagari_pp_rec_OCRv3 | SVTR | MobileNetV1Enhance | / | / | / | PaddleOCR | devanagari_dict.txt | yaml | weight | devanagari_PP-OCRv3_rec |
en_pp_rec_crnn_resnet34vd | CRNN | ResNet34_vd | IC15 | 66.35 | 420.80 | PaddleOCR | ic15_dict.txt | yaml | weight | CRNN |
en_pp_rec_rosetta_resnet34vd | Rosetta | ResNet34_vd | IC15 | 64.28 | 552.40 | PaddleOCR | ic15_dict.txt | yaml | weight | Rosetta |
en_pp_rec_vitstr_vitstr | ViTSTR | ViTSTR | IC15 | 68.42 | 364.67 | PaddleOCR | EN_symbol_dict.txt | yaml | weight | ViTSTR |
en_mm_rec_nrtr_resnet31 | NRTR | ResNet31 | IC15 | 67.26 | 32.63 | MMOCR | english_digits_symbols.txt | yaml | weight | NRTR |
en_mm_rec_satrn_shallowcnn | SATRN | ShallowCNN | IC15 | 73.52 | 32.14 | MMOCR | english_digits_symbols.txt | yaml | weight | SATRN |
name | model | dataset | Acc(%) | FPS | source | config | download | reference |
---|---|---|---|---|---|---|---|---|
ch_pp_mobile_cls_v2.0 | MobileNetV3 | / | / | / | PaddleOCR | yaml | weight | ch_ppocr_mobile_v2.0_cls |
```mermaid
graph LR;
    A[ThirdParty models] -- xx2onnx --> B[ONNX] -- converter_lite --> C[MindIR];
    C -- input --> D[infer.py] -- outputs --> E[eval_rec.py / eval_det.py];
    H[images] -- input --> D[infer.py];
```
Let's take `en_pp_det_dbnet_resnet50vd` from the Third-Party Model Support List as an example to introduce the inference method:
- Download the weight from the Third-Party Model Support List and decompress it;
- Since this model is a Paddle training model, it needs to be converted into an inference model first (skip this step if it is already an inference model):
```shell
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
python tools/export_model.py \
    -c configs/det/det_r50_vd_db.yml \
    -o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy \
       Global.save_inference_dir=./det_db
```
After execution, the following will be generated:
```text
det_db/
├── inference.pdmodel
├── inference.pdiparams
├── inference.pdiparams.info
```
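Optionally, you can check that the exported inference model loads correctly before converting it. Below is a minimal sketch using the paddle.inference Python API (an optional step not part of the original workflow; it assumes paddlepaddle is installed and uses the `det_db` directory produced above):

```python
# Optional check (not required by the workflow): load the exported
# inference model with paddle.inference and print its input names.
from paddle.inference import Config, create_predictor

config = Config("det_db/inference.pdmodel", "det_db/inference.pdiparams")
config.disable_gpu()  # a CPU load is enough for this check
predictor = create_predictor(config)
print(predictor.get_input_names())  # the input name ('x') is reused by paddle2onnx below
```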
- Download and use the paddle2onnx tool (`pip install paddle2onnx`) to convert the inference model into an ONNX file:
```shell
paddle2onnx \
    --model_dir det_db \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --save_file det_db.onnx \
    --opset_version 11 \
    --input_shape_dict="{'x':[-1,3,-1,-1]}" \
    --enable_onnx_checker True
```
A brief explanation of parameters for paddle2onnx is as follows:
Parameter | Description |
---|---|
--model_dir | Configures the directory path containing the Paddle model. |
--model_filename | [Optional] Configures the file name of the network structure under --model_dir. |
--params_filename | [Optional] Configures the file name of the model parameters under --model_dir. |
--save_file | Specifies the file path for saving the converted model. |
--opset_version | [Optional] Configures the OpSet version for converting to ONNX. Multiple versions, such as 7~16, are currently supported, and the default is 9. |
--input_shape_dict | Specifies the shape of the input tensor for generating a dynamic ONNX model. The format is "{'x': [N, C, H, W]}", where -1 represents dynamic shape. |
--enable_onnx_checker | [Optional] Configures whether to check the correctness of the exported ONNX model. It is recommended to enable this switch, and the default is False. |
The value of `--input_shape_dict` can be found by opening the inference model with the Netron tool.
Learn more about paddle2onnx
The `det_db.onnx` file will be generated after the above command is executed;
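Besides Netron, the generated ONNX file can be inspected programmatically to confirm the input name and dynamic shape used above. A minimal sketch with the onnx Python package (assuming it is installed):

```python
# Check det_db.onnx and print its input names/shapes; the dynamic axes
# should correspond to the --input_shape_dict passed to paddle2onnx.
import onnx

model = onnx.load("det_db.onnx")
onnx.checker.check_model(model)

for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else -1
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # e.g. x [-1, 3, -1, -1]
```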
- Use the converter_lite tool on Ascend310/310P to convert the ONNX file into a MindIR file:
Create a `config.txt` file and specify the model input shape. An example is as follows:
```text
[ascend_context]
input_format=NCHW
input_shape=x:[1,3,736,1280]
```
A brief explanation of the configuration file parameters is as follows:
Parameter | Attribute | Function Description | Data Type | Value Description |
---|---|---|---|---|
input_format | Optional | Specify the format of the model input | String | Optional values are "NCHW", "NHWC", "ND" |
input_shape | Optional | Specify the shape of the model input. The input_name must be the input name in the original model, arranged in order of input, separated by ";" | String | For example: "input1:[1,64,64,3];input2:[1,256,256,3]" |
dynamic_dims | Optional | Specify dynamic BatchSize and dynamic resolution parameters | String | For example: "dynamic_dims=[48,520],[48,320],[48,384]" |
Learn more about Configuration File Parameters
Run the following command:
```shell
converter_lite \
    --saveType=MINDIR \
    --fmk=ONNX \
    --optimize=ascend_oriented \
    --modelFile=det_db.onnx \
    --outputFile=det_db_output \
    --configFile=config.txt
```
After the above command is executed, the `det_db_output.mindir` file will be generated;
A brief explanation of the converter_lite parameters is as follows:
Parameter | Required | Parameter Description | Value Range | Default | Remarks |
---|---|---|---|---|---|
fmk | Yes | Input model format | MINDIR, CAFFE, TFLITE, TF, ONNX | - | - |
saveType | No | Set the exported model to MINDIR or MS model format. | MINDIR, MINDIR_LITE | MINDIR | The cloud-side inference version can only infer models converted to MINDIR format |
modelFile | Yes | Input model path | - | - | - |
outputFile | Yes | Output model path. Do not add a suffix; the ".mindir" suffix will be generated automatically. | - | - | - |
configFile | No | 1) Path to the quantization configuration file after training; 2) Path to the configuration file for extended functions | - | - | - |
optimize | No | Set the model optimization type for the device. Default is none. | none, general, gpu_oriented, ascend_oriented | - | - |
Learn more about converter_lite
Learn more about Model Conversion Tutorial
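Before running infer.py, the converted file can be sanity-checked by loading it with the MindSpore Lite Python API and pushing one dummy input through it. A minimal sketch of such an optional check (assuming the mindspore_lite package is installed and an Ascend device is available; the input shape follows config.txt above):

```python
# Load det_db_output.mindir on Ascend and run a single random input.
import numpy as np
import mindspore_lite as mslite

context = mslite.Context()
context.target = ["ascend"]        # run on the Ascend backend
context.ascend.device_id = 0

model = mslite.Model()
model.build_from_file("det_db_output.mindir", mslite.ModelType.MINDIR, context)

inputs = model.get_inputs()
inputs[0].set_data_from_numpy(
    np.random.rand(1, 3, 736, 1280).astype(np.float32))  # shape from config.txt
outputs = model.predict(inputs)
print(outputs[0].get_data_to_numpy().shape)
```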
- Perform inference using the `/deploy/py_infer/infer.py` script and the `det_db_output.mindir` model file:
```shell
python infer.py \
    --input_images_dir=/path/to/ic15/ch4_test_images \
    --det_model_path=/path/to/mindir/det_db_output.mindir \
    --det_model_name_or_config=en_pp_det_dbnet_resnet50vd \
    --res_save_dir=/path/to/dbnet_resnet50vd_results
```
After the execution is completed, the prediction file `det_results.txt` will be generated in the directory specified by the `--res_save_dir` parameter.
When doing inference, you can use the `--vis_det_save_dir` parameter to save visualizations of the results:
Visualization of text detection results
Learn more about infer.py inference parameters
- Evaluate the results using the following command:
```shell
python deploy/eval_utils/eval_det.py \
    --gt_path=/path/to/ic15/test_det_gt.txt \
    --pred_path=/path/to/dbnet_resnet50vd_results/det_results.txt
```
The result is: {'recall': 0.8281174771304767, 'precision': 0.7716464782413638, 'f-score': 0.7988852763585693}
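For reference, the reported f-score is simply the harmonic mean of the precision and recall above, which can be checked directly:

```python
# f-score = 2 * P * R / (P + R), using the values printed by eval_det.py
precision, recall = 0.7716464782413638, 0.8281174771304767
print(2 * precision * recall / (precision + recall))  # ≈ 0.79889
```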
Let's take `en_pp_rec_OCRv3` from the Third-Party Model Support List as an example to introduce the inference method:
- Download the weight from the Third-Party Model Support List and decompress it;
- Since this model is already a Paddle inference model, proceed directly to the Paddle-to-ONNX conversion step below (otherwise, the training model would first need to be converted into an inference model; refer to the text detection example above);
- Download and use the paddle2onnx tool (`pip install paddle2onnx`) to convert the inference model into an ONNX file:
```shell
paddle2onnx \
    --model_dir en_PP-OCRv3_rec_infer \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --save_file en_PP-OCRv3_rec_infer.onnx \
    --opset_version 11 \
    --input_shape_dict="{'x':[-1,3,48,-1]}" \
    --enable_onnx_checker True
```
For a brief description of paddle2onnx parameters, see the text detection example above.
The value of `--input_shape_dict` can be found by opening the inference model with the Netron tool.
Learn more about paddle2onnx
The `en_PP-OCRv3_rec_infer.onnx` file will be generated after the above command is executed;
- Use the converter_lite tool on Ascend310/310P to convert the ONNX file into a MindIR file:
Create a `config.txt` file and specify the model input shape. An example is as follows:
```text
[ascend_context]
input_format=NCHW
input_shape=x:[1,3,-1,-1]
dynamic_dims=[48,520],[48,320],[48,384],[48,360],[48,394],[48,321],[48,336],[48,368],[48,328],[48,685],[48,347]
```
For a brief description of the configuration parameters, see the text detection example above.
Learn more about Configuration File Parameters
Run the following command:
```shell
converter_lite \
    --saveType=MINDIR \
    --fmk=ONNX \
    --optimize=ascend_oriented \
    --modelFile=en_PP-OCRv3_rec_infer.onnx \
    --outputFile=en_PP-OCRv3_rec_infer \
    --configFile=config.txt
```
After the above command is executed, the `en_PP-OCRv3_rec_infer.mindir` file will be generated;
For a brief description of the converter_lite parameters, see the text detection example above.
Learn more about converter_lite
Learn more about Model Conversion Tutorial
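Because this MindIR was exported with dynamic_dims, its input must be resized to one of the configured gears before prediction. A minimal MindSpore Lite sketch of such a check (same assumptions as the detection sanity check above; the gear [1,3,48,320] is taken from config.txt):

```python
# Load the recognition MindIR and resize its input to a configured gear.
import numpy as np
import mindspore_lite as mslite

context = mslite.Context()
context.target = ["ascend"]

model = mslite.Model()
model.build_from_file("en_PP-OCRv3_rec_infer.mindir", mslite.ModelType.MINDIR, context)

inputs = model.get_inputs()
model.resize(inputs, [[1, 3, 48, 320]])  # must match one of the dynamic_dims gears
inputs[0].set_data_from_numpy(
    np.random.rand(1, 3, 48, 320).astype(np.float32))
outputs = model.predict(inputs)
print(outputs[0].get_data_to_numpy().shape)
```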
- Download the dictionary file `en_dict.txt` corresponding to the model, then use the `/deploy/py_infer/infer.py` script and the `en_PP-OCRv3_rec_infer.mindir` file to perform inference:
```shell
python infer.py \
    --input_images_dir=/path/to/mlt17_en \
    --device_id=1 \
    --parallel_num=2 \
    --rec_model_path=/path/to/mindir/en_PP-OCRv3_rec_infer.mindir \
    --rec_model_name_or_config=en_pp_rec_OCRv3 \
    --character_dict_path=/path/to/en_dict.txt \
    --res_save_dir=/path/to/en_rec_infer_results \
    --save_log_dir=/path/to/en_rec_infer_logs
```
After the execution is completed, the prediction file `rec_results.txt` will be generated in the directory specified by the `--res_save_dir` parameter.
Learn more about infer.py inference parameters
- Evaluate the results using the following command:
```shell
python deploy/eval_utils/eval_rec.py \
    --gt_path=/path/to/mlt17_en/english_gt.txt \
    --pred_path=/path/to/en_rec_infer_results/rec_results.txt
```
The result is: {'acc': 0.7979344129562378, 'norm_edit_distance': 0.8859519958496094}
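For context on the two metrics: 'acc' is the fraction of exactly matched transcriptions, and 'norm_edit_distance' is a similarity score derived from the Levenshtein edit distance. A minimal illustrative sketch of one common formulation (not necessarily the exact implementation in eval_rec.py):

```python
# Illustration only: exact-match accuracy is a simple string comparison;
# the normalized edit similarity below is 1 - edit_distance / max length.
def levenshtein(a: str, b: str) -> int:
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def norm_edit_similarity(pred: str, gt: str) -> float:
    if not pred and not gt:
        return 1.0
    return 1 - levenshtein(pred, gt) / max(len(pred), len(gt))

print(norm_edit_similarity("MindOCR", "MindOCP"))  # 1 - 1/7 ≈ 0.857
```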