This software implements the Convolutional Recurrent Neural Network (CRNN) in PyTorch. The original implementation can be found in crnn.
- pytorch 1.3+
- torchvision 0.4+
Prepare a text file in the following format, one sample per line (image path, a space, then the label):
/path/to/img/img.jpg label
...
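
As a minimal sketch of how such a label file could be read, the snippet below splits each line at the first space. It assumes image paths contain no spaces (labels may); the function names `parse_label_line` and `load_label_file` are hypothetical and not part of this repository.

```python
from typing import List, Tuple


def parse_label_line(line: str) -> Tuple[str, str]:
    """Split one line of the label file into (image_path, label).

    Assumption: the path has no spaces, so the first space separates
    the path from the label. The label itself may contain spaces.
    """
    path, sep, label = line.rstrip("\r\n").partition(" ")
    if not sep or not path or not label:
        raise ValueError(f"malformed line: {line!r}")
    return path, label


def load_label_file(label_file: str) -> List[Tuple[str, str]]:
    """Read every non-empty line of the label file as a (path, label) pair."""
    samples = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                samples.append(parse_label_line(line))
    return samples
```

Splitting only at the first space keeps multi-word labels intact, which matters if the generated text contains spaces.
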
Data link: baiduyun (code: 9p2m). The dataset was generated with https://github.com/Belval/TextRecognitionDataGenerator
The dataset contains 100,000 (10w) images for training and 10,000 (1w) images for testing.
For all architectures, training was stopped after 30 epochs.
Environment: CUDA 9.2, torch 1.4, torchvision 0.5
arch | model size (M) | GPU memory (M) | speed (ms, average of 100 inferences) | accuracy |
---|---|---|---|---|
CNN_lite_LSTM_CTC | 6.25 | 2731 | 6.91ms | 0.8866 |
VGG(BasicConv)_LSTM_CTC(w320) | 25.45 | 2409 | 4.02ms | 0.9874 |
VGG(BasicConv)_LSTM_CTC(w160) | 25.45 | 2409 | 4.02ms | 0.9908 |
VGG(BasicConv)_LSTM_CTC(w160_no_imagenet_mean_std) | 25.45 | 2409 | 4.02ms | 0.9927 |
VGG(BasicConv)\_LSTM_CTC(w160.sub\_(0.5).div_(0.5)) | 25.45 | 2409 | 4.02ms | 0.9927 |
VGG(BasicConv)_LSTM_CTC(w160 origin crnn rnn) | 25.45 | 2409 | 4.02ms | 0.9922 |
VGG(DWconv)_LSTM_CTC(w160_no_imagenet_mean_std) | 25.45 | 2409 | 4.01ms | 0.9725 |
VGG(GhostModule)_LSTM_CTC(w160_no_imagenet_mean_std) | 25.45 | 2329 | 5.46ms | 0.9878 |
ResNet(BasicBlockV2)_LSTM_CTC | 37.21 | 3161 | 5.83ms | 0.9935 |
ResNet(DWBlock_no_se)_LSTM_CTC | 19.22 | 5533 | 12ms | 0.9566 |
ResNet(DWBlock_se)_LSTM_CTC | 19.90 | 5729 | 10ms | 0.9559 |
ResNet(GhostBottleneck_se)_LSTM_CTC | 23.10 | 6291 | 13ms | 0.97 |
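
The speed column reports an average over 100 inferences. The sketch below shows one way such an average could be measured; it is not the benchmark script used for the table. The helper name `average_latency_ms` and the warm-up phase are assumptions added for illustration.

```python
import time
from typing import Callable


def average_latency_ms(run_once: Callable[[], object],
                       runs: int = 100,
                       warmup: int = 10) -> float:
    """Return the mean wall-clock time of `run_once` in milliseconds.

    `run_once` should perform exactly one inference. On a GPU it should
    also wait for the device to finish (for example by calling
    torch.cuda.synchronize()), otherwise only the kernel launch is timed.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    # Warm-up calls are not timed (assumption: the first calls are slower
    # because of lazy initialization and caching).
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs
```

With a loaded model, `run_once` could be a small closure that runs the forward pass on a fixed input tensor under `torch.no_grad()`.
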
- Set `dataset['train']['dataset']['data_path']` and `dataset['validate']['dataset']['data_path']` in config.yaml.
- Generate the alphabet. Use the following script to generate `alphabet.py` in the same folder as `train.py`:
python3 utils/get_keys.py
- Use the following script to start training:
python3 train.py --config_path config.yaml
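
The alphabet-generation step above can be pictured as collecting every distinct character that appears in the labels. The sketch below illustrates that idea only; the actual `utils/get_keys.py` may work differently. The function names and the `alphabet` variable written to the output file are hypothetical.

```python
from typing import Iterable


def build_alphabet(label_files: Iterable[str]) -> str:
    """Collect the distinct label characters from the given label files.

    Assumption: each line is "<image path> <label>", with the first
    space separating the path from the label.
    """
    chars = set()
    for label_file in label_files:
        with open(label_file, encoding="utf-8") as f:
            for line in f:
                _path, sep, label = line.rstrip("\r\n").partition(" ")
                if sep:  # ignore blank or malformed lines
                    chars.update(label)
    # Sorting makes the character order reproducible between runs.
    return "".join(sorted(chars))


def write_alphabet(alphabet: str, out_path: str = "alphabet.py") -> None:
    """Write the alphabet as a Python module with one string variable."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"alphabet = {alphabet!r}\n")
```

Building the alphabet from both the training and validation label files avoids characters at validation time that the model has no output class for.
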
`predict.py` runs inference on a single image.
- Set `model_path` and `img_path` in predict.py.
- Use the following script to predict:
python3 predict.py
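
Since all architectures above end in a CTC layer, the per-frame network output has to be decoded into text at prediction time. The following is a minimal sketch of greedy (best-path) CTC decoding in plain Python, not the code in `predict.py`. It assumes class index 0 is the CTC blank and class `k >= 1` maps to `alphabet[k - 1]`; the repository may use a different convention.

```python
from typing import Sequence

BLANK = 0  # assumption: index 0 is the CTC blank class


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str) -> str:
    """Decode per-frame class scores with the greedy CTC rule.

    frame_scores[t][k] is the score of class k at time step t.
    Steps: take the best class per frame, merge consecutive repeats,
    then drop blanks.
    """
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a character only when the class changes and is not blank.
        if index != BLANK and index != previous:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)
```

A blank between two identical classes is what allows doubled letters: the frame sequence `a, a, blank, a` decodes to `aa`, while `a, a, a` decodes to `a`.
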