I. Regular season: Chinese scene text recognition

Address: aistudio.baidu.com/aistudio/co…

1. Introduction to the competition

Chinese scene text recognition has attracted wide attention in everyday life, with application scenarios such as photo translation, image retrieval, and scene understanding. However, text in Chinese scenes comes with complications such as lighting changes, low resolution, diverse fonts and layouts, and the large variety of Chinese characters. Handling these problems remains a challenging task.

The regular-season Chinese scene text recognition competition has been upgraded: it provides a lightweight Chinese scene text recognition dataset and requires participants to use the PaddlePaddle framework to recognize the text line in each image region and return its content.

2. Data set description

The competition dataset contains 60,000 images in total: 50,000 for training and 10,000 for testing. The images come from Chinese street views and were produced by cropping text-line regions (e.g. shop signs, landmarks) out of street-view photos.

Data details

All images in the dataset underwent some pre-processing, as shown in the figure below:

(a) Label: Joss Billiards Club

(b) Label: Shanghai Chuangke Pump Manufacturing Co., LTD

Annotation file

The annotation file provided by the platform is in .csv format. The original description lists four columns (image width, height, file name, and text annotation), but the file used in this walkthrough has only two: name (the file name) and value (the text annotation). Here's an example:

name    value
0.jpg   Text 0
1.jpg   Text 0

II. Environment setup

PaddleOCR (github.com/paddlepaddl…) is a powerful OCR toolkit that works out of the box and runs fast.

# Download the PaddleOCR code from Gitee (or from the GitHub link above)
!git clone https://gitee.com/paddlepaddle/PaddleOCR.git --depth=1
# Upgrade pip
!pip install -U pip
# Install dependencies
%cd ~/PaddleOCR
%pip install -r requirements.txt

%cd ~/PaddleOCR/
!tree -L 1

/home/aistudio/PaddleOCR
.
├── ...
├── ppocr
├── PPOCRLabel
├── ppstructure
├── README_ch.md
├── requirements.txt
├── setup.py
├── StyleText
├── tools
└── ...
12 directories, 9 files

III. Data preparation

The training set contains 50,000 images. After decompressing it, 5,000 of them are split off as a validation set.

1. Decompress the downloaded data

# Unzip the datasets
%cd ~
!unzip -qa data/data62842/train_images.zip -d data/data62842/
!unzip -qa data/data62843/test_images.zip -d data/data62843/

/home/aistudio

# Check that the training data folder contains 50,000 files
!cd ~/data/data62842/train_images && ls -l | grep "^-" | wc -l

50000

# Check that the test data folder contains 10,000 files
!cd ~/data/data62843/test_images && ls -l | grep "^-" | wc -l

10000
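The same check can also be done from Python rather than the shell. The following is a minimal sketch using pathlib; the helper name count_files is made up for illustration, and the two paths are simply the ones used above.

```python
from pathlib import Path


def count_files(directory):
    """Count regular files directly inside `directory` (no recursion)."""
    return sum(1 for entry in Path(directory).iterdir() if entry.is_file())


if __name__ == "__main__":
    home = Path.home()
    # Folder names and expected counts are taken from the walkthrough above.
    for folder, expected in [
        (home / "data/data62842/train_images", 50000),
        (home / "data/data62843/test_images", 10000),
    ]:
        if folder.is_dir():
            print(folder, count_files(folder), "expected", expected)
        else:
            print(folder, "not found")
```

Unlike `ls -l | grep "^-"`, this counts only entries that resolve to regular files, so symlinks to files are included as well.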

2. Dataset split

# Read the data list file
import pandas as pd
%cd ~
data_label = pd.read_csv('data/data62842/train_label.csv', encoding='gb2312')
data_label.head()

/home/aistudio

name value
0 0.jpg lala
1 1.jpg No. 6
2 2.jpg fat
3 3.jpg Qianmen Dashilan General store
4 4.jpg Your arrival is the peak season

# Split the data list file
%cd ~/data/data62842/
print(data_label.shape)
train = data_label[:45000]
val = data_label[45000:]
train.to_csv('train.txt', sep='\t', header=None, index=None)
val.to_csv('val.txt', sep='\t', header=None, index=None)

/home/aistudio/data/data62842
(50000, 2)

# Check the size of each split
print(train.shape)
print(val.shape)

(45000, 2)
(5000, 2)

!head val.txt

Responsible Unit: Beijing huanqing sanitation facilities maintenance
45001.jpg    glasses
45002.jpg    presence
45003.jpg    attending
45004.jpg    rice bone soup
45005.jpg    management
45006.jpg    more people to book in advance
45007.jpg    dry cleaning wet washing Electric welding, gas cutting, professional dump truck

!head train.txt

0.jpg    lala
1.jpg    No. 6
2.jpg    fat
3.jpg    Qianmen Dashilan General store
4.jpg    Your arrival is the peak season
5.jpg    sweater manufacturer direct sales
...
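The split above simply takes the first 45,000 rows in file order. If the label file happens to be ordered in some meaningful way, a shuffled split may give a more representative validation set. The sketch below is one way to do that with the standard library only; it is a swapped-in alternative rather than what this walkthrough ran, and the names split_rows and to_label_lines as well as the seed value are assumptions.

```python
import random


def split_rows(rows, val_size, seed=42):
    """Shuffle a copy of `rows` reproducibly and split off `val_size` rows for validation."""
    if not 0 <= val_size <= len(rows):
        raise ValueError("val_size must be between 0 and len(rows)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    split_at = len(shuffled) - val_size
    return shuffled[:split_at], shuffled[split_at:]


def to_label_lines(rows):
    """Format (file_name, text) pairs as the tab-separated lines used by train.txt / val.txt."""
    return ["{}\t{}".format(name, text) for name, text in rows]


if __name__ == "__main__":
    # Tiny made-up example; the real data has 50,000 rows and val_size=5000.
    rows = [("{}.jpg".format(i), "text {}".format(i)) for i in range(10)]
    train_rows, val_rows = split_rows(rows, val_size=2)
    print(len(train_rows), len(val_rows))
    print(to_label_lines(val_rows))
```

Fixing the seed keeps the split identical between runs, which matters when comparing validation accuracy across experiments.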

IV. Configure training parameters

Use PaddleOCR/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml as the reference configuration.

1. Configure the model network

The model uses the CRNN algorithm with a MobileNetV3 backbone and CTCLoss as the loss function.

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001
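With a CTC loss, the network emits one class per horizontal time step, including a special "blank" class, and decoding collapses repeated classes and then drops the blanks. The sketch below shows greedy CTC decoding on a toy example. It is an illustration of the general technique, not PaddleOCR's CTCLabelDecode implementation; the blank index of 0 and the three-character charset are assumptions made for the example.

```python
BLANK = 0  # assumed index of the CTC blank class in this sketch


def ctc_greedy_decode(step_indices, charset):
    """Collapse repeated indices, drop blanks, and map the rest to characters.

    `step_indices` holds the arg-max class index at every time step.
    Index i > 0 maps to charset[i - 1] (an assumption made for this sketch).
    """
    decoded = []
    previous = None
    for index in step_indices:
        # Emit a character only when the class changes and is not the blank.
        if index != previous and index != BLANK:
            decoded.append(charset[index - 1])
        previous = index
    return "".join(decoded)


if __name__ == "__main__":
    charset = ["a", "b", "c"]
    # Time steps: b b _ b a a _ c
    print(ctc_greedy_decode([2, 2, 0, 2, 1, 1, 0, 3], charset))
```

The blank between the two runs of "b" is what lets the decoder output a doubled character instead of merging them.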

2. Configure data

Configure Train.dataset.data_dir, Train.dataset.label_file_list, Eval.dataset.data_dir, and Eval.dataset.label_file_list. Both splits read their images from the train_images folder, since the validation set was split off from the training data.

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/data62842/train_images
    label_file_list: ["/home/aistudio/data/data62842/train.txt"]
    ...
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/data62842/train_images
    label_file_list: ["/home/aistudio/data/data62842/val.txt"]
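Before starting a long training run, it can be worth confirming that every file named in a label file really exists under data_dir. The sketch below does that check. It assumes the tab-separated "file name, text" layout written by the split step; the helper name find_missing_images is hypothetical.

```python
from pathlib import Path


def find_missing_images(label_file, data_dir, encoding="utf-8"):
    """Return the file names listed in `label_file` that do not exist under `data_dir`.

    Each non-empty line is expected to be: <file name><TAB><text>.
    """
    missing = []
    with open(label_file, encoding=encoding) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            parts = line.split("\t", 1)
            if len(parts) != 2:
                raise ValueError(
                    "line {}: expected '<name><TAB><text>'".format(line_number)
                )
            if not (Path(data_dir) / parts[0]).is_file():
                missing.append(parts[0])
    return missing
```

For this walkthrough it would be called as find_missing_images("/home/aistudio/data/data62842/val.txt", "/home/aistudio/data/data62842/train_images"), and an empty list means the two settings agree.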

3. Graphics card and evaluation Settings

use_gpu and cal_metric_during_train switch GPU usage and metric calculation during training on or off, respectively.

Global:
  use_gpu: false  # set to true to use the GPU
  ...
  cal_metric_during_train: False  # set to True to compute metrics during training

4. Multi-threaded tasks

Train.loader.num_workers: 4
Eval.loader.num_workers: 4

5. Complete configuration

Global:
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_chinese_lite_v2.0
  save_epoch_step: 3
  # evaluation is run every 2000 iterations, starting from iteration 0
  eval_batch_step: [0, 2000]
  cal_metric_during_train: True
  pretrained_model: ./ch_ppocr_mobile_v2.0_rec_pre/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: True
  infer_img: doc/imgs_words/ch/word_1.jpg
  # for data or label process
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: 25
  infer_mode: False
  use_space_char: True
  save_res_path: ./output/rec/predicts_chinese_lite_v2.0.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: 'L2'
    factor: 0.00001

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/data62842/train_images
    label_file_list: ["/home/aistudio/data/data62842/train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - RecAug:
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/data62842/train_images
    label_file_list: ["/home/aistudio/data/data62842/val.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 8
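A bit of arithmetic helps when reading these settings. The sketch below works out how many batches one epoch yields for the 45,000/5,000 split with batch_size_per_card: 256, and how often eval_batch_step: [0, 2000] triggers an evaluation. It assumes a single card; the function name batches_per_epoch is made up for illustration.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last):
    """Number of batches one pass over the data yields on a single card."""
    if drop_last:
        # The final, incomplete batch is discarded.
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


# Train loader: 45,000 samples, drop_last: True
train_batches = batches_per_epoch(45000, 256, drop_last=True)
# Eval loader: 5,000 samples, drop_last: False
eval_batches = batches_per_epoch(5000, 256, drop_last=False)

print(train_batches)  # 175
print(eval_batches)   # 20
# Evaluating every 2000 iterations means roughly one evaluation per 11.4 epochs.
print(round(2000 / train_batches, 1))  # 11.4
```

The 20 evaluation batches agree with the "20/20" progress bar that appears in the evaluation output later in this post.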

6. Use a pre-trained model

Starting from a pre-trained model reportedly makes training converge faster.

The downloadable models provided by PaddleOCR include inference models, training models, pre-trained models, and slim models. They differ as follows:

| Model type | Model format | Description |
| --- | --- | --- |
| Inference model | inference.pdmodel, inference.pdiparams | Used for inference with the prediction engine |
| Training model, pre-trained model | *.pdparams, *.pdopt, *.states | Model parameters, optimizer state, and intermediate training information saved during training; mainly used for model evaluation and for resuming training |
| Slim model | *.nb | Model compressed with PaddleSlim; suited to on-device deployment such as mobile/IoT (requires Paddle Lite deployment) |

The relationship between the models is shown in the diagram below.

Text detection models

| Model name | Description | Configuration file | Inference model size | Download |
| --- | --- | --- | --- | --- |
| ch_ppocr_mobile_slim_v2.0_det | Slim pruned ultra-lightweight model; supports Chinese, English, and multilingual text detection | ch_det_mv3_db_v2.0.yml | 2.6M | Inference model |
| ch_ppocr_mobile_v2.0_det | Original ultra-lightweight model; supports Chinese, English, and multilingual text detection | ch_det_mv3_db_v2.0.yml | 3M | Inference model / Training model |
| ch_ppocr_server_v2.0_det | General model; supports Chinese, English, and multilingual text detection; larger than the ultra-lightweight model but more accurate | ch_det_res18_db_v2.0.yml | 47M | Inference model / Training model |

Text recognition models

Chinese recognition models

| Model name | Description | Configuration file | Inference model size | Download |
| --- | --- | --- | --- | --- |
| ch_ppocr_mobile_slim_v2.0_rec | Slim pruned and quantized ultra-lightweight model; supports Chinese, English, and digit recognition | rec_chinese_lite_train_v2.0.yml | 6M | Inference model / Training model |
| ch_ppocr_mobile_v2.0_rec | Original ultra-lightweight model; supports Chinese, English, and digit recognition | rec_chinese_lite_train_v2.0.yml | 5.2M | Inference model / Training model / Pre-trained model |
| ch_ppocr_server_v2.0_rec | General model; supports Chinese, English, and digit recognition | rec_chinese_common_train_v2.0.yml | 94.8M | Inference model / Training model / Pre-trained model |

Note: The training model is fine-tuned from the pre-trained model on real data and vertically oriented synthetic text data, and it performs better in real application scenarios. The pre-trained model is trained directly on the full set of real and synthetic data, which makes it the better starting point for fine-tuning on your own dataset.

English recognition models

| Model name | Description | Configuration file | Inference model size | Download |
| --- | --- | --- | --- | --- |
| en_number_mobile_slim_v2.0_rec | Slim pruned and quantized ultra-lightweight model; supports English and digit recognition | rec_en_number_lite_train.yml | 2.7M | Inference model / Training model |
| en_number_mobile_v2.0_rec | Original ultra-lightweight model; supports English and digit recognition | rec_en_number_lite_train.yml | 2.6M | Inference model / Training model |

%cd ~/PaddleOCR/
!wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar
!tar -xf ch_ppocr_mobile_v2.0_rec_pre.tar

/home/aistudio/PaddleOCR
--2021-12-30 17:57:27--  https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16130750 (15M) [application/x-tar]
Saving to: 'ch_ppocr_mobile_v2.0_rec_pre.tar'

ch_ppocr_mobile_v2. 100%[===================>]  15.38M  21.9MB/s    in 0.7s

2021-12-30 17:57:28 (21.9 MB/s) - 'ch_ppocr_mobile_v2.0_rec_pre.tar' saved [16130750/16130750]

V. Training

%cd ~/PaddleOCR/
!python tools/train.py -c ./configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.checkpoints=./output/rec_chinese_lite_v2.0/latest
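The -o option overrides single configuration entries using dotted keys such as Global.checkpoints=.... Conceptually, each dotted key walks down into the nested configuration loaded from the YAML file. The sketch below illustrates that idea on a plain dictionary. It is not PaddleOCR's own code; apply_override is a hypothetical helper, and values are kept as plain strings here.

```python
def apply_override(config, assignment):
    """Apply one "A.B.C=value" assignment to a nested dict, creating levels as needed."""
    dotted_key, separator, value = assignment.partition("=")
    if not separator or not dotted_key:
        raise ValueError("expected KEY=VALUE, got {!r}".format(assignment))
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        # Descend one level, creating an empty dict if the level is missing.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


if __name__ == "__main__":
    config = {"Global": {"use_gpu": True, "checkpoints": None}}
    apply_override(config, "Global.checkpoints=./output/rec_chinese_lite_v2.0/latest")
    print(config["Global"]["checkpoints"])
```

This is why the training command can resume from the latest checkpoint without editing the YAML file itself.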

1. Select an appropriate batch size

2. Training log

[2021/12/30 23:26:54] root INFO: epoch: [68/500], iter: 9930, lr: 0.000962, loss: 5.635038, acc: 0.521482, norm_edit_dis: 0.745346, reader_cost: 0.01405 s, batch_cost: 0.26990 s, samples: 2560, ips: 948.50786
[2021/12/30 23:27:11] root INFO: epoch: [68/500], iter: 9940, lr: 0.000962, loss: 5.653114, acc: 0.509764, norm_edit_dis: 0.740487, reader_cost: 0.01402 s, batch_cost: 0.26862 s, samples: 2560, ips: 953.03473
[2021/12/30 23:27:26] root INFO: epoch: [68/500], iter: 9950, lr: 0.000962, loss: 5.411234, acc: 0.515623, norm_edit_dis: 0.748549, reader_cost: 0.00091 s, batch_cost: 0.26371 s, samples: 2560, ips: 970.76457
[2021/12/30 23:27:40] root INFO: epoch: [68/500], iter: 9960, lr: 0.000962, loss: 5.588465, acc: 0.525389, norm_edit_dis: 0.755345, reader_cost: 0.00684 s, batch_cost: 0.25901 s, samples: 2560, ips: 988.38445
[2021/12/30 23:27:48] root INFO: epoch: [68/500], iter: 9970, lr: 0.000961, loss: 5.789876, acc: 0.513670, norm_edit_dis: 0.740609, reader_cost: 0.00095 s, batch_cost: 0.15022 s, samples: 2560, ips: 1704.17763
[2021/12/30 23:27:51] root INFO: epoch: [68/500], iter: 9974, lr: 0.000961, loss: 5.787237, acc: 0.511717, norm_edit_dis: 0.747102, reader_cost: 0.00018 s, batch_cost: 0.05935 s, samples: 1024, ips: 1725.41448
[2021/12/30 23:27:51] root INFO: save model in ./output/rec_chinese_lite_v2.0/latest
[2021/12/30 23:27:51] root INFO: Initialize indexs of datasets:['/home/aistudio/data/data62842/train.txt']
[2021/12/30 23:28:21] root INFO: epoch: [69/500], iter: 9980, lr: 0.000961, loss: 5.801509, acc: 0.517576, norm_edit_dis: 0.749756, reader_cost: 1.10431 s, batch_cost: 1.37585 s, samples: 1536, ips: 111.64048
[2021/12/30 23:28:40] root INFO: epoch: [69/500], iter: 9990, lr: 0.000961, loss: 5.548770, acc: 0.533201, norm_edit_dis: 0.762078, reader_cost: 0.00839 s, batch_cost: 0.32035 s, samples: 2560, ips: 799.11578
[2021/12/30 23:28:56] root INFO: epoch: [69/500], iter: 10000, lr: 0.000961, loss: 5.449094, acc: 0.537107, norm_edit_dis: 0.762517, reader_cost: 0.00507 s, batch_cost: 0.25845 s, samples: 2560, ips: 990.51517
eval model:: 100%|██████████| 20/20 [00:15<00:00,  1.98it/s]
[2021/12/30 23:29:12] root INFO: cur metric, acc: 0.4641999071600186, norm_edit_dis: 0.6980459628854201, fps: 4204.853978632389
[2021/12/30 23:29:12] root INFO: best metric, acc: 0.48179990364001923, start_epoch: 12, norm_edit_dis: 0.7096561279006699, fps: 4618.199275059127, best_epoch: 46
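To plot or compare runs without VisualDL, the numeric fields can be pulled out of training log lines like the ones above. The sketch below does this with a regular expression. The helper name parse_train_line is made up, and the pattern is written against the "key: value" layout shown in this log, so other log formats may need adjusting.

```python
import re

# Matches "key: number" pairs such as "loss: 5.635038" or "iter: 9930".
PAIR = re.compile(r"(\w+): (\d+(?:\.\d+)?)")


def parse_train_line(line):
    """Extract numeric 'key: value' fields (iter, lr, loss, acc, ...) from one log line."""
    return {key.lower(): float(value) for key, value in PAIR.findall(line)}


if __name__ == "__main__":
    sample = (
        "[2021/12/30 23:26:54] root INFO: epoch: [68/500], iter: 9930, "
        "lr: 0.000962, loss: 5.635038, acc: 0.521482, norm_edit_dis: 0.745346, "
        "reader_cost: 0.01405 s, batch_cost: 0.26990 s, samples: 2560, ips: 948.50786"
    )
    fields = parse_train_line(sample)
    print(fields["iter"], fields["loss"], fields["acc"])
```

The epoch field is skipped because its value is written as "[68/500]" rather than as a plain number.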

3. VisualDL

  • Install VisualDL locally: pip install visualdl
  • Download the logs to the local PC
  • Start VisualDL: visualdl --logdir ./
  • Open a browser and go to http://localhost:8040/

VI. Model evaluation

# Specify the checkpoint to evaluate with the Global.checkpoints parameter
%cd ~/PaddleOCR/
!python -m paddle.distributed.launch tools/eval.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml \
    -o Global.checkpoints=./output/rec_chinese_lite_v2.0/latest

/home/aistudio/PaddleOCR
-----------  Configuration Arguments -----------
backend: auto
elastic_server: None
force: False
gpus: None
heter_devices: 
heter_worker_num: None
heter_workers: 
host: None
http_port: None
ips: 127.0.0.1
job_id: None
log_dir: log
np: None
nproc_per_node: None
run_mode: None
scale: 0
server_num: None
servers: 
training_script: tools/eval.py
training_script_args: ['-c', 'configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml', '-o', 'Global.checkpoints=./output/rec_chinese_lite_v2.0/latest']
worker_num: None
workers: 
------------------------------------------------
WARNING 2021-12-31 11:38:43,737 launch.py:423] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
launch train in GPU mode!
INFO 2021-12-31 11:38:43,740 launch_utils.py:528] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:33326               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:33326               |
    |                     PADDLE_RANK_IN_NODE                        0                      |
    |                 PADDLE_LOCAL_DEVICE_IDS                        0                      |
    |                 PADDLE_WORLD_DEVICE_IDS                        0                      |
    |                     FLAGS_selected_gpus                        0                      |
    |             FLAGS_selected_accelerators                        0                      |
    +=======================================================================================+

INFO 2021-12-31 11:38:43,740 launch_utils.py:532] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
launch proc_id:3811 idx:0
[2021/12/31 11:38:45] root INFO: Architecture : 
[2021/12/31 11:38:45] root INFO:     Backbone : 
[2021/12/31 11:38:45] root INFO:         model_name : small
[2021/12/31 11:38:45] root INFO:         name : MobileNetV3
[2021/12/31 11:38:45] root INFO:         scale : 0.5
[2021/12/31 11:38:45] root INFO:         small_stride : [1, 2, 2, 2]
[2021/12/31 11:38:45] root INFO:     Head : 
[2021/12/31 11:38:45] root INFO:         fc_decay : 1e-05
[2021/12/31 11:38:45] root INFO:         name : CTCHead
[2021/12/31 11:38:45] root INFO:     Neck : 
[2021/12/31 11:38:45] root INFO:         encoder_type : rnn
[2021/12/31 11:38:45] root INFO:         hidden_size : 48
[2021/12/31 11:38:45] root INFO:         name : SequenceEncoder
[2021/12/31 11:38:45] root INFO:     Transform : None
[2021/12/31 11:38:45] root INFO:     algorithm : CRNN
[2021/12/31 11:38:45] root INFO:     model_type : rec
[2021/12/31 11:38:45] root INFO: Eval : 
[2021/12/31 11:38:45] root INFO:     dataset : 
[2021/12/31 11:38:45] root INFO:         data_dir : /home/aistudio/data/data62842/train_images
[2021/12/31 11:38:45] root INFO:         label_file_list : ['/home/aistudio/data/data62842/val.txt']
[2021/12/31 11:38:45] root INFO:         name : SimpleDataSet
[2021/12/31 11:38:45] root INFO:         transforms : 
[2021/12/31 11:38:45] root INFO:             DecodeImage : 
[2021/12/31 11:38:45] root INFO:                 channel_first : False
[2021/12/31 11:38:45] root INFO:                 img_mode : BGR
[2021/12/31 11:38:45] root INFO:             CTCLabelEncode : None
[2021/12/31 11:38:45] root INFO:             RecResizeImg : 
[2021/12/31 11:38:45] root INFO:                 image_shape : [3, 32, 320]
[2021/12/31 11:38:45] root INFO:             KeepKeys : 
[2021/12/31 11:38:45] root INFO:                 keep_keys : ['image', 'label', 'length']
[2021/12/31 11:38:45] root INFO:     loader : 
[2021/12/31 11:38:45] root INFO:         batch_size_per_card : 256
[2021/12/31 11:38:45] root INFO:         drop_last : False
[2021/12/31 11:38:45] root INFO:         num_workers : 8
[2021/12/31 11:38:45] root INFO:         shuffle : False
[2021/12/31 11:38:45] root INFO: Global : 
[2021/12/31 11:38:45] root INFO:     cal_metric_during_train : True
[2021/12/31 11:38:45] root INFO:     character_dict_path : ppocr/utils/ppocr_keys_v1.txt
[2021/12/31 11:38:45] root INFO:     checkpoints : ./output/rec_chinese_lite_v2.0/latest
[2021/12/31 11:38:45] root INFO:     debug : False
[2021/12/31 11:38:45] root INFO:     distributed : False
[2021/12/31 11:38:45] root INFO:     epoch_num : 500
[2021/12/31 11:38:45] root INFO:     eval_batch_step : [0, 2000]
[2021/12/31 11:38:45] root INFO:     infer_img : doc/imgs_words/ch/word_1.jpg
[2021/12/31 11:38:45] root INFO:     infer_mode : False
[2021/12/31 11:38:45] root INFO:     log_smooth_window : 20
[2021/12/31 11:38:45] root INFO:     max_text_length : 25
[2021/12/31 11:38:45] root INFO:     pretrained_model : ./ch_ppocr_mobile_v2.0_rec_pre/best_accuracy
[2021/12/31 11:38:45] root INFO:     print_batch_step : 10
[2021/12/31 11:38:45] root INFO:     save_epoch_step : 3
[2021/12/31 11:38:45] root INFO:     save_inference_dir : None
[2021/12/31 11:38:45] root INFO:     save_model_dir : ./output/rec_chinese_lite_v2.0
[2021/12/31 11:38:45] root INFO:     save_res_path : ./output/rec/predicts_chinese_lite_v2.0.txt
[2021/12/31 11:38:45] root INFO:     use_gpu : True
[2021/12/31 11:38:45] root INFO:     use_space_char : True
[2021/12/31 11:38:45] root INFO:     use_visualdl : True
[2021/12/31 11:38:45] root INFO: Loss : 
[2021/12/31 11:38:45] root INFO:     name : CTCLoss
[2021/12/31 11:38:45] root INFO: Metric : 
[2021/12/31 11:38:45] root INFO:     main_indicator : acc
[2021/12/31 11:38:45] root INFO:     name : RecMetric
[2021/12/31 11:38:45] root INFO: Optimizer : 
[2021/12/31 11:38:45] root INFO:     beta1 : 0.9
[2021/12/31 11:38:45] root INFO:     beta2 : 0.999
[2021/12/31 11:38:45] root INFO:     lr : 
[2021/12/31 11:38:45] root INFO:         learning_rate : 0.001
[2021/12/31 11:38:45] root INFO:         name : Cosine
[2021/12/31 11:38:45] root INFO:         warmup_epoch : 5
[2021/12/31 11:38:45] root INFO:     name : Adam
[2021/12/31 11:38:45] root INFO:     regularizer : 
[2021/12/31 11:38:45] root INFO:         factor : 1e-05
[2021/12/31 11:38:45] root INFO:         name : L2
[2021/12/31 11:38:45] root INFO: PostProcess : 
[2021/12/31 11:38:45] root INFO:     name : CTCLabelDecode
[2021/12/31 11:38:45] root INFO: Train : 
[2021/12/31 11:38:45] root INFO:     dataset : 
[2021/12/31 11:38:45] root INFO:         data_dir : /home/aistudio/data/data62842/train_images
[2021/12/31 11:38:45] root INFO:         label_file_list : ['/home/aistudio/data/data62842/train.txt']
[2021/12/31 11:38:45] root INFO:         name : SimpleDataSet
[2021/12/31 11:38:45] root INFO:         transforms : 
[2021/12/31 11:38:45] root INFO:             DecodeImage : 
[2021/12/31 11:38:45] root INFO:                 channel_first : False
[2021/12/31 11:38:45] root INFO:                 img_mode : BGR
[2021/12/31 11:38:45] root INFO:             RecAug : None
[2021/12/31 11:38:45] root INFO:             CTCLabelEncode : None
[2021/12/31 11:38:45] root INFO:             RecResizeImg : 
[2021/12/31 11:38:45] root INFO:                 image_shape : [3, 32, 320]
[2021/12/31 11:38:45] root INFO:             KeepKeys : 
[2021/12/31 11:38:45] root INFO:                 keep_keys : ['image', 'label', 'length']
[2021/12/31 11:38:45] root INFO:     loader : 
[2021/12/31 11:38:45] root INFO:         batch_size_per_card : 256
[2021/12/31 11:38:45] root INFO:         drop_last : True
[2021/12/31 11:38:45] root INFO:         num_workers : 8
[2021/12/31 11:38:45] root INFO:         shuffle : True
[2021/12/31 11:38:45] root INFO: profiler_options : None
[2021/12/31 11:38:45] root INFO: train with paddle 2.2.1 and device CUDAPlace(0)
[2021/12/31 11:38:45] root INFO: Initialize indexs of datasets:['/home/aistudio/data/data62842/val.txt']
W1231 11:38:45.574332  3811 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1231 11:38:45.579066  3811 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2021/12/31 11:38:50] root INFO: resume from ./output/rec_chinese_lite_v2.0/latest
[2021/12/31 11:38:50] root INFO: metric in ckpt ***************
[2021/12/31 11:38:50] root INFO: acc:0.48179990364001923
[2021/12/31 11:38:50] root INFO: start_epoch:72
[2021/12/31 11:38:50] root INFO: norm_edit_dis:0.7096561279006699
[2021/12/31 11:38:50] root INFO: fps:4618.199275059127
[2021/12/31 11:38:50] root INFO: best_epoch:46

eval model::   0%|          | 0/20 [00:00<?, ?it/s]
eval model::   5%|▌         | 1/20 [00:03<01:12,  3.79s/it]
eval model::  10%|█         | 2/20 [00:04<00:52,  2.92s/it]
eval model::  15%|█▌        | 3/20 [00:05<00:38,  2.29s/it]
eval model::  20%|██        | 4/20 [00:06<00:29,  1.85s/it]
eval model::  25%|██▌       | 5/20 [00:07<00:23,  1.54s/it]
eval model::  30%|███       | 6/20 [00:07<00:18,  1.32s/it]
eval model::  35%|███▌      | 7/20 [00:08<00:15,  1.17s/it]
eval model::  40%|████      | 8/20 [00:09<00:12,  1.07s/it]
eval model::  45%|████▌     | 9/20 [00:10<00:10,  1.00it/s]
eval model::  50%|█████     | 10/20 [00:11<00:09,  1.06it/s]
eval model::  55%|█████▌    | 11/20 [00:12<00:08,  1.10it/s]
eval model::  60%|██████    | 12/20 [00:12<00:07,  1.14it/s]
eval model::  65%|██████▌   | 13/20 [00:13<00:06,  1.16it/s]
eval model::  70%|███████   | 14/20 [00:14<00:05,  1.19it/s]
eval model::  75%|███████▌  | 15/20 [00:15<00:04,  1.20it/s]
eval model::  80%|████████  | 16/20 [00:16<00:03,  1.21it/s]
eval model::  85%|████████▌ | 17/20 [00:16<00:02,  1.21it/s]
eval model::  90%|█████████ | 18/20 [00:17<00:01,  1.22it/s]
eval model::  95%|█████████▌| 19/20 [00:18<00:00,  1.22it/s]
eval model:: 100%|██████████| 20/20 [00:19<00:00,  1.41it/s]
[2021/12/31 11:39:09] root INFO: metric eval ***************
[2021/12/31 11:39:09] root INFO: acc:0.4737999052400189
[2021/12/31 11:39:09] root INFO: norm_edit_dis:0.706719055893877
[2021/12/31 11:39:09] root INFO: fps:4160.243256411111
INFO 2021-12-31 11:39:10,794 launch.py:311] Local processes completed.
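The evaluation reports two numbers, acc and norm_edit_dis. Read from their names, acc is the share of lines predicted exactly right, and norm_edit_dis is one minus the average edit distance normalized by string length. The sketch below implements that reading in plain Python. It is an assumption about what the metrics mean, and PaddleOCR's RecMetric may differ in details such as how spaces are treated.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, two rows)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def recognition_metrics(predictions, targets):
    """Return (acc, norm_edit_dis) over paired prediction/target strings."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty lists of equal length")
    correct = 0
    distance_sum = 0.0
    for pred, target in zip(predictions, targets):
        correct += pred == target
        distance_sum += edit_distance(pred, target) / max(len(pred), len(target), 1)
    total = len(targets)
    return correct / total, 1 - distance_sum / total


if __name__ == "__main__":
    print(recognition_metrics(["abc", "abd"], ["abc", "abc"]))
```

This also explains why norm_edit_dis (about 0.71 here) sits well above acc (about 0.47): a line with one wrong character counts as fully wrong for acc but only slightly wrong for the edit distance.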

VII. Result prediction

The prediction script runs the trained model on the test images and saves the results as a .txt file that can be submitted directly to the competition for scoring. By default the file is saved as output/rec/predicts_chinese_lite_v2.0.txt.

1. Content and format of submission

Participants must train their models with PaddlePaddle, the deep learning platform. Results are submitted as a .txt file in which each line contains the image name and the predicted text, separated by a tab ("\t"), as in the following example:

new_name value
0.jpg Text 0
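A quick format check before uploading can save a wasted submission. The sketch below validates the layout described above: a header line followed by lines with exactly two tab-separated fields. The function name check_submission is hypothetical, and the competition platform may apply further rules that are not checked here.

```python
def check_submission(lines):
    """Return a list of problems found in the submission lines (empty list means OK)."""
    problems = []
    if not lines or lines[0].rstrip("\n") != "new_name\tvalue":
        problems.append("first line must be the header 'new_name<TAB>value'")
    for number, line in enumerate(lines[1:], start=2):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            problems.append(
                "line {}: expected 2 tab-separated fields, got {}".format(number, len(fields))
            )
        elif not fields[0]:
            problems.append("line {}: empty image name".format(number))
    return problems


if __name__ == "__main__":
    print(check_submission(["new_name\tvalue\n", "0.jpg\tText 0\n"]))
```

To check a real file, read it with open(path, encoding="utf-8").readlines() and pass the result in.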

2. Modifying infer_rec.py

with open(save_res_path, "w") as fout:
    # Write the header row required by the submission format
    fout.write('new_name' + '\t' + 'value' + '\n')
    for file in get_image_file_list(config['Global']['infer_img']):
        logger.info("infer_img: {}".format(file))
        with open(file, 'rb') as f:
            img = f.read()
            data = {'image': img}
        batch = transform(data, ops)
        if config['Architecture']['algorithm'] == "SRN":
            encoder_word_pos_list = np.expand_dims(batch[1], axis=0)
            gsrm_word_pos_list = np.expand_dims(batch[2], axis=0)
            gsrm_slf_attn_bias1_list = np.expand_dims(batch[3], axis=0)
            gsrm_slf_attn_bias2_list = np.expand_dims(batch[4], axis=0)

            others = [
                paddle.to_tensor(encoder_word_pos_list),
                paddle.to_tensor(gsrm_word_pos_list),
                paddle.to_tensor(gsrm_slf_attn_bias1_list),
                paddle.to_tensor(gsrm_slf_attn_bias2_list)
            ]
        if config['Architecture']['algorithm'] == "SAR":
            valid_ratio = np.expand_dims(batch[-1], axis=0)
            img_metas = [paddle.to_tensor(valid_ratio)]

        images = np.expand_dims(batch[0], axis=0)
        images = paddle.to_tensor(images)
        if config['Architecture']['algorithm'] == "SRN":
            preds = model(images, others)
        elif config['Architecture']['algorithm'] == "SAR":
            preds = model(images, img_metas)
        else:
            preds = model(images)
        post_result = post_process_class(preds)
        info = None
        if isinstance(post_result, dict):
            rec_info = dict()
            for key in post_result:
                if len(post_result[key][0]) >= 2:
                    rec_info[key] = {
                        "label": post_result[key][0][0],
                        "score": float(post_result[key][0][1]),
                    }
            info = json.dumps(rec_info)
        else:
            if len(post_result[0]) >= 2:
                info = post_result[0][0] + "\t" + str(post_result[0][1])

        if info is not None:
            logger.info("\t result: {}".format(info))
            # fout.write(file + "\t" + info)
            # Formatted output: image file name (not the full path), a tab, then the recognized text
            fout.write(os.path.basename(file) + "\t" + post_result[0][0] + '\n')
logger.info("success!")

%cd ~/PaddleOCR/
!python tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml \
    -o Global.infer_img="/home/aistudio/data/data62843/test_images" \
    Global.pretrained_model="./output/rec_chinese_lite_v2.0/best_accuracy"

[2021/12/31 11:52:29] root INFO: infer_img: /home/aistudio/data/data62843/test_images/9401.jpg

Prediction log

[2021/12/30 23:53:50] root INFO: infer_img: /home/aistudio/data/data62843/test_images/9995.jpg
[2021/12/30 23:53:50] root INFO:      result:
[2021/12/30 23:53:50] root INFO: infer_img: /home/aistudio/data/data62843/test_images/9996.jpg
[2021/12/30 23:53:50] root INFO:      result: 279    0.97771764
[2021/12/30 23:53:50] root INFO: infer_img: /home/aistudio/data/data62843/test_images/9997.jpg
[2021/12/30 23:53:50] root INFO:      result: Bull decoration switch    0.9916236
[2021/12/30 23:53:50] root INFO: infer_img: /home/aistudio/data/data62843/test_images/9998.jpg
[2021/12/30 23:53:50] root INFO:      result:
[2021/12/30 23:53:50] root INFO: infer_img: /home/aistudio/data/data62843/test_images/9999.jpg
[2021/12/30 23:53:50] root INFO:      result:
[2021/12/30 23:53:50] root INFO: success!

VIII. Prediction with the inference engine

1. Model size limitation

Constraint 1: the total model size must not exceed 10 MB (measured as the disk space taken by the uncompressed .pdmodel and .pdiparams files).
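Once an inference model has been exported, this constraint can be checked from Python. The sketch below adds up the sizes of the .pdmodel and .pdiparams files in a directory and compares the total with the limit. The helper name inference_model_size is made up, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

LIMIT_BYTES = 10 * 1024 * 1024  # "10MB" read as 10 MiB (an assumption)


def inference_model_size(model_dir):
    """Total size in bytes of the .pdmodel and .pdiparams files in `model_dir`."""
    return sum(
        path.stat().st_size
        for path in Path(model_dir).iterdir()
        if path.is_file() and path.suffix in {".pdmodel", ".pdiparams"}
    )


def within_limit(model_dir, limit_bytes=LIMIT_BYTES):
    """True if the inference model files fit under the size limit."""
    return inference_model_size(model_dir) <= limit_bytes
```

Other files in the same folder are ignored, since the constraint only names the .pdmodel and .pdiparams files.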

2. Solutions

The models saved during training are checkpoints: they store only the model parameters and are mainly used to resume training. The constraint here actually limits the size of the inference model. An inference model is a frozen model produced after training, with both the model structure and the model parameters saved to file, and it is mostly used for deployment. Compared with a checkpoint, the inference model additionally stores the structure of the model, performs well in deployment and accelerated inference, is flexible and convenient, and suits integration into real systems. It is also smaller.

# Export the static (inference) model
%cd ~/PaddleOCR/
!python tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./output/rec_chinese_lite_v2.0/best_accuracy.pdparams Global.save_inference_dir=./inference/rec_inference/

/home/aistudio/PaddleOCR
W1230 23:54:48.747483 13346 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W1230 23:54:48.752360 13346 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2021/12/30 23:54:52] root INFO: load pretrain successful from ./output/rec_chinese_lite_v2.0/best_accuracy
[2021/12/30 23:54:54] root INFO: inference model is saved to ./inference/rec_inference/inference

%cd ~/PaddleOCR/
!du -sh ./inference/rec_inference/

/home/aistudio/PaddleOCR
5.2M    ./inference/rec_inference/
  • The exported inference model for the CRNN model trained here takes only 5.2M, which is under the 10 MB limit.
  • The exported inference model can also be used for prediction, as the command below shows.

# Predict with the exported static model
%cd ~/PaddleOCR/
!python3.7 tools/infer/predict_rec.py --rec_model_dir=./inference/rec_inference/ --image_dir="/home/aistudio/data/A list test data set /TestAImages"

Prediction log

[2021/12/30 13:20:08] root INFO: ('MJ', 0.2357887)
[2021/12/30 13:20:08] root INFO: ('Zhongmen', 0.7167614)
[2021/12/30 13:20:08] root INFO: ('Yellow braised chicken with rice', 0.7325407)
[2021/12/30 13:20:08] root INFO: ('add line', 0.06699998)
[2021/12/30 13:20:08] root INFO: ('Chuanshandao', 0.40579563)
[2021/12/30 13:20:08] root INFO: ('Green village machine', 0.38243735)
[2021/12/30 13:20:08] root INFO: /home/aistudio/data/A list test data set /TestAImages/testa_000007.jpg: (…, 0.38957664)
[2021/12/30 13:20:08] root INFO: ('Qiongzhonghai', 0.36037388)
[2021/12/30 13:20:08] root INFO: ('L', 0.25453746)
[2021/12/30 13:20:08] root INFO: ('Qingshuiru', …)
[2021/12/30 13:20:08] root INFO: (…, 0.50577885)
...

IX. Submission

The predicted results are saved to the output/rec/predicts_chinese_lite_v2.0.txt file specified in the configuration file, which can be submitted directly.

%cd ~
!head PaddleOCR/output/rec/predicts_chinese_lite_v2.0.txt

/home/aistudio
new_name    value
0.jpg    Traditional Chinese Medicine
100.jpg    210
1000.jpg    wire, etc.
100.jpg    20
1002.jpg    import lake paper professional manufacturing
1003.jpg    1567C
1004.jpg    iTNoW

The submitted score is… low. Following what earlier top-ranked participants did, the predictions can be post-processed, for example by removing spaces and normalizing letter case and full-width/half-width characters.
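The post-processing ideas above can be sketched in a few lines. The code below converts full-width ASCII variants to half-width, removes spaces, and optionally lowercases the text. The function names are made up for illustration, and whether each step helps depends on how the competition scores predictions, so treat it as a starting point for experiments.

```python
def to_half_width(text):
    """Convert full-width ASCII variants (U+FF01 to U+FF5E) and the ideographic space to half-width."""
    converted = []
    for char in text:
        code = ord(char)
        if code == 0x3000:
            # Ideographic (full-width) space
            converted.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            # Full-width forms sit at a fixed offset from their ASCII counterparts
            converted.append(chr(code - 0xFEE0))
        else:
            converted.append(char)
    return "".join(converted)


def normalize_prediction(text, lowercase=False):
    """Apply half-width conversion, remove spaces, and optionally lowercase."""
    text = to_half_width(text).replace(" ", "")
    return text.lower() if lowercase else text


if __name__ == "__main__":
    print(normalize_prediction("ABC 123"))
    print(normalize_prediction("iT No W", lowercase=True))
```

Applying the same normalization to the validation labels first makes it possible to measure the effect locally before spending a submission.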