MMOCR official code: github.com/open-mmlab/…
First of all, thanks to the developers who generously open-sourced the code, and to the hard-working SenseTime maintainers who keep it updated. This post is a record of my own usage, written for newcomers like me.
1. Using MMOCR
- Environment configuration
The 30-series graphics cards only support CUDA 11, so the environment configuration is a bit tricky. I have a 3070, and here is my own configuration process:
```bash
# mmocr for 3070
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab

# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest)
conda install pytorch==1.8.0 torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia

# install the latest mmcv-full
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html

# install mmdetection
pip install mmdet

# install mmocr
git clone https://github.com/open-mmlab/mmocr.git
cd mmocr
pip install -r requirements.txt
pip install -v -e .  # or "python setup.py build_ext --inplace"
export PYTHONPATH=$(pwd):$PYTHONPATH
```
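Before going further, it helps to check that the GPU build of PyTorch and the three OpenMMLab packages actually import and can see the card; a minimal check along these lines:

```python
# Quick sanity check of the environment: GPU visibility and package versions.
import torch
import mmcv
import mmdet
import mmocr

print("CUDA available:", torch.cuda.is_available())   # should be True on a 3070
print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("mmcv:", mmcv.__version__)
print("mmdet:", mmdet.__version__)
print("mmocr:", mmocr.__version__)
```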
When you run the code afterwards, you may hit a small problem with the COCO API: AttributeError: 'COCO' object has no attribute 'get_cat_ids'. There are a couple of solutions:
```bash
# Method 1
git clone https://github.com/open-mmlab/cocoapi.git
cd cocoapi/pycocotools
pip install .

# Method 2
pip uninstall pycocotools
pip install mmpycocotools
```
Then we can run an official demo to see if the environment is OK
```bash
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
```
- Prepare your own training data
The official Datasets Preparation tutorial is already very detailed; I will just add a few notes.
Text Detection data
I converted my data into COCO format. For the COCO annotation format itself, refer to Gemfield's write-up on the COCO dataset annotation format. If you are working with a standard academic dataset, the official tools directory contains scripts for converting between the various formats. The data directories are set in the corresponding .py file under configs, for example:
```python
# dataset type: 'TextDetDataset' uses .txt annotation files, 'IcdarDataset' uses .json annotation files
dataset_type = 'TextDetDataset'
# image directory prefix
img_prefix = 'tests/data/toy_dataset/imgs'
# the annotation file
test_anno_file = 'tests/data/toy_dataset/instances_test.txt'
```
I use the 'IcdarDataset' type, which is what most official configs use as well. The key part is the JSON annotation file, for which you can refer to the official sample file. It is a dictionary whose main keys are “images”, “categories” and “annotations”:
“images”
The value of “images” is a list of dictionaries, where each element describes one image, for example:
{"file_name": "training/0336.png"."height": 1200."width": 1600."segm_file": "training/0336.xml"."id": 0} Copy the code
- “file_name”: the image path; make sure the file can be read starting from the img_prefix directory prefix
- “segm_file”: a per-image annotation file; optional, since the ground truth that is actually used is defined in “annotations”
- “id”: the image id; this matters because every annotation later refers back to an image id
“categories”
The value of “categories” is also a list of dictionaries, one per label category. Since OCR only has the text category, a single entry is enough; just copy and paste it:
"categories": [{"id": 1."name": "text"}] Copy the code
“annotations”
The value of “annotations” is also a list of dictionaries; each element is one ground-truth instance that is actually read during training, for example:
{"iscrowd": 0."category_id": 1."bbox": [213.16.370.1163]."area": 168314.0."segmentation": [[485.1179.306.991.252.800.213.608.215.413.274.214.402.16.535.130.471.291.296.460.301.620.365.777.490.931.583.1089]], "image_id": 0."id": 0} Copy the code
- “iscrowd”: 0 means the segmentation is in polygon format; 1 means RLE format (see the COCO format reference above)
- “category_id”: the target category; here it is always text
- “bbox”: ground truth box in [x, y, w, h] format; the first two values are the coordinates of the top-left corner, and w and h are the width and height of the box
- “area”: the area of the segmentation region
- “segmentation”: polygon ground truth in [[x1, y1, x2, y2, ...]] format; every two values are the coordinates of one point
- “image_id”: the id of the image this annotation belongs to
- “id”: the annotation id; this one is important because each image may contain several instances, and the id must be globally unique over [0, total number of annotations). Do not restart it from 0 every time you move on to the next image (see the sketch after this list)
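To make the id bookkeeping concrete, here is a minimal sketch of assembling and saving such a file; my_samples is a hypothetical list of images and their text polygons, not something MMOCR provides:

```python
import json

# Hypothetical input: one entry per image, each with its polygons as flat [x1, y1, x2, y2, ...] lists.
my_samples = [
    {"file_name": "training/0336.png", "height": 1200, "width": 1600,
     "polygons": [[485, 1179, 306, 991, 252, 800, 213, 608]]},
]

images, annotations = [], []
ann_id = 0  # globally unique annotation id, never reset per image
for img_id, sample in enumerate(my_samples):
    images.append({"file_name": sample["file_name"], "height": sample["height"],
                   "width": sample["width"], "id": img_id})
    for poly in sample["polygons"]:
        xs, ys = poly[0::2], poly[1::2]
        x, y = min(xs), min(ys)
        w, h = max(xs) - x, max(ys) - y
        annotations.append({
            "iscrowd": 0,
            "category_id": 1,
            "bbox": [x, y, w, h],
            "area": float(w * h),      # bbox area as a rough stand-in for the polygon area
            "segmentation": [poly],
            "image_id": img_id,
            "id": ann_id,
        })
        ann_id += 1

coco = {"images": images,
        "categories": [{"id": 1, "name": "text"}],
        "annotations": annotations}

with open("instances_training.json", "w") as f:
    json.dump(coco, f)
```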
Save your own data as a .json file in this format, then fill in the corresponding paths in the config (a rough sketch of that part of the config follows below). When you run the code, if training runs but no loss is printed and only weight files get saved, it means your data format is wrong or the directory settings do not match.
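For reference, the data section of an IcdarDataset-style config looks roughly like the sketch below; the paths are placeholders and train_pipeline/test_pipeline are defined elsewhere in the config, so check the official configs for the exact fields:

```python
dataset_type = 'IcdarDataset'
data_root = 'data/my_dataset/'  # placeholder: point this at your own data

data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'instances_training.json',  # the COCO-style json described above
        img_prefix=data_root + 'imgs',
        pipeline=train_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'instances_test.json',
        img_prefix=data_root + 'imgs',
        pipeline=test_pipeline))
```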
Text Recognition data
This type of data is relatively simple: each annotation line only needs the file name and the corresponding text label. The prerequisite is that you have already cropped each segmentation region out into its own image (a cropping sketch follows a couple of paragraphs below).
```
train_words/1001724.jpg Chiquita
```
The first part is the file path (absolute and relative paths both work); the second part is the ground-truth text of that image.
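If your data starts from the detection-style JSON above, one minimal way to produce the crops and label lines is sketched below; the train_words output directory and the gt_text field are hypothetical, since where you keep the transcriptions depends on your own annotations:

```python
import json
import os

import cv2

ann_path = "instances_training.json"   # detection-style annotation file described above
img_prefix = "data/my_dataset/imgs"    # placeholder image root
out_dir = "train_words"                # hypothetical output directory for the cropped words
os.makedirs(out_dir, exist_ok=True)

with open(ann_path) as f:
    coco = json.load(f)
images = {img["id"]: img for img in coco["images"]}

lines = []
for ann in coco["annotations"]:
    img_info = images[ann["image_id"]]
    img = cv2.imread(os.path.join(img_prefix, img_info["file_name"]))
    x, y, w, h = map(int, ann["bbox"])
    crop = img[y:y + h, x:x + w]        # axis-aligned crop of the text instance
    crop_name = f"{ann['id']:07d}.jpg"
    cv2.imwrite(os.path.join(out_dir, crop_name), crop)
    # "gt_text" is a hypothetical field; substitute however you store the transcription
    lines.append(f"{out_dir}/{crop_name} {ann.get('gt_text', '')}")

with open("train.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```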
In the config, train_prefix specifies the image directory prefix and train_ann_file specifies the annotation file location; the test settings work the same way.
```python
dataset_type = 'OCRDataset'
train_prefix = 'data/chinese/'
train_ann_file = train_prefix + 'labels/train.txt'
```
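In the 0.x configs, the full train dataset entry typically ties these variables together roughly as in the sketch below; the loader/parser fields may differ slightly between versions, so check the actual config (e.g. sar_r31_parallel_decoder_chinese.py) for the authoritative values:

```python
train = dict(
    type='OCRDataset',
    img_prefix=train_prefix,
    ann_file=train_ann_file,
    loader=dict(
        type='HardDiskLoader',      # reads the .txt label file from disk
        repeat=1,
        parser=dict(
            type='LineStrParser',   # splits each line into "filename" and "text"
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=train_pipeline,        # defined elsewhere in the config
    test_mode=False)
```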
Also note that text recognition requires a character dictionary, defined by dict_file. For Chinese recognition the official SAR model already provides pre-trained weights, so you can simply download them and fine-tune on your own data; the results are very good, so just take them and use them.
```python
dict_file = 'data/chineseocr/labels/dict_printed_chinese_english_digits.txt'
```
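If your labels contain characters that are not in the official dictionary, you can build your own; the official dict files are plain text with one character per line, so a minimal sketch is to collect the unique characters from the label file (my_dict.txt is a hypothetical output name):

```python
# Build a character dictionary (one character per line) from the recognition label file.
chars = set()
with open("train.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip("\n").split(" ", 1)  # "<image path> <text label>"
        if len(parts) == 2:
            chars.update(parts[1])

with open("my_dict.txt", "w", encoding="utf-8") as f:
    for ch in sorted(chars):
        if ch != " ":  # the space character is usually not part of the dict
            f.write(ch + "\n")
```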
- Training the model and testing the results
Once the data is ready, the rest is straightforward, and the official tutorial covers it in detail, so I will not repeat it here. Training and testing are just .py scripts under tools that you execute directly. Here is a simple example:
```bash
# --work-dir      where logs and weights are saved
# --load-from     load a model checkpoint before training (e.g. for fine-tuning)
# --resume-from   resume training from a checkpoint
# --gpus          number of GPUs to use
# --gpu-ids       which GPU ids to use
python ./tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_chinese.py \
    --work-dir ./results/sar/ \
    --load-from checkpoints/sar_chineseocr.pth \
    --gpus 1 --gpu-ids 4
```
You can just open the .py script to see which arguments it takes. The evaluation interval and metric are set near the end of the config. At the top of the config there is also a _base_ list: the first entry defines the schedule (optimizer, learning rate, etc.) and the second the runtime defaults (checkpoint save interval, logging, etc.).
```python
evaluation = dict(interval=10, metric='hmean-iou')

_base_ = [
    '../../_base_/schedules/schedule_1200e.py',
    '../../_base_/default_runtime.py'
]
```
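If you want to tweak what comes from _base_ (for example a smaller learning rate for fine-tuning, or a different save interval), you can override those keys after the _base_ line in your own config; the values below are only illustrative:

```python
# Illustrative overrides placed after the _base_ list; values are examples, not official defaults.
optimizer = dict(lr=1e-4)             # lower the learning rate when fine-tuning from pre-trained weights
total_epochs = 20                     # shorter schedule than the base one
checkpoint_config = dict(interval=5)  # save weights every 5 epochs
```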
More to come if there are any follow-ups…