Table of contents

  • “This is the strongest open source OCR model I’ve ever seen.”
  • Preface
  • 1. Let’s see it in action!
  • 2. Introduction to OCR
    • (1) What is OCR
    • (2) Application examples
    • (3) Difficulties of OCR
  • 3. Introduction to PaddleOCR
    • (1) Summary
    • (2) Useful links
  • 4. PaddleOCR
    • (1) Project introduction
    • (2) Testing your own data
  • 5. Multi-dimensional comparative analysis
    • (1) Documentation completeness
    • (2) Ease of use
    • (3) Running speed
    • (4) Accuracy
    • (5) Multi-angle comparison
    • (6) Other analysis
  • 6. Summary

“This is the strongest open source OCR model I’ve ever seen.”

Preface

Recently, I took part in an OCR-related competition in the “China Software Cup”.

Link: www.cnsoftbei.com/plus/view.p…

Some requirements are as follows:

Writing all of that by hand from scratch? Scary!

Since the competition rules allowed it, our team searched for open source OCR models. Relatively well-known ones on GitHub include AdvancedEAST and AttentionOCR, as well as EasyOCR and PaddleOCR. This article analyzes several OCR models, draws some conclusions, and picks the one with the highest accuracy; to find out which one, read on.

1. Let’s see it in action!

If you want to know whether something is worth your time, you have to see how it performs. It is like reading a paper: always read the abstract first, and if the key points in the abstract are not what you need, there is no need to read further.

Here are her recognition results:

Figure: recognition results

She doesn’t care whether the text is horizontal, vertical, or full of punctuation: as long as it is text, she can detect it for you. The accuracy is of course not perfect, but most confidence scores are above 0.98.

She currently supports recognition of Chinese, English, Japanese, German, French, and other languages.

Crucially, she also offers a web version and a mobile version that run directly, so even with no programming background and no development environment you can use her easily.

Figure: recognition results on mobile

Figure: recognition results in the web version

I have to admit, this open source project is really good. The results are great, and it is convenient, simple, practical, and fast. I really love it.

Ha ha, doesn’t she look great?

But after all this, you still don’t know who “she” is. Getting impatient?

She has an air of mystery, so let’s slowly lift the veil.

She is Baidu’s open source PaddleOCR project.

Below, we look at OCR and the performance advantages of PaddleOCR, and build a simple example of how to use it.

Our entry (part of the PPT presentation):

The slides may not be great, so feel free to raise questions, hey hey, let’s discuss!



It’s awesome. PaddleOCR combined with our own NLP is awesome!

Of course, the competition is still ongoing, so other materials cannot be shared yet. If you want the source code or other materials, I can provide them after the competition: leave a message with your email, or follow me and wait for the upload.

2. Introduction to OCR

(1) What is OCR

OCR (Optical Character Recognition) is the process of analyzing and recognizing image files that contain text in order to obtain the text and its layout information: the text in the image is recognized and returned in text form.

(2) Application examples

OCR has rich application scenarios. One group is structured text recognition for vertical domains, already widely used in daily life: license plate recognition, bank card recognition, ID card recognition, train ticket recognition, and so on. General-purpose OCR is also widely used. In video, for example, it is often used for automatic subtitle translation and content safety monitoring, or combined with visual features to accomplish tasks such as video understanding and video search.

(3) Difficulties of OCR

  • 1. Technical difficulties: perspective distortion, scaling, curved text, cluttered backgrounds, varied fonts, multiple languages, blur, and so on;
  • 2. OCR is often applied to massive amounts of data, yet that data must be processed in real time;
  • 3. OCR applications are often deployed on mobile or embedded hardware, where storage space and computing power are limited, so there are strict requirements on model size and prediction speed.

With so many difficulties, there must be solutions, and PaddleOCR tackles all of them. Excited to get to know PaddleOCR?

3. Introduction to PaddleOCR

Now let’s lift the veil and meet PaddleOCR.

(1) Summary

  • PaddleOCR provides an ultra-lightweight Chinese-English recognition model
  • Our goal is to build a rich, leading, and practical library of text recognition models and tools
  • A practical 3.5 MB ultra-lightweight OCR system that supports training and deployment on servers, mobile, embedded, and IoT devices
  • Supports both Chinese and English recognition, including tilted, vertical, and other text orientations
  • Supports GPU and CPU prediction
  • Runs on Linux, Windows, macOS, and other systems
  • Users can use the ultra-lightweight model directly through PaddleHub, or train their own ultra-lightweight model with the open source PaddleOCR suite

That is the official description. To sum it up in a few points:

  • 1. Small size;
  • 2. Fast;
  • 3. Convenient and simple;
  • 4. Good performance.

(2) Useful links

To make things easier for you later, I have collected the websites I use below.

The model is open source and comes with plenty of tutorials:

1. GitHub repository: github.com/PaddlePaddl…

2. PaddleHub online demo: aistudio.baidu.com/aistudio/pr…

3. AI Fast Lane 2020 PaddleOCR tutorial: aistudio.baidu.com/aistudio/ed…

4. Online demo website: www.paddlepaddle.org.cn/hub/scene/o…

5. QR code for downloading the mobile app:

4. PaddleOCR

(1) Project introduction

The needs of OCR users are hard to meet with a single general model. To make it easy for developers to customize the ultra-lightweight model with their own data, PaddleOCR provides, in addition to the 3.5 MB ultra-lightweight model (which recognizes 6,622 Chinese characters), two text detection algorithms (EAST, DB) and four text recognition algorithms (CRNN, Rosetta, STAR-Net, and RARE). Together these basically cover the requirements of common OCR tasks, and more algorithms are being added.

The Chinese OCR training and prediction tips in the model training/evaluation section are especially impressive. There you will find special handling for long Chinese text recognition, how to swap in a different backbone, and other practical techniques that match what developers need when training models for real projects.

As mentioned above, the official GitHub repository has detailed documentation and tutorials. I recommend cloning the project locally and reading the Chinese documentation in Markdown format directly. It is very detailed, covering every step from installation and deployment to running tests, and is arguably the most complete OCR developer documentation package around.
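The system above chains a text detection model, an optional direction classifier, and a text recognition model. The sketch below shows that flow in plain Python with hypothetical stand-in stages (`detect`, `classify_angle`, and `recognize` are placeholders, not PaddleOCR’s API), plus a simple top-to-bottom, left-to-right sort so the output reads in order. The 10-pixel same-line tolerance is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Box = List[Point]  # four corners, starting at the top-left


def sort_reading_order(boxes: Sequence[Box], line_tol: float = 10.0) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right within a line.

    A box whose top-left y differs from the first box of the current line
    by less than `line_tol` pixels is treated as the same line (a heuristic).
    """
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[0][1]):
        if lines and abs(box[0][1] - lines[-1][0][0][1]) < line_tol:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [box for line in lines for box in sorted(line, key=lambda b: b[0][0])]


def run_pipeline(image, detect: Callable, classify_angle: Callable,
                 recognize: Callable, use_angle_cls: bool = True):
    """Hypothetical detect -> (direction classify) -> recognize chain.

    Returns entries shaped like [box, (text, score)], the shape the
    article's code reads with line[0], line[1][0] and line[1][1].
    """
    results = []
    for box in sort_reading_order(detect(image)):
        rotated = classify_angle(image, box) if use_angle_cls else False
        text, score = recognize(image, box, rotated)
        results.append([box, (text, score)])
    return results


# Toy stages standing in for real models.
toy_boxes = [
    [(50, 12), (90, 12), (90, 30), (50, 30)],
    [(0, 10), (40, 10), (40, 30), (0, 30)],
    [(0, 50), (40, 50), (40, 70), (0, 70)],
]
out = run_pipeline(
    image=None,
    detect=lambda img: toy_boxes,
    classify_angle=lambda img, box: False,
    recognize=lambda img, box, rotated: (f"text@{box[0]}", 0.9),
)
for box, (text, score) in out:
    print(text, score)
```

Grouping boxes into lines first, rather than sorting on (y, x) directly, keeps two words on the same line in left-to-right order even when their top edges differ by a few pixels.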



First, note that most of these tutorials are Linux-based, so if you don’t have a server or are new to deep learning, getting things running may be a bit difficult.

Below, I’ll show you how to run it directly on Windows.

(2) Testing your own data

To try the source code, you need a Python deep learning environment. Some of you may not have PaddlePaddle installed yet, but it is easy to install, and pip downloads are very fast, since it is a domestic framework after all! Hey hey 😎.

PaddlePaddle official installation guide: www.paddlepaddle.org.cn/install/qui…

Once the environment is set up, download the GitHub source archive or git clone the repository locally, then create a .py file inside the PaddleOCR directory and enter the following:

from paddleocr import PaddleOCR
from tools.infer.utility import draw_ocr
from PIL import Image

# PaddleOCR currently supports mixed Chinese/English, English, French, German, Korean, and Japanese;
# switch languages with the lang parameter: `ch`, `en`, `french`, `german`, `korean`, `japan`.
ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # downloads and loads the models on first run
img_path = 'img/test.png'  # change this to the path of your image
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# Display the result: each line is [box, (text, confidence)]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
# font_path is relative to the PaddleOCR repo root; adjust it to where simfang.ttf lives
im_show = draw_ocr(image, boxes, txts, scores, font_path='./doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.show()
im_show.save('img/result.jpg')
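The script above assumes each entry of `result` has the form `[box, (text, score)]`, which is what it indexes with `line[1][0]` and `line[1][1]`; newer PaddleOCR releases may wrap this in an extra per-image list, so check against your version. A small hypothetical helper, in plain Python, that pulls out the text and drops low-confidence lines; the 0.5 threshold is an arbitrary assumption.

```python
from typing import List, Sequence, Tuple


def extract_text(result: Sequence, min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Return (text, score) pairs whose confidence is at least `min_score`.

    Assumes each entry is [box, (text, score)], the layout the script above
    indexes with line[1][0] and line[1][1].
    """
    return [(text, score) for _box, (text, score) in result if score >= min_score]


# A hand-made result in that layout (not real model output).
sample = [
    [[[10, 10], [100, 10], [100, 30], [10, 30]], ("Hello", 0.98)],
    [[[10, 40], [80, 40], [80, 60], [10, 60]], ("W0rld", 0.42)],
]
print(extract_text(sample))  # [('Hello', 0.98)]
print("\n".join(t for t, _ in extract_text(sample, 0.0)))
```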

Of course, it can also detect and recognize every image in a folder and save the results.

from paddleocr import PaddleOCR
from tools.infer.utility import draw_ocr
from PIL import Image
import os
import csv


def pre_save(img_path, save_path, csv_path):
    # PaddleOCR currently supports mixed Chinese/English, English, French, German, Korean, and Japanese;
    # switch languages with the lang parameter: `ch`, `en`, `french`, `german`, `korean`, `japan`.
    ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # run only once to download and load the models into memory
    os.makedirs(save_path, exist_ok=True)  # create the output folder if it does not exist
    count = 0
    with open(csv_path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["img", "result"])
        for img in os.listdir(img_path):
            full_path = os.path.join(img_path, img)
            print(full_path)
            count += 1
            result = ocr.ocr(full_path, cls=True)
            # Draw boxes, text, and confidence scores on the image and save it
            image = Image.open(full_path).convert('RGB')
            boxes = [line[0] for line in result]
            txts = [line[1][0] for line in result]
            scores = [line[1][1] for line in result]
            im_show = draw_ocr(image, boxes, txts, scores, font_path='./doc/simfang.ttf')
            im_show = Image.fromarray(im_show)
            # im_show.show()
            im_show.save(os.path.join(save_path, img))
            # Keep only (text, confidence) for each detected line
            al = [res[1] for res in result]
            print(str(al))
            writer.writerow([img, str(al)])
    print(count)


# img_path: folder of images to detect; save_path: folder for the annotated images;
# csv_path: CSV file for the text results (created automatically).
img_path = r'F:\lzpython\PaddleOCR-develop\doc\imgs'
save_path = r'F:\chrome\zrbdata\imgs\result1'
csv_path = r'F:\chrome\zrbdata\imgs\result.csv'
pre_save(img_path, save_path, csv_path)
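The batch script stores each image’s results as a single stringified list, which is awkward to load back later. One alternative, sketched with a hypothetical `write_rows` helper, writes one CSV row per recognized line (file name, line index, text, confidence), again assuming the `[box, (text, score)]` entry layout:

```python
import csv
import io


def write_rows(writer, img_name, result):
    """Write one CSV row per recognized line: image, line index, text, score.

    Assumes each entry of `result` is [box, (text, score)].
    """
    for idx, (_box, (text, score)) in enumerate(result):
        writer.writerow([img_name, idx, text, f"{score:.4f}"])


# Demo with an in-memory file and a hand-made result.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["img", "line", "text", "score"])
write_rows(writer, "test.png", [
    [[[0, 0], [50, 0], [50, 20], [0, 20]], ("Hello", 0.98)],
    [[[0, 30], [50, 30], [50, 50], [0, 50]], ("OCR", 0.9)],
])
print(buf.getvalue())
```

In the batch script, `write_rows(writer, img, result)` could stand in for `writer.writerow([img, str(al)])`, with the header row changed to match.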

The PaddleOCR scripts above run whether or not your computer has a GPU (a GPU just makes them faster).
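If you prefer to choose the device explicitly, a sketch like the following checks whether the installed PaddlePaddle build has CUDA support before passing `use_gpu` to `PaddleOCR`. It assumes PaddlePaddle 2.x (where `paddle.is_compiled_with_cuda()` exists) and a PaddleOCR version that accepts `use_gpu`; it falls back to CPU if Paddle cannot be imported, and it does not verify that a GPU is physically present.

```python
def pick_use_gpu() -> bool:
    """Return True only if PaddlePaddle is importable and built with CUDA."""
    try:
        import paddle  # PaddlePaddle 2.x assumed
    except ImportError:
        return False
    # A CUDA build does not guarantee a usable GPU is attached.
    return bool(paddle.is_compiled_with_cuda())


if __name__ == "__main__":
    use_gpu = pick_use_gpu()
    print("use_gpu =", use_gpu)
    # Then, for example (requires PaddleOCR):
    # ocr = PaddleOCR(use_angle_cls=True, lang="ch", use_gpu=use_gpu)
```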

5. Multi-dimensional comparative analysis

Back to the main question: which model is better? Here comes the best part.

Praising one model on its own won’t show where it stands out, so let’s compare, hey hey! The mainstream open source OCR projects include EasyOCR, chineseocr_lite, and PaddleOCR.

We compare them along the following dimensions:

  • 1. Documentation completeness;
  • 2. Ease of use;
  • 3. Running speed;
  • 4. Accuracy;
  • 5. Comparison from multiple angles.

(1) Documentation completeness

  • As a developer (well, a beginner 😂), I found EasyOCR’s documentation hard to work with, and chineseocr_lite’s tutorials are not very friendly to PyCharm users.
  • PaddleOCR’s documentation is linked above, so I won’t go into details; it is arguably the most detailed tutorial I have ever seen, covering everything from installation to training to deployment.

Conclusion:

The EasyOCR and chineseocr_lite tutorials are hard going for people without background knowledge.

PaddleOCR is well documented and easy to understand even for beginners.

(2) Ease of use

  • PaddleOCR offers a web version and a mobile version that users can use directly. Developers benefit too: the detailed documentation lets them quickly understand the models, train on their own data, and even add features. EasyOCR could probably be extended in similar ways, but without documentation there is no practical way to do so. chineseocr_lite has a convenient web version and an API, but its tutorials are still incomplete and configuration is not straightforward.
  • PaddleOCR also provides several models for developers to use in different scenarios:

Model description | Model name | Recommended scenario | Detection model | Direction classifier
Chinese/English ultra-lightweight OCR model (8.1 MB) | ch_ppocr_mobile_v1.1_xx | Mobile & server | Inference model / Pretrained model | Inference model / Pretrained model
Chinese/English general OCR model (155.1 MB) | ch_ppocr_server_v1.1_xx | Server | Inference model / Pretrained model | Inference model / Pretrained model
Chinese/English ultra-lightweight compressed OCR model (3.5 MB) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | Inference model / Slim model | Inference model / Slim model

(3) Running speed

To evaluate an algorithm, we usually look at its time and space complexity.

  • On my low-spec PC, processing the same image from start to finish took 2.15 seconds with PaddleOCR and 9.96 seconds with EasyOCR. With only one image, a gap of a few seconds does not say much.

  • Next, I compared running time over a batch of images. The main differences are as follows:

    • On my computer there is a clear gap in elapsed time. PaddleOCR processed 141 images in just 6.9 seconds, an average of 0.048 s per image, consistent with the official timing data; EasyOCR processed 121 images in 40.24 seconds, an average of 0.33 s per image.
    • Why only 121 images for EasyOCR? I deleted the 20 .gif images because EasyOCR does not support .gif recognition.

    Here are the official figures:

    On a T4 GPU, the mobile model takes only 137 ms, and the latency on a Snapdragon 855 mobile device is only about 300 ms.
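The per-image averages above are total elapsed time divided by the number of images. A small harness for reproducing that kind of measurement, as a sketch: it collects image files by extension (leaving out `.gif`, as was done for EasyOCR) and times any recognizer you pass in. The extension list and the `recognize` callable are assumptions; with PaddleOCR you might pass `lambda p: ocr.ocr(p, cls=True)`, creating `ocr` once beforehand so model loading is not counted.

```python
import os
import time
from typing import Callable, List, Tuple

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # .gif left out on purpose


def list_images(folder: str, exts=IMAGE_EXTS) -> List[str]:
    """Return sorted paths of files in `folder` whose extension is in `exts`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.splitext(name)[1].lower() in exts
    )


def benchmark(recognize: Callable[[str], object], paths: List[str]) -> Tuple[float, float]:
    """Run `recognize` on every path; return (total_seconds, average_seconds)."""
    start = time.perf_counter()
    for path in paths:
        recognize(path)
    total = time.perf_counter() - start
    return total, (total / len(paths) if paths else 0.0)


# Example (hypothetical folder):
# paths = list_images(r"F:\lzpython\PaddleOCR-develop\doc\imgs")
# total, avg = benchmark(lambda p: ocr.ocr(p, cls=True), paths)
# print(f"{len(paths)} images, {total:.2f} s total, {avg:.3f} s per image")
```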

(4) Accuracy

Of course, whether a model is usable ultimately depends on accuracy: if recognition is inaccurate, the model is simply not applicable. Take the picture below as an example.

(1) PaddleOCR:

PaddleOCR recognized 53 pieces of text, both horizontal and vertical (as you can see, this covers all of the text in the image). Only about five characters were wrong, three results had a confidence of 0.6 or less, and most had a confidence of 0.8 or more.

(2) EasyOCR:

EasyOCR recognized 63 pieces of text (it draws no boxes, so it is unclear whether all of the text was found). In both recognized content and accuracy, it still lags some distance behind PaddleOCR.

('Redaction China WWW. REDOCN.COM', 0.14594213664531708)
('Hey,', 0.9193199872970581)
('The happxienffgefgiga', 0.000323598796967417)
('nian le fan tian', 0.537309467792511)
('look, theres', 0.22078940272331238)
('alot ofwild flowers', 0.057002753019332886)
("I'm happy because you're happy.", 0.531562089920044)
('Happy childhood, happy childhood.', 0.8216490149497986)
('- Wild Bird Green ', 0.004080250393599272)
('on the lawn.', 0.44294288754463196)
('fragrance', 0.9942429661750793)
('.RE', 0.18314962089061737)
('Thats', 0.7607358694076538)
('my', 0.6383113861083984)
('frend', 0.45365384221076965)
('FOREVER', 0.7280043363571167)
('The Childhood Party.', 0.7070863246917725)
(3) chineseocr_lite:

This model recognized 35 pieces of text, broadly similar to PaddleOCR’s output, but the confidence scores differ greatly: chineseocr_lite’s highest confidence was 0.59, and most fell between 0.3 and 0.5.
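The confidence comparison above can be made repeatable with a few summary numbers per engine. A sketch, assuming each engine’s output has been reduced to (text, score) pairs like the EasyOCR listing; the 0.6 and 0.8 cut-offs mirror the ones used in the PaddleOCR description, and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def confidence_summary(pairs: List[Tuple[str, float]],
                       low: float = 0.6, high: float = 0.8) -> Dict[str, float]:
    """Count, mean, max, and how many scores are <= low or >= high."""
    scores = [score for _text, score in pairs]
    if not scores:
        return {"count": 0, "mean": 0.0, "max": 0.0, "at_most_low": 0, "at_least_high": 0}
    return {
        "count": len(scores),
        "mean": mean(scores),
        "max": max(scores),
        "at_most_low": sum(s <= low for s in scores),
        "at_least_high": sum(s >= high for s in scores),
    }


# A few (text, score) pairs copied from the EasyOCR listing above.
easyocr_pairs = [
    ("Hey,", 0.9193199872970581),
    ("look, theres", 0.22078940272331238),
    ("fragrance", 0.9942429661750793),
    ("FOREVER", 0.7280043363571167),
]
print(confidence_summary(easyocr_pairs))
```

Confidence alone does not measure correctness (a wrong string can still score high), so this complements, rather than replaces, checking the recognized text against the image.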


(5) Multi-angle comparison

For developers working on OCR, the most attractive things an open source repo can offer are:

  • ① High-quality pretrained models;
  • ② Simple, easy-to-use training code;
  • ③ Easy, hassle-free deployment.

Here is a quick comparison of the core capabilities of today’s mainstream open source OCR repos:

Repo | Languages | Pretrained model size | F1-score | On-device deployment | Custom training | pip install
chineseocr_lite | Chinese and English | 4.7 MB | 0.3899 | Supported | Not supported | Not supported
EasyOCR | Multilingual | 218 MB | 0.2214 | Not supported | Not supported | Supported
PaddleOCR | Multilingual | 3.5 MB | 0.521 | Supported | Supported | Supported
  • Languages: chineseocr_lite supports only Chinese and English. EasyOCR’s strength is multilingual support, which suits developers who need less common languages. However, PaddleOCR supports more and more languages too, including mixed Chinese/English, English, French, German, Korean, and Japanese.
  • Pretrained models: EasyOCR currently has no ultra-lightweight model; chineseocr_lite’s latest model is about 4.7 MB; PaddleOCR’s 3.5 MB model is the lightest known in the industry.
  • Deployment: EasyOCR’s large model is not suited to on-device deployment. chineseocr_lite and PaddleOCR are both small and support on-device deployment, and PaddleOCR already has a mobile app.
  • Custom training: in real business scenarios the pretrained model often falls short, so custom training and fine-tuning are needed, and at present only PaddleOCR supports this.
  • Performance: on 300 images collected from real OCR application scenarios (contracts, license plates, nameplates, train tickets, lab test reports, forms, certificates, street-view text, business cards, digital displays, etc.), with an average of 17 text boxes per image, PaddleOCR’s F1-score is above 0.5, which is quite good.
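The source does not say exactly how the F1-scores on the 300-image set were computed. For text detection, F1 is commonly the harmonic mean of precision and recall after matching predicted boxes to ground-truth boxes. The sketch below uses axis-aligned boxes, greedy one-to-one matching, and an IoU threshold of 0.5; all three are simplifying assumptions (ICDAR-style evaluation works on polygons), and the function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds: List[Rect], gts: List[Rect],
                 thresh: float = 0.5) -> Tuple[float, float, float]:
    """Greedily match each prediction to its best unmatched ground truth."""
    unmatched = list(range(len(gts)))
    tp = 0
    for p in preds:
        best, best_iou = None, thresh
        for gi in unmatched:
            v = iou(p, gts[gi])
            if v >= best_iou:
                best, best_iou = gi, v
        if best is not None:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
preds = [(0, 0, 10, 10), (50, 50, 60, 60), (21, 0, 31, 10)]
print(detection_f1(preds, gts))  # 2 of 3 predictions match, both ground truths found
```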

(6) Other analysis

As we know, consistency between training and test data directly affects model performance. To get better results, we often need to train the ultra-lightweight model on our own data. Besides the 3.5 MB ultra-lightweight model, PaddleOCR provides two text detection algorithms and four text recognition algorithms, and has released the corresponding 4 text detection models and 8 text recognition models, on which users can build their own ultra-lightweight models.

PaddleOCR has open-sourced several well-known text detection and recognition algorithms, and its reproductions match or exceed the results of the original papers. On the ICDAR2015 public text detection dataset, the results are as follows:

Model | Backbone | Precision | Recall | Hmean | Download
EAST | ResNet50_vd | 88.18% | 85.51% | 86.82% | Download link
EAST | MobileNetV3 | 81.67% | 79.83% | 80.74% | Download link
DB | ResNet50_vd | 83.79% | 80.65% | 82.19% | Download link
DB | MobileNetV3 | 75.92% | 73.18% | 74.53% | Download link
SAST | ResNet50_vd | 92.18% | 82.96% | 87.33% | Download link
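Hmean in this table is the harmonic mean of precision and recall, Hmean = 2PR / (P + R). A quick check against the rows above; a difference in the last digit can come from P and R having been rounded before publication.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# (model, backbone, precision %, recall %, reported Hmean %) from the table above
ROWS = [
    ("EAST", "ResNet50_vd", 88.18, 85.51, 86.82),
    ("EAST", "MobileNetV3", 81.67, 79.83, 80.74),
    ("DB", "ResNet50_vd", 83.79, 80.65, 82.19),
    ("DB", "MobileNetV3", 75.92, 73.18, 74.53),
    ("SAST", "ResNet50_vd", 92.18, 82.96, 87.33),
]

for model, backbone, p, r, reported in ROWS:
    print(f"{model:5s} {backbone:12s} computed={hmean(p, r):.2f} reported={reported:.2f}")
```

Every row agrees to within 0.01; the DB/MobileNetV3 row computes to 74.52 against the reported 74.53, a rounding-level difference.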

The algorithms also perform impressively on the Total-Text public dataset.

For text recognition, following the training and evaluation pipeline of DTRB [3], four algorithms (CRNN, Rosetta, STAR-Net, and RARE) were implemented, covering the two mainstream families of text recognition algorithms: CTC-based and attention-based. The models were trained on MJSynth and SynthText and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets, with the following results:

Model | Backbone | Avg accuracy | Model name | Download
Rosetta | Resnet34_vd | 80.24% | rec_r34_vd_none_none_ctc | Download link
Rosetta | MobileNetV3 | 78.16% | rec_mv3_none_none_ctc | Download link
CRNN | Resnet34_vd | 82.20% | rec_r34_vd_none_bilstm_ctc | Download link
CRNN | MobileNetV3 | 79.37% | rec_mv3_none_bilstm_ctc | Download link
STAR-Net | Resnet34_vd | 83.93% | rec_r34_vd_tps_bilstm_ctc | Download link
STAR-Net | MobileNetV3 | 81.56% | rec_mv3_tps_bilstm_ctc | Download link
RARE | Resnet34_vd | 84.90% | rec_r34_vd_tps_bilstm_attn | Download link
RARE | MobileNetV3 | 83.32% | rec_mv3_tps_bilstm_attn | Download link
SRN | Resnet50_vd_fpn | 88.33% | rec_r50fpn_vd_none_srn | Download link
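Models whose names end in `_ctc` are decoded with CTC: the network outputs a distribution over characters plus a special blank for each horizontal slice of the text-line image, and greedy decoding takes the most likely label per slice, merges consecutive repeats, then drops blanks. A minimal pure-Python sketch of that greedy rule; the two-letter charset and the probability matrix are made up, blank is assumed to be class 0, and this is not PaddleOCR’s own post-processing code.

```python
from typing import List, Sequence

BLANK = 0  # class index reserved for the CTC blank (assumption for this sketch)


def ctc_greedy_decode(probs: Sequence[Sequence[float]], charset: str) -> str:
    """probs[t][k] is the probability of class k at time step t.

    Class 0 is blank; class k >= 1 maps to charset[k - 1].
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    chars: List[str] = []
    prev = None
    for k in best:
        if k != prev and k != BLANK:  # skip repeats and blanks
            chars.append(charset[k - 1])
        prev = k
    return "".join(chars)


probs = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a again: merged with the previous step
    [0.9, 0.05, 0.05],  # blank: separates the two a's
    [0.1, 0.6, 0.3],    # a (kept, because a blank came before it)
    [0.1, 0.2, 0.7],    # b
    [0.2, 0.1, 0.7],    # b again: merged
]
print(ctc_greedy_decode(probs, "ab"))  # aab
```

The blank is what lets CTC output a genuinely doubled letter: "aa" needs a blank between the two a’s, otherwise they collapse into one.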

For the Chinese models, 300,000 text images were cropped out of the LSVT street-view dataset according to the ground-truth positions, with position calibration, and 5,000,000 synthetic images were generated from the LSVT corpus. The relevant configuration and pretrained files are as follows:

Model | Backbone | Configuration file | Pretrained model
Ultra-lightweight Chinese model | MobileNetV3 | rec_chinese_lite_train.yml | Download link
General Chinese OCR model | Resnet34_vd | rec_chinese_common_train.yml | Download link
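The cropping step mentioned above (cutting text images out of LSVT photos using the ground-truth positions) can be sketched as turning each annotated quadrilateral into a crop rectangle. This is an axis-aligned simplification with a hypothetical helper name; rotated or curved text would need a perspective warp instead. The returned tuple matches the `(left, upper, right, lower)` box that PIL’s `Image.crop` expects.

```python
import math
from typing import Sequence, Tuple


def quad_to_crop_box(points: Sequence[Tuple[float, float]], img_w: int, img_h: int,
                     pad: int = 0) -> Tuple[int, int, int, int]:
    """Axis-aligned (left, upper, right, lower) box enclosing a labelled quad,
    padded by `pad` pixels and clipped to the image bounds."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left = max(0, math.floor(min(xs)) - pad)
    upper = max(0, math.floor(min(ys)) - pad)
    right = min(img_w, math.ceil(max(xs)) + pad)
    lower = min(img_h, math.ceil(max(ys)) + pad)
    return left, upper, right, lower


quad = [(10.5, 20), (110, 18), (112, 50), (9, 52)]  # made-up ground-truth quad
print(quad_to_crop_box(quad, img_w=100, img_h=100, pad=2))  # (7, 16, 100, 54)
# With PIL: img = Image.open(path); crop = img.crop(quad_to_crop_box(quad, *img.size))
```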

How are these results obtained? See the text recognition section under model training/evaluation in the PaddleOCR documentation.

6. Summary

To sum up PaddleOCR in a few points:

  • Small size
  • Fast
  • Easy to deploy
  • Simple to use
  • Very good performance

After comparing along all these dimensions, we decided to use PaddleOCR as the model for our competition entry. Development is nearly complete, so stay tuned. Once the competition is over, all the code can be published, which will also make it easier for everyone to learn from.

You can also check how these open source projects rank on github.com/trending and paperswithcode.com/. PaddleOCR ranks very well there, which shows that many people follow it and it has a large user base.

Of course, we are still in the competition, so other materials cannot be made public yet. If you would like the source code or other materials, leave a message with your email, and I will send them to everyone after the competition.

GitHub repository: github.com/PaddlePaddl…

Personally, I suggest giving domestic open source projects a Star, and a Fork if you like them. I think that motivates the maintainers to keep creating and innovating.

Hey hey, once you have starred it, you can come to me for our competition source code. Give PaddleOCR a boost!

  • Hey hey, here is the official technical exchange group, shh, don’t tell anyone: scan the QR code below with WeChat to join.