Contents
- “This is the strongest OCR open source algorithm model I’ve ever seen.”
- Preface
- 1. Come on, show it off!
- 2. Introduction to OCR
  - (1) What is OCR
  - (2) Application examples
  - (3) OCR difficulties
- 3. PaddleOCR introduction
  - (1) Summary and introduction
  - (2) Summary of relevant addresses
- 4. PaddleOCR
  - (1) PaddleOCR project introduction
  - (2) Test your own data
- 5. Multi-dimensional comparative analysis
  - (1) Comparison of tutorial completeness
  - (2) Ease-of-use comparison
  - (3) Comparison of running speed
  - (4) Precision comparison
  - (5) Multi-angle comparison
  - (6) Other analysis
- 6. Summary
“This is the strongest OCR open source algorithm model I’ve ever seen.”
Preface
I recently took part in an OCR-related competition that is part of the “China Software Cup”.
The link: www.cnsoftbei.com/plus/view.p…
Some requirements are as follows:
Writing all of that code by hand was more than we could manage.
Since the competition rules allow it, our team searched for open source OCR models. On GitHub, AdvancedEAST and AttentionOCR are relatively well known, as are EasyOCR and PaddleOCR. This article compares several of these OCR models, draws some conclusions, and picks the one with the highest accuracy. Which one is it? Read on.
1. Come on, show it off!
To get to know something, first see how well it works. It is like reading a paper: read the abstract first, and if the key points in the abstract are not what you are after, there is no need to read further.
Her recognition results look like this:
Recognition result image
She does not care whether the text is horizontal or vertical, or whether it contains punctuation; as long as it is text, she can detect it for you. The accuracy is of course not perfect, but most confidence scores are above 0.98.
She now supports the recognition of Chinese, English, Japanese, German, French and other languages.
The key point is that she also has a web version and a mobile version that can be used directly, so even with no programming background and no development environment, you can use her with ease.
Mobile recognition results
Web version recognition results
I have to admit, this open source project is really good: the results are great, and it is convenient, simple, practical, and fast. I really love it.
Ha ha, doesn’t she look great?
But after reading all this, you still don’t know who “she” is. Getting a little impatient?
She is wrapped in a layer of mystery, so let’s slowly lift the veil.
She is Baidu’s open source PaddleOCR project.
Below, we introduce OCR, look at the performance advantages of PaddleOCR, and build a simple example of how to use it.
Our entry (part of the PPT presentation):
The PPT may not be very polished. If you have questions, feel free to raise them, hey hey, let’s talk!
PaddleOCR combined with our own NLP works really well!
Of course, the competition is still under way, so it is not convenient to release other material yet. If you want the source code and other material, I can provide it after the competition: leave your email address in a message or follow me, and wait for me to upload it.
2. Introduction to OCR
(1) What is OCR
OCR (Optical Character Recognition) is the process of analyzing and recognizing image files that contain text in order to obtain the text and its layout information. In other words, the text in the image is recognized and returned as text.
(2) Application examples
OCR technology has a rich set of application scenarios. These include vertical-domain structured text recognition that is already widely used in daily life, such as license plate recognition, bank card recognition, ID card recognition, and train ticket recognition. General-purpose OCR also has a wide range of applications: in video scenarios, for example, OCR is often used for automatic subtitle translation and content security monitoring, or combined with visual features for tasks such as video understanding and video search.
(3) OCR difficulties
- 1. Technical difficulties, such as perspective, scaling, bending, clutter, fonts, multiple languages, and blur;
- 2. OCR applications often handle massive amounts of data, yet that data must be processed in real time;
- 3. OCR applications are often deployed on mobile or embedded hardware, where storage space and computing power are limited, so there are high demands on the size and prediction speed of the OCR model.
With so many difficulties there has to be a solution, and PaddleOCR sets out to address all of them. Are you excited to get to know PaddleOCR?
3. PaddleOCR introduction
Now let’s lift the veil. Meet PaddleOCR.
(1) Summary and introduction
- PaddleOCR offers an ultra-lightweight Chinese-English recognition model
- Its goal is to build a rich, leading, and practical library of text recognition models and tools
- The 3.5M practical ultra-lightweight OCR system supports training and deployment on servers, mobile devices, embedded devices, and IoT devices
- Supports both Chinese and English recognition, as well as slanted, vertical, and other text orientations
- Supports GPU and CPU prediction
- Runs on Linux, Windows, macOS, and other systems
- Users can use the ultra-lightweight model directly via PaddleHub, or train their own ultra-lightweight model with the PaddleOCR open source suite
That is the official description. To summarize a few points:
- 1. small size;
- 2. fast;
- 3. convenient and simple;
- 4. good performance.
(2) Summary of relevant addresses
For your convenience later on, I have collected the sites I use, listed below.
The model is open source and there are many tutorials:
1. GitHub open source address: github.com/PaddlePaddl…
2. PaddleHub online experience (with source): aistudio.baidu.com/aistudio/pr…
3. AI Fast Lane 2020 PaddleOCR learning tutorial: aistudio.baidu.com/aistudio/ed…
4. Web demo: www.paddlepaddle.org.cn/hub/scene/o…
5. Mobile app download QR code:
4. PaddleOCR
(1) PaddleOCR project introduction
The needs of OCR users are hard to meet with one general-purpose model. To make it easy for developers to customize an ultra-lightweight model with their own data, in addition to the 3.5M ultra-lightweight model (which can recognize 6,622 Chinese characters), PaddleOCR also provides two text detection algorithms (EAST, DB) and four text recognition algorithms (CRNN, Rosetta, STAR-Net, and RARE). These basically cover common OCR tasks, and more algorithms are still being added.
In particular, the section on Chinese OCR training and prediction tips in the model training/evaluation documentation stands out. Open it and you will find practical techniques such as special handling for long Chinese text and how to swap in a different backbone, which match well what developers need when tuning models in real projects.
All of the above is covered by detailed documentation and tutorials on the official GitHub. I recommend cloning the project locally and reading the Chinese Markdown documentation directly: it covers every step from installation and deployment to running tests, and can be called the most complete OCR developer documentation package.
Note first that most of these tutorials are Linux-based. For those of you who don’t have a server or are new to deep learning, they may be a little difficult to run.
Below, I will show you how to run it directly on Windows.
(2) Test your own data
To try the source code you need a Python deep learning environment. Some of you may not have the Paddle environment yet, but it is easy to install, and the pip download is fast. After all, it is a domestic framework! Hey hey 😎.
PaddlePaddle official environment setup tutorial: www.paddlepaddle.org.cn/install/qui…
After setting up the environment, download the source from GitHub or clone it with Git to a local directory, create a .py file in the PaddleOCR directory, and enter the following:
from paddleocr import PaddleOCR
from tools.infer.utility import draw_ocr
from PIL import Image

# PaddleOCR currently supports Chinese, English, French, German, Korean, and Japanese,
# which can be switched by modifying the lang parameter.
# The lang values are, in order, `ch`, `en`, `french`, `german`, `korean`, `japan`.
ocr = PaddleOCR(use_angle_cls=True, lang="ch")
img_path = 'img/test.png'  # change this to the path of your image
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# display result
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
# font_path must point to the simfang.ttf font in the PaddleOCR doc folder; adjust it for your machine
im_show = draw_ocr(image, boxes, txts, scores, font_path='/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.show()
im_show.save('img/result.jpg')
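The loop above prints every detected line. Here is a minimal sketch of how you might keep only the confident ones, assuming the result has the shape that the code above indexes into (each entry is `[box, (text, score)]`). The helper name `filter_by_confidence` and the 0.8 threshold are my own illustrative choices, not part of PaddleOCR, and other PaddleOCR versions may nest the result differently, so check what your installed version returns.

```python
def filter_by_confidence(result, min_score=0.8):
    """Return (text, score) pairs whose score is at least min_score.

    Assumes each entry looks like [box, (text, score)], matching the
    line[0] / line[1][0] / line[1][1] indexing used in the example above.
    """
    return [(entry[1][0], entry[1][1]) for entry in result if entry[1][1] >= min_score]


# Hand-made sample data in the assumed shape (not real OCR output).
sample = [
    [[[0, 0], [10, 0], [10, 5], [0, 5]], ("hello", 0.98)],
    [[[0, 6], [10, 6], [10, 11], [0, 11]], ("w0rld", 0.42)],
]
confident = filter_by_confidence(sample)
print(confident)
```

With real output you would pass `result` from `ocr.ocr(...)` instead of `sample`.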
Of course, it can also run detection and recognition on every image in a folder and save the results.
from paddleocr import PaddleOCR
from tools.infer.utility import draw_ocr
from PIL import Image
import os
import csv

def pre_save(img_path, save_path, csv_path):
    os.makedirs(save_path, exist_ok=True)  # make sure the output folder exists
    # newline='' keeps the csv module from writing blank rows on Windows
    with open(csv_path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["img", "result"])
        # PaddleOCR currently supports Chinese, English, French, German, Korean, and Japanese,
        # which can be switched by modifying the lang parameter.
        # The lang values are, in order, `ch`, `en`, `french`, `german`, `korean`, `japan`.
        ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # need to run only once to download and load model into memory
        i = 0
        for img in os.listdir(img_path):
            full_path = os.path.join(img_path, img)
            print(full_path)
            i += 1
            result = ocr.ocr(full_path, cls=True)
            image = Image.open(full_path).convert('RGB')
            boxes = [line[0] for line in result]
            txts = [line[1][0] for line in result]
            scores = [line[1][1] for line in result]
            im_show = draw_ocr(image, boxes, txts, scores, font_path='/doc/simfang.ttf')
            im_show = Image.fromarray(im_show)
            # im_show.show()
            im_show.save(os.path.join(save_path, img))
            # keep only the (text, confidence) part of each detected line
            al = [res[1] for res in result]
            print(str(al))
            writer.writerow([img, str(al)])
        print(i)  # number of files processed

# img_path is the folder of images to process, save_path is the folder for the annotated images,
# and csv_path is the CSV file for the recognition results (it is created automatically).
img_path = r'F:\lzpython\PaddleOCR-develop\doc\imgs'
save_path = r'F:\chrome\zrbdata\imgs\result1/'
csv_path = r'F:\chrome\zrbdata\imgs\result.csv'
pre_save(img_path, save_path, csv_path)
With this code, the model will run regardless of whether your computer has a GPU or not (it’s just faster with a GPU).
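One thing to watch in the batch script: it hands every entry returned by `os.listdir` to the OCR call, so a stray non-image file in the folder would be passed along too. Below is a small sketch of filtering the folder listing first. The helper name `list_images` and the extension set are assumptions I picked for illustration, so adjust them to the formats your OCR library actually reads.

```python
import os
import tempfile

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}  # illustrative set, adjust as needed


def list_images(folder, exts=IMAGE_EXTS):
    """Return sorted paths of regular files in folder whose extension is in exts."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        if os.path.isfile(full) and ext in exts:
            paths.append(full)
    return paths


# Demo on a throwaway folder filled with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.png", "b.JPG", "c.gif", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in list_images(tmp)]
print(found)
```

In the batch function you could then loop over `list_images(img_path)` instead of `os.listdir(img_path)`.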
5. Multi-dimensional comparative analysis
Back to the main topic: we want to work out which model is better, and here comes the best part.
Praising one project in isolation does not show where it stands out, so it is better to compare, hey hey! The mainstream open source OCR projects include EasyOCR, chineseocr_lite, and PaddleOCR.
Let’s compare them along the following dimensions:
- 1. Comparison of the completeness of the tutorial;
- 2. Ease-of-use comparison;
- 3. Comparison of running speed;
- 4. Precision comparison;
- 5. Contrast from multiple angles.
(1) Comparison of tutorial completeness
- As a developer (a beginner 😂), I found that the EasyOCR documentation leaves a lot to be desired, and the chineseocr_lite tutorials are not very friendly to PyCharm users.
- PaddleOCR’s documentation is linked above, so I will not go into detail here. It is arguably the most detailed tutorial I have ever seen, covering everything from installation to training to deployment.
Conclusion:
The EasyOCR and chineseocr_lite tutorials are hard to follow for people with no background.
PaddleOCR is well documented and easy to understand even for newcomers.
(2) Ease-of-use comparison
- PaddleOCR has a web version and a mobile version that users can use directly. The same goes for developers: PaddleOCR’s detailed documentation lets developers quickly understand the models, train on their own data, and even add functionality. EasyOCR should be just as extensible as Paddle, but there is no documentation for it, so there is no way to tell. chineseocr_lite has a web version and an API, which is very convenient, but the tutorial is still incomplete and configuration is not easy.
- PaddleOCR also provides a variety of models for developers to use in different situations.
Model description | Model name | Recommended scenario | Detection model | Direction classifier |
---|---|---|---|---|
Ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | Inference model / Pretrained model | Inference model / Pretrained model |
General Chinese and English OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | Inference model / Pretrained model | Inference model / Pretrained model |
Ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | Inference model / Slim model | Inference model / Slim model |
(3) Comparison of operation speed
To judge an algorithm, we often look at its time and space cost.
- On my low-spec PC, processing one and the same image from start to printed result took 2.15 seconds with PaddleOCR and 9.96 seconds with EasyOCR. With only one image, a difference of a few seconds does not tell you much.
- Next, I compared the running time on a batch of images. The main differences are as follows:
  - On my computer there is a clear gap in elapsed time. PaddleOCR processed 141 images in just 6.9 seconds, an average of 0.048 s per image, which is consistent with the official timing figures, while EasyOCR processed 121 images in 40.24 seconds, an average of 0.33 s per image.
  - Why only 121 images for EasyOCR? I removed 20 .gif images because it does not support .gif recognition.
Here are the official figures:
On a T4 GPU, the mobile model needs only 137 ms, and the latency on a Snapdragon 855 mobile device is only about 300 ms.
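For anyone who wants to repeat this kind of timing on their own machine, here is a minimal sketch of a timing harness. It is not the script I used for the numbers above; the helper name `time_per_item` is made up for this example, and the workload is a stand-in so the snippet runs without any OCR library installed.

```python
import time


def time_per_item(fn, items):
    """Run fn on every item; return (total_seconds, average_seconds)."""
    start = time.perf_counter()
    for item in items:
        fn(item)
    total = time.perf_counter() - start
    average = total / len(items) if items else 0.0
    return total, average


# Stand-in workload. In practice fn would be something like
# lambda path: ocr.ocr(path, cls=True) and items a list of image paths.
total, average = time_per_item(lambda n: sum(range(n)), [10_000] * 50)
print(f"total {total:.4f}s, average {average:.6f}s per item")
```

Model loading is deliberately left outside the timed loop, so the average reflects per-image work only; whether to include loading time is a choice you should state when reporting numbers.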
(4) Precision comparison
Of course, whether a model is usable depends on accuracy: if recognition is not accurate, it is not usable. Take the picture below as an example.
(1) PaddleOCR:
PaddleOCR recognized 53 lines of horizontal and vertical text (which, as you can see, covers all of the text in the image). Only about five of them contain errors, three have a confidence of 0.6 or less, and most have a confidence of 0.8 or more.
(2) EasyOCR:
EasyOCR recognized 63 text segments (it does not draw boxes on the image, so it is unclear whether all the text was recognized), and it still lags behind PaddleOCR in both recognized content and accuracy.
('Redaction China WWW. REDOCN.COM', 0.14594213664531708)
('Hey,', 0.9193199872970581)
('The happxienffgefgiga', 0.000323598796967417)
('nian le fan tian', 0.537309467792511)
('look, theres', 0.22078940272331238)
('alot ofwild flowers', 0.057002753019332886)
("I'm happy because you're happy.", 0.531562089920044)
('Happy childhood, happy childhood.', 0.8216490149497986)
('- Wild Bird Green ', 0.004080250393599272)
('on the lawn.', 0.44294288754463196)
('fragrance', 0.9942429661750793)
('.RE', 0.18314962089061737)
('Thats', 0.7607358694076538)
('my', 0.6383113861083984)
('frend', 0.45365384221076965)
('FOREVER', 0.7280043363571167)
('The Childhood Party.', 0.7070863246917725)
(3) chineseocr_lite:
The model recognized 35 text segments, whose content was broadly similar to PaddleOCR’s, but the confidence differed greatly: the highest confidence from chineseocr_lite was 0.59, and most values were between 0.3 and 0.5.
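The confidence comparisons in this section are just counts over the scores each model returns. As a rough sketch, here is how the EasyOCR scores listed above can be summarized, using the 0.6 and 0.8 cut-offs mentioned for PaddleOCR. The scores are rounded to four decimals, the listing covers only the part of EasyOCR’s 63 segments shown in this article, and the helper name `summarize` is my own.

```python
# Confidence scores from the EasyOCR listing above, rounded to four decimals.
easyocr_scores = [
    0.1459, 0.9193, 0.0003, 0.5373, 0.2208, 0.0570,
    0.5316, 0.8216, 0.0041, 0.4429, 0.9942, 0.1831,
    0.7607, 0.6383, 0.4537, 0.7280, 0.7071,
]


def summarize(scores, low=0.6, high=0.8):
    """Count scores below `low` and at or above `high`, plus the mean."""
    return {
        "count": len(scores),
        "below_low": sum(s < low for s in scores),
        "at_or_above_high": sum(s >= high for s in scores),
        "mean": sum(scores) / len(scores),
    }


summary = summarize(easyocr_scores)
print(summary)
```

Running the same function over each model’s scores for the same image gives a like-for-like view, although confidence values from different models are not necessarily calibrated the same way.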
(5) Multi-angle comparison
For developers working on OCR, what makes an open source repo most attractive is:
- ① a high-quality pretrained model;
- ② training code that is simple and easy to use;
- ③ deployment that is easy and free of pitfalls.
A quick comparison of the core capabilities of the current mainstream open source OCR repos:
Repo | Languages | Pretrained model size | F1-score | On-device deployment | Custom training | pip install support |
---|---|---|---|---|---|---|
chineseocr_lite | Chinese and English | 4.7M | 0.3899 | Supported | Not supported | Not supported |
easyOCR | Multilingual | 218M | 0.2214 | Not supported | Not supported | Supported |
PaddleOCR | Multilingual | 3.5M | 0.521 | Supported | Supported | Supported |
- In terms of languages, chineseocr_lite only supports Chinese and English. EasyOCR has the advantage of multi-language support, which suits developers who need less common languages. PaddleOCR, however, also supports more and more languages, currently Chinese, English, French, German, Korean, and Japanese.
- In terms of pretrained models, EasyOCR currently has no ultra-lightweight model, the latest chineseocr_lite model is around 4.7M, and PaddleOCR’s 3.5M model is the lightest known in the industry;
- In terms of deployment, the large EasyOCR model is not suitable for on-device deployment. The chineseocr_lite and PaddleOCR models are relatively small and both can be deployed on device, and PaddleOCR already has a mobile app;
- In terms of custom training, in real business scenarios the pretrained model often does not meet the requirements, so custom training and fine-tuning are needed, but at present only PaddleOCR supports this;
- In terms of performance metrics, the evaluation set consists of 300 images collected from real OCR application scenarios, including contracts, license plates, nameplates, train tickets, test sheets, forms, certificates, street view text, business cards, and digital display screens, with an average of 17 text boxes per image. PaddleOCR’s F1-score is above 0.5, which is quite good.
(6) Other analysis
As we know, how well the training data matches the test data directly affects model quality. To get better results, we often need to train the ultra-lightweight model on our own data. In addition to the 3.5M ultra-lightweight model, PaddleOCR provides two text detection algorithms and four text recognition algorithms, and has released four corresponding text detection models and eight text recognition models, on top of which users can build their own ultra-lightweight models.
PaddleOCR has open-sourced several well-known text detection and recognition algorithms, all of which match or exceed the original results. On the public ICDAR2015 text detection dataset, the results are as follows:
model | Backbone network | precision | recall | Hmean | Download link |
---|---|---|---|---|---|
EAST | ResNet50_vd | 88.18% | 85.51% | 86.82% | Download link |
EAST | MobileNetV3 | 81.67% | 79.83% | 80.74% | Download link |
DB | ResNet50_vd | 83.79% | 80.65% | 82.19% | Download link |
DB | MobileNetV3 | 75.92% | 73.18% | 74.53% | Download link |
SAST | ResNet50_vd | 92.18% | 82.96% | 87.33% | Download link |
The algorithm also performs surprisingly well on the public Total-Text dataset.
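A note on the Hmean column in the detection table: it is the harmonic mean of precision and recall, the same formula as the F1-score, that is 2PR / (P + R). The sketch below recomputes it from the rounded precision and recall in the table. Because the inputs are already rounded, the recomputed value can differ from the reported one in the last digit.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F1 formula)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, backbone, precision %, recall %, reported Hmean %) from the table above
rows = [
    ("EAST", "ResNet50_vd", 88.18, 85.51, 86.82),
    ("EAST", "MobileNetV3", 81.67, 79.83, 80.74),
    ("DB", "ResNet50_vd", 83.79, 80.65, 82.19),
    ("DB", "MobileNetV3", 75.92, 73.18, 74.53),
    ("SAST", "ResNet50_vd", 92.18, 82.96, 87.33),
]
for model, backbone, p, r, reported in rows:
    print(f"{model:5s} {backbone:12s} computed={hmean(p, r):.2f} reported={reported:.2f}")
```

Since the harmonic mean is pulled toward the smaller of the two values, a model cannot score well by trading away recall for precision or the other way round.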
In the text recognition part, four algorithms (CRNN, Rosetta, STAR-Net, and RARE) are implemented following the text recognition training and evaluation process of DTRB[3], covering the two mainstream families of text recognition algorithms, CTC-based and attention-based. The models were trained on MJSynth and SynthText and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets. The results are as follows:
model | Backbone network | Avg Accuracy | Model store naming | Download link |
---|---|---|---|---|
Rosetta | Resnet34_vd | 80.24% | rec_r34_vd_none_none_ctc | Download link |
Rosetta | MobileNetV3 | 78.16% | rec_mv3_none_none_ctc | Download link |
CRNN | Resnet34_vd | 82.20% | rec_r34_vd_none_bilstm_ctc | Download link |
CRNN | MobileNetV3 | 79.37% | rec_mv3_none_bilstm_ctc | Download link |
STAR-Net | Resnet34_vd | 83.93% | rec_r34_vd_tps_bilstm_ctc | Download link |
STAR-Net | MobileNetV3 | 81.56% | rec_mv3_tps_bilstm_ctc | Download link |
RARE | Resnet34_vd | 84.90% | rec_r34_vd_tps_bilstm_attn | Download link |
RARE | MobileNetV3 | 83.32% | rec_mv3_tps_bilstm_attn | Download link |
SRN | Resnet50_vd_fpn | 88.33% | rec_r50fpn_vd_none_srn | Download link |
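The names in the “Model store naming” column appear to encode how each model is assembled. The sketch below splits a name into parts under an assumed layout, `rec_<backbone>_<transform>_<sequence>_<prediction>`. This layout is my own reading of the table, not an official specification, and the helper `parse_rec_name` is hypothetical. The SRN entry (`rec_r50fpn_vd_none_srn`) does not fit the layout, so the helper rejects it instead of guessing.

```python
def parse_rec_name(name):
    """Split a name like rec_r34_vd_tps_bilstm_ctc into its apparent parts.

    Assumed layout (inferred from the table, not an official spec):
    rec_<backbone>_<transform>_<sequence>_<prediction>
    """
    tokens = name.split("_")
    if len(tokens) < 5 or tokens[0] != "rec":
        raise ValueError(f"unexpected name: {name}")
    transform, sequence, prediction = tokens[-3:]
    if transform not in {"tps", "none"} or sequence not in {"bilstm", "none"}:
        raise ValueError(f"name does not fit the assumed layout: {name}")
    return {
        "backbone": "_".join(tokens[1:-3]),
        "transform": transform,
        "sequence": sequence,
        "prediction": prediction,
    }


print(parse_rec_name("rec_r34_vd_tps_bilstm_attn"))
print(parse_rec_name("rec_mv3_none_none_ctc"))
```

Read this way, the last part of the name tells you which family a model belongs to: `ctc` for the CTC-based models and `attn` for the attention-based one.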
For the Chinese models, 300,000 (30W) text images were cropped from the LSVT street view dataset according to the ground-truth annotations and position-corrected. In addition, 5 million (500W) synthetic samples were generated from the LSVT corpus to train the Chinese models. The relevant configuration and pretrained files are as follows:
model | Backbone network | The configuration file | Pretraining model |
---|---|---|---|
Super lightweight Chinese model | MobileNetV3 | rec_chinese_lite_train.yml | Download link |
General Chinese OCR model | Resnet34_vd | rec_chinese_common_train.yml | Download link |
How are these results obtained? Refer to the text recognition section of the model training/evaluation documentation in PaddleOCR.
6. Summary
To sum up PaddleOCR in a few points:
- Small size
- Fast
- Convenient to deploy
- Simple to use
- Very good performance
After comparing along these dimensions, we decided to use PaddleOCR as the model for our competition entry. Development is now almost complete, so keep following me. After the competition, all the code can be published, which should also be convenient for everyone to learn from.
You can also look at the open source project rankings on github.com/trending and paperswithcode.com/. It ranks well on both, which indicates that many people are paying attention to it and that it has a large user base.
As noted above, the competition is still under way, so other material cannot be released yet. If you want the source code and other material, leave your email address in a message and I will send it to everyone after the competition.
GitHub open Source: github.com/PaddlePaddl…
Personally, I suggest giving domestic open source projects a Star, and if you like it you can also Fork it. That way, I think they will be more motivated to keep creating and innovating.
Hey hey, once you have given it a Star, you can come to me for my competition source code. Give PaddleOCR a boost.
Hey hey, I am quietly sharing the official technical exchange group. Shh, don’t tell anyone: scan the QR code below with WeChat to join.