1. Introduction

Every OCR engineer should know this open-source OCR project: PaddleOCR

In just half a year, PaddleOCR has accumulated more than 11.5K stars. It has repeatedly ranked first on GitHub Trending and on the Papers with Code daily and monthly lists, and the GitHub 2020 Digital Insight Report named it one of the top 20 most active GitHub projects in China.

It is no exaggeration to call it the most popular OCR repo right now.

Its latest release brings two major additions:

  • PGNet (PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network), a simple and efficient end-to-end text recognition model. Compared with the previous SOTA algorithm, its prediction speed is twice as fast after acceleration with the PaddlePaddle inference engine and post-processing optimization.
  • Multilingual support has grown to 80+ languages, covering essentially all major international languages. On the open-source MLT2017 test set, recognition of Chinese, Korean, Japanese, Latin-script and Arabic-script languages is significantly better than EasyOCR and other open-source SOTA models.

2. A review of PaddleOCR's track record

Here is how PaddleOCR has performed on GitHub in the few months since it was open-sourced in June last year:

  • In June 2020, the 8.6 MB ultra-lightweight model was released, and PaddleOCR ranked first on the global GitHub Trending daily list.
  • In August 2020, it open-sourced a top CVPR 2020 algorithm and made the GitHub Trending list again!
  • In October 2020, it published the PP-OCR algorithm, open-sourced a 3.5 MB ultra-lightweight model, and once again ranked first on the Papers with Code trending list!
  • In January 2021, it released the Style-Text text synthesis algorithm and the PPOCRLabel data annotation tool. The star count passed 10,000 and now stands at 11.5K, and the GitHub 2020 Digital Insight Report named it one of the top 20 most active GitHub projects in China.

Developers on GitHub already know what it can do:

  • Results of the ultra-lightweight model

Train tickets, forms, metal nameplates, flipped images and foreign-language text are all handled well.

Dynamic graphs and static graphs are the two execution modes commonly used in deep learning frameworks. In dynamic graph mode, code is written and run the way Python programmers are used to, which makes it easy to debug. In terms of performance, however, executing Python is expensive and lags behind C++.

For deployment, static graphs have a performance advantage over dynamic graphs. When a static graph program is compiled and executed, the pre-built neural network can be detached from Python, then parsed and executed in C++. And because the whole network structure is known in advance, some network structure optimizations can also be applied.

PaddlePaddle adds dynamic-to-static conversion to its dynamic graph mode, so users can write network code in dynamic graph style. For deployment and inference, PaddlePaddle analyzes the user code and automatically converts it into a static graph network structure, combining the ease of use of dynamic graphs with the deployment performance of static graphs.
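To make the dynamic-to-static idea concrete, here is a toy Python sketch that turns an eagerly written function into a recorded graph, which can then be replayed without running the original Python code again. It uses operator-overloading tracing, which is a swapped-in technique: PaddlePaddle's own conversion (exposed in Paddle 2.x as `paddle.jit.to_static`, with `paddle.jit.save` for export) analyzes and rewrites the source code instead, which also lets it handle Python control flow that simple tracing cannot capture. The names `Sym`, `trace` and `execute` are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Tuple

# A toy "static graph": a list of (op, input_node_ids, constant) records.
# For "input" nodes, the tuple holds the index of the external input instead.
Graph = List[Tuple[str, Tuple[int, ...], float]]

class Sym:
    """Symbolic value: arithmetic on it appends a graph node instead of computing."""
    def __init__(self, graph: Graph, node_id: int):
        self.graph, self.id = graph, node_id

    def _emit(self, op: str, inputs: Tuple[int, ...], const: float = 0.0) -> "Sym":
        self.graph.append((op, inputs, const))
        return Sym(self.graph, len(self.graph) - 1)

    def __add__(self, other):
        if isinstance(other, Sym):
            return self._emit("add", (self.id, other.id))
        return self._emit("add_const", (self.id,), float(other))

    def __mul__(self, other):
        if isinstance(other, Sym):
            return self._emit("mul", (self.id, other.id))
        return self._emit("mul_const", (self.id,), float(other))

def trace(fn: Callable, n_inputs: int) -> Tuple[Graph, int]:
    """Run fn once on placeholders; return the recorded graph and its output node id."""
    graph: Graph = [("input", (i,), 0.0) for i in range(n_inputs)]
    result = fn(*[Sym(graph, i) for i in range(n_inputs)])
    return graph, result.id

def execute(graph: Graph, out_id: int, inputs: List[float]) -> float:
    """Replay the recorded graph node by node; the Python function is never called."""
    vals: List[float] = []
    for op, ins, c in graph:
        if op == "input":
            vals.append(inputs[ins[0]])
        elif op == "add":
            vals.append(vals[ins[0]] + vals[ins[1]])
        elif op == "mul":
            vals.append(vals[ins[0]] * vals[ins[1]])
        elif op == "add_const":
            vals.append(vals[ins[0]] + c)
        elif op == "mul_const":
            vals.append(vals[ins[0]] * c)
    return vals[out_id]

def model(x, w):
    # Written in ordinary eager ("dynamic graph") style.
    return x * w + 1.0

graph, out = trace(model, 2)
print([op for op, _, _ in graph])        # ['input', 'input', 'mul', 'add_const']
print(execute(graph, out, [3.0, 2.0]))   # 7.0, same as model(3.0, 2.0)
```

A real framework would serialize such a graph so a C++ runtime can load and execute it, and because the whole graph is visible before execution, it can also rewrite it first (for example, fusing the multiply and the add).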

  • Results of the Style-Text text synthesis tool

Compared with traditional data synthesis algorithms, Style-Text can transfer text style onto images with special backgrounds. Only a few images of the target scene are needed to synthesize a large amount of data. The results are shown below:

  • Semi-automatic labeling tool PPOCRLabel

With the high-quality PP-OCR Chinese and English ultra-lightweight pre-trained models built in, OCR data can be annotated efficiently, and the tool runs fine on a CPU-only machine. The results are shown below:

It is also simple to use and easily improves annotation efficiency by 60-80%.
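The semi-automatic workflow described above (the built-in model pre-labels each image, a person corrects the mistakes) can be sketched in a few lines of Python. This is not PPOCRLabel's code: the prediction tuples, the 0.9 review threshold and the helpers `pre_annotate` and `apply_fix` are hypothetical. The output line follows the detection label layout PaddleOCR uses for training data (image path, a tab, then a JSON list of `{"transcription", "points"}` entries), though the exact fields PPOCRLabel writes may differ between versions.

```python
import json
from typing import List, Tuple

Point = List[int]
# Hypothetical model output per text line: (4-point box, recognized text, confidence).
Prediction = Tuple[List[Point], str, float]

def pre_annotate(img_path: str, preds: List[Prediction],
                 review_below: float = 0.9) -> Tuple[str, List[int]]:
    """Build a label line from model output and list low-confidence entries for review."""
    entries = [{"transcription": text, "points": pts} for pts, text, _ in preds]
    to_review = [i for i, (_, _, conf) in enumerate(preds) if conf < review_below]
    line = img_path + "\t" + json.dumps(entries, ensure_ascii=False)
    return line, to_review

def apply_fix(line: str, idx: int, corrected: str) -> str:
    """Replace the transcription of entry idx with a human correction."""
    path, payload = line.split("\t", 1)
    entries = json.loads(payload)
    entries[idx]["transcription"] = corrected
    return path + "\t" + json.dumps(entries, ensure_ascii=False)

preds = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "PaddleOCR", 0.98),
    ([[10, 50], [90, 50], [90, 80], [10, 80]], "PPOCRLabe1", 0.62),  # "l" misread as "1"
]
line, review = pre_annotate("train/img_1.jpg", preds)
print(review)  # [1]
for i in review:
    line = apply_fix(line, i, "PPOCRLabel")  # a person types the correct text
```

In this toy, only the low-confidence entry is routed to a person; the confidently recognized line is accepted as predicted.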

Link: GitHub: github.com/PaddlePaddl…

So what does the April 2021 update bring?

3. AAAI 2021 paper: end-to-end SOTA algorithm PGNet

Figure 1 shows PGNet's performance on the Total-Text dataset, including end-to-end results. At comparable accuracy, PGNet predicts twice as fast as the previous SOTA algorithm after acceleration with the PaddlePaddle inference engine and post-processing optimization.

▲ Figure 1: Comparison of speed and accuracy performance of PGNet model

Detailed metrics:

▲ Table 1: Detection and end-to-end performance on ICDAR2015 dataset

The PGNet framework is shown in the figure below. The input image passes through a backbone network to produce a feature map downsampled to 1/4 of the input resolution. Using multi-task learning, the network then predicts four outputs simultaneously: text border offset (TBO), text center line (TCL), text direction offset (TDO) and the text character classification map (TCC). Text-line detection results are obtained by post-processing TBO and TCL, and text-line recognition results are obtained from TCL, TDO and TCC.
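The recognition branch can be illustrated with a simplified Python sketch of the point-gathering idea: read the TCC character-probability vector at each point on a text center line, then decode the resulting sequence with greedy CTC decoding (PGNet trains this branch with a CTC-style loss). Everything here is an assumption for illustration: the toy 1x5 map, the three-character alphabet, blank at class 0, and the point list, which is taken as already ordered (in PGNet, the TCL and TDO outputs are used to obtain and order these points). This is not PaddleOCR's implementation.

```python
from typing import List, Sequence, Tuple

BLANK = 0          # assumed index of the CTC blank class
ALPHABET = "OCR"   # toy character set: class k (k >= 1) maps to ALPHABET[k - 1]

def gather_points(tcc: Sequence[Sequence[Sequence[float]]],
                  points: Sequence[Tuple[int, int]]) -> List[Sequence[float]]:
    """Collect the class-probability vector of the TCC map at each (row, col) point."""
    return [tcc[r][c] for r, c in points]

def ctc_greedy_decode(seq: Sequence[Sequence[float]]) -> str:
    """Greedy CTC: argmax per step, merge repeated classes, drop blanks."""
    chars, prev = [], BLANK
    for probs in seq:
        k = max(range(len(probs)), key=lambda j: probs[j])
        if k != BLANK and k != prev:
            chars.append(ALPHABET[k - 1])
        prev = k
    return "".join(chars)

# Toy TCC map: 1 row x 5 columns x 4 classes (blank, 'O', 'C', 'R').
tcc = [[
    [0.1, 0.8, 0.05, 0.05],   # 'O'
    [0.2, 0.7, 0.05, 0.05],   # 'O' again (repeat, merged)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.7, 0.1],     # 'C'
    [0.1, 0.1, 0.1, 0.7],     # 'R'
]]
# Center-line points, assumed already ordered along the reading direction.
center_line = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]
seq = gather_points(tcc, center_line)
decoded = ctc_greedy_decode(seq)
print(decoded)  # OCR
```

Reading characters straight off the per-pixel map along the center line means this scheme needs no separate crop-and-recognize stage.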

▲ Figure 2: Network flow framework

Results on the ICDAR2015 and Total-Text datasets:

▲ Figure 3: Visualized results on the Total-Text and ICDAR2015 datasets

PGNet paper: www.aaai.org/AAAI21Paper…

Meanwhile, a seal recognition capability built on PGNet is now available on the Baidu AI Open Platform. It detects and recognizes seals on contracts and common bills, and outputs the text content, the seal's location and the corresponding confidence scores. Common round, oval and square seals are supported. It provides standardized APIs for quick integration and supports private on-premises deployment to keep business data private.

Service page: ai.baidu.com/tech/ocr/se… Note: this model is not open source, but you can apply for a free trial.

4. Rich multilingual support: models for 80+ languages worldwide

A brief comparison of the core capabilities of today's mainstream open-source OCR repos, covering Chinese and English model performance and features:

Performance (F1-score) and feature comparison of selected multilingual models, against EasyOCR, the only other repo that provides such models:

Model results

It is worth mentioning that PaddleOCR's coverage of 80+ major languages was made possible by developers around the world who contributed dictionaries and corpora for many languages through PRs and issues, including Simplified Chinese, Traditional Chinese, English, French, German, Korean, Japanese, Italian, Spanish, Portuguese, Russian, Arabic, Hindi, Uyghur, Persian, Urdu, Serbian (Latin), Occitan, Marathi, Nepali, Bulgarian, Serbian (Cyrillic), Ukrainian, Belarusian, Telugu, Kannada and Tamil. More developers are welcome to contribute.

5. Carefully crafted Chinese and English documentation and tutorials

Enough said. Visit GitHub, star the repo and try it for yourself: github.com/PaddlePaddl…

Official website: www.paddlepaddle.org.cn

PaddleOCR

GitHub: github.com/PaddlePaddl…

Gitee: gitee.com/paddlepaddl…

PGNet paper: www.aaai.org/AAAI21Paper…

Seal recognition service: ai.baidu.com/tech/ocr/se…