Ali OCR: Image Text Recognition and Understanding

Abstract: At the 2018 Computing Conference Shanghai Summit, Wang Yongpan, senior image algorithm expert at Alibaba, explained Ali OCR character recognition technology and shared application cases. The essence of OCR is to recognize the text in an image, that is, to find and extract the target text from a complex image background. This article introduces OCR, the algorithms behind it, and the optical reading products built on them and their applications, showing what Ali OCR can do in text recognition.

What is OCR?

In its initial form, OCR recognizes the text in an image: it first finds where the text is, then identifies what it says. Text understanding builds on this basic character recognition as an application of it, producing the output the user actually wants from an understanding of the spatial relationships and semantics of the recognized text. By the type of data to be processed, OCR work can be divided into four scenarios: digital-native text, documents, photographed forms, and natural scenes. The text in these four scenarios differs considerably, and each calls for quite different processing methods.
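The two-step flow described above (find where the text is, then read it) can be sketched as a small pipeline. This is a minimal sketch, not Ali OCR's implementation: the `detect` and `recognize` callables stand in for real localization and recognition models, and the `TextLine` type and the top-to-bottom, left-to-right sort are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical box format.
Box = Tuple[int, int, int, int]


@dataclass
class TextLine:
    box: Box   # where the text is
    text: str  # what it says


def run_ocr(image: Any,
            detect: Callable[[Any], List[Box]],
            recognize: Callable[[Any, Box], str]) -> List[TextLine]:
    """Two-stage OCR sketch: locate text regions first, then read each one."""
    lines = [TextLine(box=box, text=recognize(image, box)) for box in detect(image)]
    # Assumed reading order: top to bottom, then left to right.
    lines.sort(key=lambda line: (line.box[1], line.box[0]))
    return lines
```

Keeping localization and recognition as separate, swappable stages mirrors the split the article draws between the two basic algorithms; the output (boxes plus text) is exactly what later text-understanding steps consume.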

OCR algorithms

Text localization and character recognition are the two basic algorithms of OCR. Their accuracy and generality determine how widely and easily OCR can be applied across scenarios. As applications evolved, OCR expanded to include text understanding: on top of text localization and recognition, it adds capabilities such as table extraction and seal (stamp) extraction, then combines them with semantic understanding to build the spatial and semantic relationships between fields, which completes the process of text understanding. OCR therefore has two core algorithmic capabilities: general character recognition and general structuring. General character recognition locates the text in an image and identifies its content. General structuring builds on character recognition to organize the recognized text according to users' actual needs.
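As a rough illustration of structuring, the sketch below turns recognized lines into named fields using spatial layout alone: a field's value is taken to be the nearest text line to the right of its label on the same visual row. The `TextLine` type, the `extract_fields` helper, and the 50% row-overlap threshold are hypothetical choices for this example; a real structuring system would also use the semantic relationships described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), hypothetical format


@dataclass
class TextLine:
    box: Box
    text: str


def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of the shorter box's height that both boxes share."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(inter, 0) / shorter if shorter > 0 else 0.0


def value_right_of(key: TextLine, lines: List[TextLine],
                   min_overlap: float = 0.5) -> Optional[TextLine]:
    """Nearest line that starts right of `key` and sits on the same row."""
    candidates = [line for line in lines
                  if line is not key
                  and line.box[0] >= key.box[2]
                  and vertical_overlap(line.box, key.box) >= min_overlap]
    return min(candidates, key=lambda line: line.box[0] - key.box[2], default=None)


def extract_fields(lines: List[TextLine], field_names: List[str]) -> Dict[str, str]:
    """Map each label found in `field_names` to the text on its right."""
    fields: Dict[str, str] = {}
    for line in lines:
        name = line.text.strip().rstrip(":")
        if name in field_names:
            value = value_right_of(line, lines)
            if value is not None:
                fields[name] = value.text
    return fields
```

For made-up input lines `Name:` / `Zhang San` on one row and `City:` / `Hangzhou` on the next, `extract_fields(lines, ["Name", "City"])` pairs each label with the text beside it. Pure geometry breaks down for labels placed above their values or for multi-line values, which is one reason structuring also needs semantics.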


Text localization

The goal of text localization is to find where text appears in an image and represent each text region as a line. With the development of deep learning, the difficult feature-related problems in localization can now be handled much better.
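Deep-learning detectors typically predict line-level regions directly, so the sketch below illustrates only the idea of "representing text as a line" with a simple geometric heuristic, not a learned detector and not Alibaba's method. It assumes word- or character-level boxes are already available and merges boxes that share a row and are separated by at most a hypothetical `max_gap` of 20 pixels.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), hypothetical format


def same_row(a: Box, b: Box, min_overlap: float = 0.5) -> bool:
    """True if the boxes share at least `min_overlap` of the shorter height."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return shorter > 0 and inter / shorter >= min_overlap


def union(a: Box, b: Box) -> Box:
    """Smallest box covering both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def group_into_lines(boxes: List[Box], max_gap: int = 20) -> List[Box]:
    """Greedily merge small boxes, scanned left to right, into line boxes."""
    lines: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for i, line in enumerate(lines):
            if same_row(line, box) and box[0] - line[2] <= max_gap:
                lines[i] = union(line, box)
                break
        else:  # no existing line accepted this box: start a new one
            lines.append(box)
    # Return lines in reading order: top to bottom, then left to right.
    return sorted(lines, key=lambda b: (b[1], b[0]))
```

Two nearby words on one row become a single line box, while a distant box on the same row stays a separate line; the gap threshold is the main knob and would depend on image resolution.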






Character recognition

Character recognition builds on text localization: it recognizes the text content and outputs both the position and the content of each word for use in text understanding. Recognition methods fall into two categories: classification and sequence. Classification focuses on extracting fine-grained features; sequence methods approach the problem from the perspective of human cognition, reading text as an ordered sequence. Two hard problems in character recognition are similar characters and rare characters. Recognizing visually similar characters is an open academic problem: more than two thousand look-alike characters have been identified in total. In a separate test on these two thousand characters, CRNN reached a recognition rate of only 83%, and analysis showed that the low rate comes from softmax being unable to effectively represent the subtle differences between them. As for rare characters, about 3,700 Chinese characters are in common use and cover 99% of written material, but about 21,303 characters are used in personal and place names, including a large number of rare characters, and names and places are very important in practical applications. In CRNN's tests, the recognition rate on these characters was just 21 percent. With so few samples per rare character, the model cannot be trained sufficiently, and the recognition rate is hard to improve.
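To make the sequence approach concrete: sequence recognizers such as CRNN are commonly trained with CTC, and the simplest CTC decoder is greedy (best-path) decoding. Take the most likely class at each time step, collapse consecutive repeats, then drop the blank symbol. The sketch assumes a per-frame probability matrix and that class 0 is the blank while class `k >= 1` maps to `alphabet[k - 1]`; both conventions are assumptions for this example.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is the probability of class k at time step t,
    where class 0 is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # Most likely class at every time step.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev: Optional[int] = None
    for k in best:
        # Emit a character only when it differs from the previous frame
        # (collapsing repeats) and is not the blank.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


frames = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.7, 0.2],    # 'a' again: collapsed into the previous 'a'
    [0.9, 0.05, 0.05],  # blank: separates the two 'a's
    [0.2, 0.6, 0.2],    # 'a'
    [0.1, 0.1, 0.8],    # 'b'
    [0.2, 0.1, 0.7],    # 'b' again: collapsed
]
print(ctc_greedy_decode(frames, "ab"))  # prints "aab"
```

The blank is what lets the model output a genuinely repeated character; without it, the two `a` frames would merge into one. Note that each frame still ends in a softmax over the whole alphabet, which is where look-alike characters compete.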






General structuring





OCR products

OCR itself is a technology. Turning it into a product requires scaling it up and delivering on three aspects: universality, efficiency, and functionality.

Document OCR






Form OCR






