This article was published in the Tencent Cloud + Community column by the Tencent Cloud AI Center.

Abstract: In daily life and work we inevitably run into problems like these. You work hard to write a document and print it, only to find that the source file has been lost. You collect a stack of business cards but then have to enter the information one by one, which is tedious. Express delivery companies keep growing, but registering and entering waybills takes a great deal of time every day, which is very inefficient.

So is there a technology that can help us solve these problems? Yes: OCR (optical character recognition). Today we invited AI scientist Ji Yongnan, product managers Florali and Chen Yingtian, and senior engineer Xiao Xihua from the Tencent Cloud Big Data AI Product Center to share Tencent Cloud's exploration in this field in recent years.

Opening talk: helping you see the world clearly and truly

By Flora

What is OCR?

OCR locates and recognizes all of the text in an image in real time, returning the position of each text box and its text content. It supports whole-image text recognition across many scenarios and on any layout, including English text, letters and numbers. Put simply, OCR intelligently turns the text in an image into editable text.

What is the technical principle of OCR?

OCR is essentially image recognition, and its principle is basically the same as that of other image recognition problems. It relies on two key technologies: text detection and text recognition. First, features are extracted from the image and the target text regions are detected; then the characters in those regions are segmented and classified.
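
To make the two stages concrete, here is a minimal toy sketch on a binary image, not a description of any production system: horizontal projection finds text lines (detection), vertical projection splits each line into characters (segmentation), and each character is classified by nearest-template matching. The 3x3 templates and all function names are hypothetical; real OCR engines use learned detectors and classifiers.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background
Patch = Tuple[Tuple[int, ...], ...]

# Hypothetical 3x3 glyph templates for the toy classifier.
TEMPLATES: Dict[str, Patch] = {
    "I": ((1, 1, 1), (0, 1, 0), (1, 1, 1)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
}


def runs(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return the [start, end) index ranges where flags is True."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out


def detect_lines(img: Image) -> List[Tuple[int, int]]:
    """Detection: consecutive rows containing any ink form a text line."""
    return runs([any(row) for row in img])


def segment_chars(img: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Segmentation: consecutive ink columns inside one line form a character."""
    return runs([any(img[y][x] for y in range(top, bottom)) for x in range(len(img[0]))])


def classify(patch: Patch) -> str:
    """Classification: nearest template by Hamming distance, '?' if no shape matches."""
    def dist(tpl: Patch) -> float:
        if len(tpl) != len(patch) or any(len(a) != len(b) for a, b in zip(tpl, patch)):
            return float("inf")
        return sum(p != q for a, b in zip(tpl, patch) for p, q in zip(a, b))

    best = min(TEMPLATES, key=lambda c: dist(TEMPLATES[c]))
    return best if dist(TEMPLATES[best]) != float("inf") else "?"


def recognize(img: Image) -> List[str]:
    """Run detection, segmentation and classification; return one string per line."""
    lines = []
    for top, bottom in detect_lines(img):
        chars = []
        for left, right in segment_chars(img, top, bottom):
            patch = tuple(tuple(img[y][left:right]) for y in range(top, bottom))
            chars.append(classify(patch))
        lines.append("".join(chars))
    return lines


IMG = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1],
]
print(recognize(IMG))  # ['IL']
```

Projection-based splitting like this is exactly what breaks down on the difficulties listed later (skew, touching characters, complex backgrounds), which is one reason learned detectors replaced it.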

Taking deep learning as the dividing line: until about five years ago, the industry still mostly used OCR built on the traditional framework. With the rise of deep learning, deep-learning-based OCR frameworks offered a new approach that quickly broke through the bottlenecks of the traditional technology (such as text localization, binarization and character segmentation), and they have since been widely adopted in industry.

The traditional pipeline works as follows: first locate the text, then correct slanted text, then segment the characters, recognize each character, and finally apply semantic error correction based on a statistical model such as a hidden Markov model (HMM).
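
As a rough illustration of that last step, the sketch below uses the Viterbi algorithm to pick the most likely character sequence, combining the recognizer's per-character scores with a bigram language model. This is a textbook HMM-style decoder, not the method of any specific product; the candidate scores, the bigram table and the `floor` probability for unseen pairs are made-up assumptions.

```python
import math
from typing import Dict, List, Tuple


def viterbi_correct(candidates: List[Dict[str, float]],
                    bigram: Dict[Tuple[str, str], float],
                    floor: float = 1e-6) -> str:
    """Return the sequence maximising prod_t P(obs_t | c_t) * P(c_t | c_{t-1}).

    candidates[t] maps each candidate character at position t to the recognizer's
    score, treated here as an emission probability (an assumption).
    bigram[(a, b)] approximates P(b | a); unseen pairs get `floor`.
    The first character uses a uniform prior.
    """
    # best[c] = (log-probability of the best path ending in c, that path)
    best = {c: (math.log(p), c) for c, p in candidates[0].items()}
    for cands in candidates[1:]:
        new_best = {}
        for c, p in cands.items():
            # Extend every surviving path by c and keep the highest-scoring one.
            new_best[c] = max(
                (score + math.log(bigram.get((prev, c), floor)) + math.log(p), path + c)
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


bigram = {("c", "a"): 0.5, ("a", "t"): 0.5, ("c", "o"): 0.1, ("o", "t"): 0.1}
observed = [{"c": 1.0}, {"a": 0.4, "o": 0.6}, {"t": 1.0}]
print(viterbi_correct(observed, bigram))  # cat
print(viterbi_correct(observed, {}))      # cot
```

With no language model the recognizer's top choice "o" wins; with the bigram table the decoder corrects it to "a", which is the kind of semantic correction the traditional pipeline relies on.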

What are the difficulties of OCR technology?

Complex backgrounds, artistic fonts, low resolution, uneven lighting, image degradation, character deformation, mixed languages, complex text layouts, characters split across detection boxes, and so on.

How to overcome these difficulties?

We approach this from two directions: the usage scenario on the one hand, and technical improvements on the other. Tencent Youtu Lab has deeply optimized its text detection technology and proposed Compact Inception, which aims to improve text detection and extraction at all scales through a well-designed network structure. It also introduces an RNN multi-layer adaptive network and a Refinement structure to improve the completeness and accuracy of detection.

What functions does Tencent Cloud OCR currently support?

Built on Tencent Youtu Lab's world-leading deep learning technology, we currently support ID card recognition, bank card recognition, business card recognition, business license recognition, driving license recognition, license plate recognition, general printed text recognition and handwriting recognition.

General printed text recognition: technical difficulties and use cases

ID card recognition is widely used in finance for identity authentication: it reduces the information users have to type in, improves efficiency and improves the user experience. Business license recognition completely removes cumbersome manual entry and can save enterprises a great deal of labor cost. These scenarios are familiar to everyone.

For general printed text, Tencent Youtu Lab independently designed a full-range, multi-scale text recognition engine that can handle blurred, defocused, perspective-distorted and partially occluded text. Its recognition accuracy reaches 90%, an industry-leading level. It can recognize text in images with any layout and is widely applicable to printed documents, advertising images, healthcare, logistics and other industries.

Are there any good examples of general printed text recognition?

Take this advertisement, for example: it has a lot of content, multiple fonts, Chinese and English mixed with numbers, and a busy background. Through perspective correction and deblurring, our OCR largely restores the original appearance of the image and greatly improves the algorithm's accuracy.
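
Perspective correction is commonly done with a homography that maps the four corners of a skewed text region onto an upright rectangle. The sketch below solves for that 3x3 matrix from four point pairs (the standard direct linear transform with the bottom-right entry fixed to 1) in plain Python. It is a generic technique, not Tencent's implementation, and the corner coordinates are invented; on real images, OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` do the same job.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 H (h33 = 1) with dst ~ H @ src, from four point correspondences."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h: Matrix, p: Point) -> Point:
    """Apply H to a point, including the perspective divide."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Skewed corners of a text region (made up) -> upright 100 x 80 rectangle.
SRC = [(10, 20), (110, 10), (120, 90), (5, 100)]
DST = [(0, 0), (100, 0), (100, 80), (0, 80)]
H = homography(SRC, DST)
print([tuple(round(c, 6) for c in warp_point(H, p)) for p in SRC])
```

To rectify a whole image, each output pixel is mapped back through the inverse of H and sampled from the input, which is what `cv2.warpPerspective` does internally.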

Another example is a poster with dense text, small line spacing and perspective distortion. Transcribing it by hand is time-consuming and hard to do with the naked eye. Tencent Cloud OCR uses a compact, precise feature extraction network together with advanced pre-processing, achieving recognition accuracy above 93%.
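
The article does not say how its accuracy figures are computed. One common convention, assumed here, is character accuracy = 1 - (edit distance between the recognized text and the ground truth) / (ground-truth length), that is, one minus the character error rate. A minimal sketch:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """1 - character error rate, clipped at 0; assumes a non-empty reference."""
    return max(0.0, 1 - edit_distance(ref, hyp) / len(ref))


print(edit_distance("kitten", "sitting"))               # 3
print(char_accuracy("Tencent Cloud", "Tencent C1oud"))  # 12/13, about 0.923
```

Because insertions count as errors, this metric can fall below zero for very noisy output, hence the clip.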

Sometimes the recognition rate is not ideal. How can recognition accuracy be improved?

First, we examine the current scenario to find out why accuracy is low. Then we assess where there is room for improvement and make targeted changes, for example to pre-processing.
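
The article does not name specific pre-processing steps. As one illustrative example (an assumption, not necessarily what Tencent Cloud does), Otsu's method picks a global binarization threshold that maximizes the between-class variance of dark and light pixels, which often helps with uneven backgrounds. OpenCV exposes it as `cv2.threshold(..., cv2.THRESH_BINARY + cv2.THRESH_OTSU)`; the plain-Python sketch below works on a flat list of 8-bit grayscale values.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return t in 0..255 maximising between-class variance; pixels <= t form one class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]          # number of pixels <= t
        if w0 == 0:
            continue
        w1 = total - w0        # number of pixels > t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance, up to a constant factor
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels: List[int], t: int) -> List[int]:
    """Mark dark pixels (<= t) as ink (1), assuming dark text on a light background."""
    return [1 if p <= t else 0 for p in pixels]


row = [10, 200, 15, 190]
t = otsu_threshold(row)
print(t, binarize(row, t))  # 15 [1, 0, 1, 0]
```

A single global threshold struggles with strong lighting gradients; adaptive (local) thresholding is the usual next step in that case.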

Are there any use cases for Tencent Cloud handwriting recognition?

Tencent is the first service provider in China to apply handwriting recognition in complex scenarios, with recognition accuracy above 90% for digits, recognition time within 15 ms per character, and accuracy above 80% for complex Chinese characters.

Tencent Cloud handwriting OCR has been applied to waybill recognition. It addresses the huge daily workload of manually entering express waybills in the logistics industry, a process that is error-prone and very inefficient.

What is the difference between automated waybill recognition and traditional manual entry?

With traditional manual entry at 3 min per waybill, 1,000 waybills require 6.25 people per day, so a lot of manpower is needed to keep waybill processing timely. If labor costs are kept down, timeliness suffers, so it is hard to achieve both low cost and good service.
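
The 6.25 figure follows from simple arithmetic if one assumes an 8-hour workday, which the article does not state: 1,000 waybills x 3 min = 3,000 min = 50 hours, and 50 / 8 = 6.25 people.

```python
def staff_needed(orders: int, minutes_per_order: float, hours_per_day: float = 8.0) -> float:
    """People needed per day for manual entry (8-hour workday is an assumption)."""
    total_hours = orders * minutes_per_order / 60
    return total_hours / hours_per_day


print(staff_needed(1000, 3))  # 6.25
```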

Our waybill recognition takes milliseconds per waybill and runs as a 24-hour service. As the business grows, only more computing server resources are needed, which makes it highly flexible.

Compared with traditional manual entry, it not only reduces cost and improves accuracy but also lowers the risk of privacy leaks.

OCR now has a wide range of real-world applications. What advantages does Tencent Cloud OCR offer?

Our OCR technology currently supports more than 10,000 labels covering simplified and traditional Chinese, English, digits and punctuation across hundreds of fonts, plus more than 20,000 labels for rare characters.

So do we already have many customers in the industry using it?

The new version of Mobile QQ uses our technology to extract text from images at three entry points: scanning, the chat window, and Qzone photo preview.

This makes it easy for users to read, edit and save the text in an image, and to translate or search the extracted text. Across many scenarios it greatly improves how efficiently users can read and record text in images.

OCR is also used for business card recognition in WeChat Work (Enterprise WeChat). The user only needs to take or select a photo of a business card; the text on the card is recognized accurately and quickly and automatically extracted into the corresponding fields. This greatly simplifies business card entry and avoids errors that can creep in during manual entry.
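
The article does not describe how recognized text is mapped to fields. As a deliberately simple, hypothetical illustration, the sketch below assigns OCR output lines to card fields with regular expressions; the field names, patterns and sample card are all assumptions, and a production system would likely also need layout cues and learned models (for names and job titles, for instance).

```python
import re
from typing import Dict, List

# Hypothetical field rules, applied to each OCR output line in order.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{6,}\d"),
    "url": re.compile(r"(?:https?://)?www\.[\w.-]+\.\w+"),
}


def extract_fields(lines: List[str]) -> Dict[str, str]:
    """Return the first match for each field found in the recognized lines."""
    fields: Dict[str, str] = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match and name not in fields:
                fields[name] = match.group()
    return fields


card = ["Zhang San", "Product Manager", "Tel: +86 755-1234-5678",
        "zhang.san@example.com", "www.example.com"]
print(extract_fields(card))
```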

Question and answer

What are the formatting requirements for character recognition?

This article is published on the Tencent Cloud + Community with the author's authorization.
