In recent years, students have gained a new learning tool: searching for answers by photo. When you run into a problem you can't solve, you just take a picture of it, and a detailed solution and answer appear on your phone.
The technology behind “photo search” is optical character recognition, or OCR. OCR is the process by which an electronic device, such as a scanner or camera, examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses character recognition methods to translate those shapes into computer text.
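The "dark and light patterns" step described above can be illustrated with a small sketch. This is only a toy binarization under assumed conditions (a fixed threshold of 128 and a hypothetical 5x5 grayscale patch); production OCR systems use far more robust preprocessing and learned recognizers.

```python
def binarize(gray, threshold=128):
    """Map grayscale pixels (0-255) to 1 for dark (ink) and 0 for light (paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def render(bitmap):
    """Render a binary bitmap as text, '#' for ink and '.' for paper."""
    return "\n".join("".join("#" if b else "." for b in row) for row in bitmap)


# Hypothetical 5x5 grayscale patch containing a rough letter "T".
patch = [
    [20, 30, 25, 30, 20],
    [230, 240, 35, 235, 240],
    [235, 245, 30, 240, 238],
    [240, 238, 28, 236, 242],
    [236, 241, 33, 239, 237],
]

bitmap = binarize(patch)
print(render(bitmap))
```

The resulting bitmap is the kind of dark/light shape that a recognizer would then map to a character.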
OCR has a wide range of applications. An early, well-known one is the “tap wherever you don't understand” reading pen. The pen is equipped with a camera that scans the text; when the pen touches the page, the text is recognized and extracted.
CamScanner, a commonly used office app, offers a “picture to text” function. The software can recognize text in various file types, which is a very typical application of OCR technology.
CamScanner's “picture to text” function is a typical application of OCR technology
Currently, text recognition falls into the following categories:
· General text recognition: generally refers to recognizing text in unstructured documents such as PDFs.
· Card and certificate recognition: including ID cards, bank cards, business licenses, business cards, passports, Hong Kong and Macao travel permits, household registration books, driver's licenses, vehicle licenses, etc.
· Bill recognition: including value-added tax invoices, fixed-amount invoices, train tickets, taxi receipts, itineraries, insurance policies, bank documents, etc.
· Others: such as license plates, vehicle certificates, seal detection, etc.
As these categories continue to expand, the application scenarios of OCR technology are becoming more and more extensive. The following are a few mature fields of application:
· Remote identity authentication: OCR combined with face recognition automatically enters the user's ID information and completes identity verification. It is applied in finance, insurance, social security, O2O and other industries to effectively control business risks.
· Content review and supervision: text in pictures and videos is recognized automatically, so that pornographic, violent, politically sensitive, malicious advertising and other non-compliant content is detected promptly, which avoids business risks and greatly reduces the cost of manual review.
· Digitization of paper documents and bills: OCR can automatically recognize and enter paper documents, bills and forms, reducing manual input costs and improving input efficiency.
In all of the above scenarios, the amount of training data greatly affects how well deep learning-based OCR performs.
As an AI data service provider with nearly 10 years of experience in the field of AI data, Datatang has been committed to providing professional data services for AI enterprises worldwide.
Relying on its own data advantages and rich experience in data processing, Datatang has launched a series of OCR annotation and transcription datasets to support the wider implementation of related technologies.
14,980 PPT OCR data in 8 languages
PPT OCR data samples of 8 languages
The data covers 8 languages, multiple scenes, and a variety of shooting angles, shooting distances, and lighting conditions. For annotation, line-level text is marked with quadrilateral boxes and transcribed.
The vertex deviation of each quadrilateral box is no more than five pixels. Detection-box accuracy is at least 95%, and text transcription accuracy is at least 95%. The data can be used for multilingual OCR tasks.
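A quality criterion like this can be checked mechanically. The sketch below is one possible reading, not the provider's actual procedure: it assumes deviation is the Euclidean distance between corresponding vertices of an annotated box and a reference box, and that a box counts as accurate when its worst vertex is within the tolerance. The function names and sample coordinates are hypothetical.

```python
import math


def max_vertex_deviation(annotated, reference):
    """Largest Euclidean distance between corresponding quadrilateral vertices."""
    return max(math.dist(a, r) for a, r in zip(annotated, reference))


def box_accuracy(pairs, tolerance_px=5.0):
    """Fraction of (annotated, reference) box pairs within the pixel tolerance."""
    ok = sum(
        1 for ann, ref in pairs
        if max_vertex_deviation(ann, ref) <= tolerance_px
    )
    return ok / len(pairs)


# Hypothetical reference box and two annotations of it (x, y in pixels).
ref = [(10, 10), (110, 12), (109, 40), (9, 38)]
good = [(12, 11), (111, 12), (108, 43), (9, 38)]   # worst vertex is off by sqrt(10) px
bad = [(10, 10), (110, 12), (109, 40), (9, 46)]    # last vertex is off by 8 px

pairs = [(good, ref), (bad, ref)]
accuracy = box_accuracy(pairs)
print(f"box accuracy: {accuracy:.0%}")
```

With a threshold such as 95%, a batch would pass only if `box_accuracy` for its sampled boxes reached 0.95 or higher.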
OCR annotation and transcription data of 105,974 natural scenes in 12 languages
Data samples of OCR annotation and transcription of natural scenes in 12 languages
The data covers 12 languages (6 Asian and 6 European) and natural scenes including store signs, station signs, posters, tickets, road signs, cartoons, manhole covers, reminders, warnings, packaging instructions, menus, building signs and more.
For annotation, line-level text is marked with quadrilateral boxes and transcribed. The vertex deviation of each quadrilateral box is no more than five pixels, detection-box accuracy is at least 97%, and text transcription accuracy is at least 97%.
3,506 OCR annotation and transcription data in Hindi
A sample of OCR annotation and transcription in Hindi
The data includes 2,056 natural scene images, 1,103 Internet images and 347 text images. For annotation, line-level content is marked with line-level quadrilateral boxes and transcribed, and vertical-column content is marked with vertical-column quadrilateral boxes and transcribed. The data can be used for tasks such as Hindi recognition and Hindi photo translation in multiple scenes.
4,995 OCR annotation and transcription data in Vietnamese
A sample of OCR annotation and transcription in Vietnamese
The data includes 258 natural scene images, 2,553 Internet images and 2,184 text images. For annotation, line-level content is marked with line-level quadrilateral boxes and transcribed, and vertical-column content is marked with vertical-column quadrilateral boxes and transcribed. The data can be used for Vietnamese recognition, Vietnamese photo translation and other tasks in various scenarios.
Compared with object detection and recognition, OCR data annotation has its own particular challenges and a higher cost, because OCR involves tilted text boxes, low-resolution text and diverse text layouts.
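To see why tilted text calls for quadrilateral rather than axis-aligned boxes, the sketch below compares the area of a tilted text-line quadrilateral with the area of its axis-aligned bounding rectangle. The coordinates are hypothetical and the helper names are illustrative only.

```python
def quad_area(quad):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(quad)
    for i in range(n):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def bbox_area(quad):
    """Area of the axis-aligned rectangle that encloses the polygon."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical tilted text line, vertices listed in order around the box.
tilted = [(0, 0), (100, 50), (95, 60), (-5, 10)]

tight = quad_area(tilted)
loose = bbox_area(tilted)
print(f"quadrilateral covers {tight / loose:.0%} of its bounding rectangle")
```

In this example most of the axis-aligned rectangle is background, which is one reason annotators have to place four vertices per line by hand instead of dragging a simple rectangle.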
The PRO annotation platform supports private deployment, which can help enterprises annotate artificial intelligence data quickly and securely and provide more professional,