node-tesr: a Node plugin for image recognition with tesseract-ocr

Preface

This project grew out of a crawler experiment. On a whim I wanted to scrape the listings from a rental site, and everything went smoothly until I reached the price. The price is not plain text: each digit is displayed by offsetting a background image of digits, the same way a sprite image works, with a little extra arithmetic added on top. The details are as follows.

Get the digit image and the offset information
- { "offset": [ [1, 4, 2, 8], [5, 1, 7, 8], [5, 1, 3, 8], ... ] }
Apply a little arithmetic to the offset information to get the position information
- (background-position: XXX px)
Using the digit image as the background, apply the offset and appendTo the place where the price information belongs
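
As a rough illustration of the idea above, here is a minimal sketch. It assumes (this is not confirmed by the site or by this article) that each number in an offset array is simply an index into the string of digits recognized from the background image. The helper name decodePrice and the sample digit string are hypothetical, and the real arithmetic may well differ.

```javascript
// Hypothetical helper: map one offset array to a price string.
// Assumption: each offset is an index into the digits recognized from the sprite image.
function decodePrice(spriteDigits, offsets) {
  return offsets
    .map(function (index) {
      if (index < 0 || index >= spriteDigits.length) {
        throw new RangeError('offset ' + index + ' is outside the sprite')
      }
      return spriteDigits[index]
    })
    .join('')
}

// Sample data: the digit string is made up; the offsets follow the shape shown above.
const spriteDigits = '8652039147'
const offsets = [[1, 4, 2, 8], [5, 1, 7, 8]]
const prices = offsets.map(function (row) {
  return decodePrice(spriteDigits, row)
})
console.log(prices) // → [ '6054', '3614' ]
```

The only part that needs OCR in this sketch is obtaining spriteDigits from the image; the rest is plain lookup.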

On second thought, it was not a big deal: I just needed to add an image recognition step and a small algorithm.

For the image recognition I used tesseract-ocr, Google's open source OCR software. Because the crawler runs on Node, I wrote a Node plugin that works with the latest version of tesseract-ocr, and later added the ability to run it from the command line.

Demo

Command line — 1

Command line use — 2

Module usage — 1

The project is here

If you find it helpful, please give it a star. Thanks!

github node-tesr

Main text

Command execution

To use image recognition, first make sure tesseract-ocr is installed on your computer (see its download page).

If you want to use the command line, a global installation is recommended.

npm install node-tesr -g

tesr --from=./test/output.jpg --to=./output.txt

Parameter description

--from: path of the image to recognize (required)
--to: file to write the recognized text to (optional; by default the recognized text is printed to the command line)
--l: recognition language, with a little special handling for Chinese: use --l=chs for Simplified and --l=cht for Traditional (optional, default eng)
--p: see the notes in lib/config.js (optional, default 3, automatic mode)
--o: see the notes in lib/config.js (optional, default 3, automatic mode)
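
To make the `--key=value` flag style concrete, here is a small sketch of how such arguments could be turned into an options object. The function parseFlags is hypothetical and is not taken from node-tesr; the plugin's own parsing may work differently.

```javascript
// Hypothetical parser for "--key=value" flags; not node-tesr's actual implementation.
function parseFlags(argv) {
  const flags = {}
  argv.forEach(function (arg) {
    const match = /^--([^=]+)(?:=(.*))?$/.exec(arg)
    if (match) {
      // A flag given without "=value" is stored as true.
      flags[match[1]] = match[2] === undefined ? true : match[2]
    }
  })
  return flags
}

const flags = parseFlags(['--from=./test/output.jpg', '--to=./output.txt', '--l=chs'])
console.log(flags)
```

In a real command-line script the input would typically be process.argv.slice(2) rather than a hard-coded array.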

Module introduction

npm install node-tesr

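const tesseract = require('node-tesr')

// Pass options explicitly: language (l), OCR engine mode (oem), page segmentation mode (psm)
tesseract('./output.jpg', { l: 'eng', oem: 3, psm: 3 }, function(err, data) {
  if (err) return console.error(err)
  // Get the recognition content here
  console.log(data)
})

// Or as follows, relying on the default options
tesseract('./output.jpg', function(err, data) {
  if (err) return console.error(err)
  // Get the recognition content here
  console.log(data)
})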

Afterword

The effect

tesseract-ocr does not seem to handle images with a transparent background well, so we need to combine the images Node plugin with tesseract-ocr.

// Draw demo.png onto a white 500 x 100 canvas so that no transparent pixels remain
let images = require('images')
images(500, 100)
  .fill(0xff, 0xff, 0xff, 1)
  .draw(images('demo.png'), 10, 10)
  .save('output.jpg', {
    quality: 100
  })

Once the transparent background is filled with white, the image can be recognized normally.
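
To show what filling with white means at the pixel level, here is a minimal sketch of standard "source over" alpha compositing onto a white background. The function overWhite is hypothetical and is not part of the images plugin, which may blend pixels differently internally.

```javascript
// Blend one RGBA pixel over an opaque white background ("source over" compositing).
// alpha ranges from 0 (fully transparent) to 1 (fully opaque).
function overWhite(r, g, b, alpha) {
  function blend(channel) {
    return Math.round(channel * alpha + 255 * (1 - alpha))
  }
  return [blend(r), blend(g), blend(b)]
}

console.log(overWhite(0, 0, 0, 0)) // a fully transparent pixel becomes white
console.log(overWhite(0, 0, 0, 1)) // an opaque black pixel stays black
```

After this step dark text sits on a uniformly light background, which is the kind of contrast OCR engines generally expect.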

How to improve my image recognition accuracy

Boss! My image recognition rate is very low, what can I do?

Here, take a look at this. It will help improve recognition accuracy.

Recognition algorithm learning

To-do

Add support for recognizing images from network addresses
Use then to handle callbacks
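
For the second to-do item, one possible approach is Node's built-in util.promisify, sketched below. The function fakeRecognize is a stand-in stub with the same (path, callback) shape as the module usage shown earlier; it does not call tesseract-ocr, and this is not how node-tesr itself is implemented.

```javascript
const util = require('util')

// Stub with a Node-style callback, for illustration only.
function fakeRecognize(imagePath, callback) {
  setImmediate(function () {
    if (typeof imagePath !== 'string' || imagePath === '') {
      callback(new Error('image path is required'))
      return
    }
    callback(null, 'text from ' + imagePath)
  })
}

// util.promisify wraps a function whose last argument is an (err, data) callback
// and returns a function that returns a Promise instead.
const recognize = util.promisify(fakeRecognize)

recognize('./output.jpg')
  .then(function (data) {
    console.log(data)
  })
  .catch(function (err) {
    console.error(err.message)
  })
```

Assuming the real module always takes its callback as the last argument, the same wrapper could presumably be applied to it directly.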

Footer

Code is life, and I love it.

Technology is constantly changing. Brains are always online. We’ll see you next time

By — Crotch Trio

You can find me at gayhub@jsjzh. Feel free to drop by.

Friends are welcome to add me directly and I will pull you into the group. Remember to mention where you read this article.

P.S. If the picture does not load, you can add me on WeChat: Kimimi_king