Preface
This project grew out of a crawler job. On a whim, I wanted to scrape the listings of a certain rental site. Everything went smoothly at first, but the price turned out to be a problem: it is not plain text. Each digit of the price is drawn from a background image of digits shifted by an offset, the same way a sprite sheet works, with a small algorithm added on top. Specifically:
- Get the digit image and the offset information: { "offset": [ [1, 4, 2, 8], [5, 1, 7, 8], [5, 1, 3, 8], ... ] }
- Apply a little arithmetic to the offset information to get the position information (background-position: XXX px)
- With the digit image as the background and the offset applied, append each digit to the price at the place where it belongs
On second thought, this is not a big deal: it only takes a recognition step plus a small algorithm.
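The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the site's actual algorithm: it assumes each offset value is an index into the row of digits in the sprite image, that every digit slot has the same width, and that the sprite has already been recognized into a string. The names `decodePrice`, `toBackgroundPosition`, `spriteDigits` and `digitWidth` are hypothetical.

```javascript
// Assumption: `spriteDigits` is the recognized text of the digit image,
// one character per slot, e.g. '8652039147'.
// Assumption: each offset value is the index of a slot in that image.
function decodePrice(spriteDigits, offsets) {
  // Look up the digit at each offset and join them into one number.
  return Number(offsets.map(index => spriteDigits[index]).join(''))
}

// Assumption: every digit slot is `digitWidth` pixels wide, so the CSS
// shift for slot `index` is simply index * digitWidth to the left.
function toBackgroundPosition(index, digitWidth) {
  const shift = index * digitWidth
  return `background-position: ${shift === 0 ? '0' : `-${shift}px`}`
}

const price = decodePrice('8652039147', [1, 4, 2, 8])
console.log(price)
console.log(toBackgroundPosition(4, 30))
```

With this shape, the crawler only needs one recognition pass per digit image; every price on the page is then a plain lookup.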
For the image recognition I used Tesseract-OCR, Google's open source OCR software. Because the crawler runs on Node, I wrote a Node plugin that works with the latest version of Tesseract-OCR, and later added the ability to run it from the command line.
Demo
Command line — 1
Command line use — 2
Module usage — 1
The project is here
If you find it helpful, feel free to give it a star. Thanks!
GitHub: node-tesr
Main text
Command-line usage
To use image recognition, make sure Tesseract-OCR is installed on your computer; download and install it first if it is not.
For command-line use, a global installation is recommended:
npm install node-tesr -g
tesr --from=./test/output.jpg --to=./output.txt
Parameter description
- --from: path of the image to recognize (required)
- --to: file to write the recognized text to (optional; by default the recognized text is printed to the command line)
- --l: recognition language, with a little extra handling for Chinese: use --l=CHS for Simplified and --l=CHT for Traditional (optional, default eng)
- --p: see the notes in lib/config.js (optional, default 3, automatic mode)
- --o: see the notes in lib/config.js (optional, default 3, automatic mode)
Using it as a module
npm install node-tesr
const tesseract = require('node-tesr')

// With explicit options: language, OCR engine mode, page segmentation mode
tesseract('./output.jpg', { l: 'eng', oem: 3, psm: 3 }, function(err, data) {
  // The recognized text is available here
  console.log(data)
})

// Or, with the default options
tesseract('./output.jpg', function(err, data) {
  // The recognized text is available here
  console.log(data)
})
Afterword
Result
Tesseract-OCR does not seem to handle images with a transparent background, so the images Node plugin needs to be used together with Tesseract-OCR:
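const images = require('images')

images(500, 100)                     // create a 500 x 100 canvas
  .fill(0xff, 0xff, 0xff, 1)         // fill it with opaque white
  .draw(images('demo.png'), 10, 10)  // draw the transparent image at (10, 10)
  .save('output.jpg', {
    quality: 100
  })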
Once the transparent background is filled with white, the image can be recognized normally.
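The two steps (flatten onto white, then recognize) can be chained in one helper. The following is a minimal sketch, not part of node-tesr: `recognizeOnWhite` is a hypothetical name, and the two libraries are passed in as arguments so that the caller supplies `require('images')` and `require('node-tesr')`. It also assumes the canvas should match the source image size, using the `width()` and `height()` getters of the images plugin, instead of the fixed 500 x 100 canvas above.

```javascript
// Hypothetical helper: flatten a transparent image onto white, then
// hand the flattened file to the recognizer.
function recognizeOnWhite(images, tesseract, srcPath, outPath, callback) {
  const src = images(srcPath)

  images(src.width(), src.height()) // canvas with the same size as the source
    .fill(0xff, 0xff, 0xff, 1)      // opaque white background
    .draw(src, 0, 0)                // draw the source on top
    .save(outPath, { quality: 100 })

  // Same options as in the module example earlier in the article.
  tesseract(outPath, { l: 'eng', oem: 3, psm: 3 }, callback)
}

// Usage, assuming both packages are installed:
// recognizeOnWhite(require('images'), require('node-tesr'),
//   'demo.png', 'output.jpg',
//   (err, data) => console.log(err || data))
```

Passing the libraries in keeps the helper easy to try out with stand-ins before Tesseract-OCR is installed.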
How to improve recognition accuracy
Boss! My recognition rate is really low. What can I do about it?
Here, take a look at this; it will help you improve recognition accuracy.
Learning about the recognition algorithm
To-do
- Add support for recognizing images from network addresses
- Use then to handle callbacks
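As a rough idea of what the second item could look like, Node's built-in `util.promisify` can wrap a callback-style function so that it returns a promise. This is only a sketch and not how node-tesr implements it: `fakeTesseract` is a hypothetical stand-in with the same `(path, options, callback)` shape as the module call shown earlier, so the example runs without Tesseract-OCR installed.

```javascript
const { promisify } = require('util')

// Hypothetical stand-in with the same (path, options, callback) shape.
// Swap in require('node-tesr') to try it against real images.
function fakeTesseract(imagePath, options, callback) {
  setImmediate(() => callback(null, `recognized:${imagePath}`))
}

// promisify appends the callback for us and returns a promise instead.
const recognize = promisify(fakeTesseract)

recognize('./output.jpg', { l: 'eng', oem: 3, psm: 3 })
  .then(data => console.log(data))
  .catch(err => console.error(err))
```

The same wrapper also works with async/await, since `recognize` returns an ordinary promise.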
Footer
Code is life, and I love it.
Technology keeps changing, and the brain stays online. See you next time.
By: Crotch Trio
You can find me at gayhub@jsjzh; feel free to come and say hi.
Friends are welcome to add me directly and I will pull you into the group to build things together. Remember to mention where you read this article.
P.S. If the picture does not load, you can add me on WeChat: Kimimi_king