Preface

“Not your cheek in the wind

Tears are beautiful enough to harmonize

Can’t wait for the rain to fall

My tears are perceived by you”

Listening to songs on loop, writing about long-forgotten bugs. All right, one more day. A friend of mine said he wanted to build a tools website for fun. I browsed a few tool sites at random and found that many of them offer text recognition (OCR). That took me back to a hugely popular, god-tier open-source OCR project I had known about for a while: Tesseract OCR.

A brief introduction

The official website:

tesseract-ocr.github.io/

Short and clear; the site is hosted on GitHub.

I won't go into the details here. If you are interested, head over to the GitHub repository at github.com/tesseract-o… to take a look and learn more.

Getting down to business

To use Tesseract in development, you need to call its API.

For developers, wrappers are available in many languages to make those API calls.

As a humble Java developer, I use Tess4J, a Java (JNA) wrapper for Tesseract. Its website is:

tess4j.sourceforge.net/

You can download the JAR directly or add it as a Maven dependency:

<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.3</version>
</dependency>

Implementation

First, create the project.

Second, add the dependency to pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>test-textocr</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>4.5.3</version>
        </dependency>
    </dependencies>
</project>
Third, write the class file:
package ocr;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;

/**
 * OCR test
 *
 * @author Go against huc_
 * @since 2021/1/12 17:42
 */
public class TestTextOcr {

    public static void main(String[] args) throws IOException {
        // Create an instance
        ITesseract instance = new Tesseract();

        // Set the recognition language (needs chi_sim.traineddata in the tessdata directory)
        instance.setLanguage("chi_sim");

        // Set the OCR engine mode (1 = LSTM neural-network engine only)
        instance.setOcrEngineMode(1);

        // Read the image from the classpath root (e.g. src/main/resources/2.jpg in a Maven project)
        BufferedImage image;
        try (InputStream in = TestTextOcr.class.getResourceAsStream("/2.jpg")) {
            if (in == null) {
                throw new IOException("Image /2.jpg not found on the classpath");
            }
            image = ImageIO.read(in);
        }
        if (image == null) {
            throw new IOException("Unsupported image format: /2.jpg");
        }

        try {
            // Recognize
            String result = instance.doOCR(image);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}
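The "read the file" step is where things most often go wrong: `Class.getResourceAsStream` returns `null` when the image is not on the classpath, and `ImageIO.read` returns `null` when no installed reader understands the format, so either case surfaces later as a confusing error. Below is a small sketch of a helper that fails early with a clear message. The class name `ImageLoader` and its `load` method are hypothetical, not part of Tess4J; only JDK APIs are used.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

/** Hypothetical helper (not part of Tess4J): decodes an image or fails with a clear message. */
final class ImageLoader {

    private ImageLoader() {}

    /**
     * Decodes an image from a stream such as the result of Class#getResourceAsStream.
     *
     * @param in   the stream; may be null when the resource was not found
     * @param name the resource name, used only in error messages
     */
    static BufferedImage load(InputStream in, String name) throws IOException {
        if (in == null) {
            // getResourceAsStream returns null when the file is not on the classpath
            throw new IOException("Resource not found on classpath: " + name);
        }
        try (InputStream stream = in) {
            BufferedImage image = ImageIO.read(stream);
            if (image == null) {
                // ImageIO.read returns null when no registered reader recognizes the data
                throw new IOException("Unsupported or corrupt image: " + name);
            }
            return image;
        }
    }
}
```

In `TestTextOcr` this would be called as `ImageLoader.load(TestTextOcr.class.getResourceAsStream("/2.jpg"), "/2.jpg")`, and the resulting `BufferedImage` passed to `doOCR` as before.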

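Fifth, add the trained language data configuration

Set the environment variable TESSDATA_PREFIX=F:\tessdata. The variable name is fixed; its value is the directory holding the language data files downloaded from github.com/tesseract-o… (the code above needs chi_sim.traineddata in that directory).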
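A missing or misplaced language file is easy to overlook, so it can help to check it before calling `doOCR`. The sketch below assumes the Tesseract 4.x convention, where the directory named by TESSDATA_PREFIX directly contains the `<lang>.traineddata` files (as in the F:\tessdata example). The class `TessdataCheck` and method `findTrainedData` are hypothetical names for illustration; only JDK APIs are used.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical pre-flight check for the tessdata directory (not part of Tess4J). */
final class TessdataCheck {

    private TessdataCheck() {}

    /**
     * Returns the path of "<language>.traineddata" inside tessdataDir if that file exists.
     * A null or blank directory (e.g. TESSDATA_PREFIX is not set) yields an empty result.
     */
    static Optional<Path> findTrainedData(String tessdataDir, String language) {
        if (tessdataDir == null || tessdataDir.trim().isEmpty()) {
            return Optional.empty();
        }
        Path file = Paths.get(tessdataDir, language + ".traineddata");
        return Files.isRegularFile(file) ? Optional.of(file) : Optional.empty();
    }

    public static void main(String[] args) {
        String dir = System.getenv("TESSDATA_PREFIX"); // e.g. F:\tessdata
        System.out.println(findTrainedData(dir, "chi_sim")
                .map(p -> "Found " + p)
                .orElse("chi_sim.traineddata not found; check TESSDATA_PREFIX"));
    }
}
```

If you would rather not depend on an environment variable, Tess4J's `ITesseract` also has `setDatapath(String)`, which can point the instance at the same directory.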

Sixth, run it

Here are the results:

The recognition is not quite right, possibly because of the engine mode, so let's switch it:

instance.setOcrEngineMode(0);

Much more comfortable, right? Haha. On my test image, the recognition accuracy went up immediately.
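Why does one number make such a difference? The integer passed to `setOcrEngineMode` selects which Tesseract engine runs. The values below follow Tesseract 4.x's `--oem` option: 0 is the legacy engine only, 1 the LSTM neural-network engine only, 2 both combined, and 3 the default based on what is available. Mode 0 only works when the .traineddata file contains the legacy model; the `tessdata_fast` and `tessdata_best` files are LSTM-only. Which mode wins depends on the image and the model, so the improvement above is one observation, not a general rule. The enum is a hypothetical readability aid, not a Tess4J type.

```java
/** Hypothetical enum naming the integer values accepted by setOcrEngineMode (Tesseract 4.x --oem). */
enum OcrEngineMode {
    LEGACY_ONLY(0, "Legacy Tesseract engine only"),
    LSTM_ONLY(1, "Neural-network LSTM engine only"),
    LEGACY_AND_LSTM(2, "Legacy and LSTM engines combined"),
    DEFAULT(3, "Default, based on what is available");

    final int code;
    final String description;

    OcrEngineMode(int code, String description) {
        this.code = code;
        this.description = description;
    }

    /** Maps a raw --oem integer back to its name; rejects values Tesseract 4.x does not define. */
    static OcrEngineMode fromCode(int code) {
        for (OcrEngineMode mode : values()) {
            if (mode.code == code) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unknown OCR engine mode: " + code);
    }
}
```

With it, the switch reads as `instance.setOcrEngineMode(OcrEngineMode.LEGACY_ONLY.code);`, which makes the intent clearer than a bare `0`.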

You can test it yourself.
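When testing it yourself, it helps to put a number on "recognition rate" instead of eyeballing the output. One common yardstick (my choice here, not something the test above used) is the character error rate: the Levenshtein edit distance between the OCR output and a hand-typed reference, divided by the reference length. The class `OcrAccuracy` is a hypothetical sketch; whitespace is stripped first on the assumption that line breaks in OCR output should not count as errors.

```java
/** Hypothetical helper: character error rate (CER) of OCR output against a reference text. */
final class OcrAccuracy {

    private OcrAccuracy() {}

    /** Levenshtein distance over Unicode code points (insertions, deletions, substitutions). */
    static int editDistance(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int[] prev = new int[y.length + 1];
        int[] curr = new int[y.length + 1];
        for (int j = 0; j <= y.length; j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= x.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= y.length; j++) {
                int cost = x[i - 1] == y[j - 1] ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[y.length];
    }

    /** CER = edit distance / reference length, after removing all whitespace. Lower is better. */
    static double characterErrorRate(String ocrText, String reference) {
        String hyp = ocrText.replaceAll("\\s+", "");
        String ref = reference.replaceAll("\\s+", "");
        if (ref.isEmpty()) {
            throw new IllegalArgumentException("reference must not be empty");
        }
        return (double) editDistance(hyp, ref) / ref.codePointCount(0, ref.length());
    }
}
```

Running `doOCR` once with mode 0 and once with mode 1 on the same image and comparing `OcrAccuracy.characterErrorRate(result, expectedText)` for each gives a like-for-like comparison.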

Conclusion

Well, that's all for today. You can never have too many skills: learn more, arm yourself, and grow stronger.