Preface
“Not your cheek in the wind
Tears are beautiful enough to harmonize
Can’t wait for the rain to fall
My tears are perceived by you”
Listening to songs on loop, writing about long-forgotten bugs. All right, one more day. A friend just mentioned wanting to build a tools site for fun. I picked a tools site at random, took a look, and noticed that many of them offer OCR text recognition. That reminded me of a hugely popular, top-tier open source OCR project I came across earlier: Tesseract OCR.
A simple introduction
The website is shown below
tesseract-ocr.github.io/
Short and clear, a site hosted on GitHub.
I won't go into detail here. If you are interested, head over to the GitHub repository, github.com/tesseract-o… , to read and learn.
Integration
To use it in development, you need to call it through an API.
For developers, wrappers are available in many languages for making those API calls.
As a humble Java developer, I use Tess4J as the API. Its website is:
tess4j.sourceforge.net/
You can download the JAR directly, or use Maven dependencies.
<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.3</version>
</dependency>
Implementation
First: create the project
Second: add the dependency
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>test-textocr</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>4.5.3</version>
        </dependency>
    </dependencies>
</project>
Third: write the class file
package ocr;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;

/**
 * OCR test
 *
 * @author Go against huc_
 * @since 2021/1/12 17:42
 */
public class TestTextOcr {
    public static void main(String[] args) throws IOException {
        // Create an instance
        ITesseract instance = new Tesseract();
        // Set the recognition language (Simplified Chinese)
        instance.setLanguage("chi_sim");
        // Set the recognition engine mode
        instance.setOcrEngineMode(1);
        // Read the image from the classpath
        BufferedImage image = ImageIO.read(TestTextOcr.class.getResourceAsStream("/2.jpg"));
        try {
            // Run recognition and print the text
            String result = instance.doOCR(image);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}
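One thing the class above glosses over: `getResourceAsStream` returns `null` when the resource is not on the classpath, and `ImageIO.read` returns `null` when no registered reader can decode the data. Either case tends to surface later as a confusing exception. Here is a minimal sketch of a stricter loader using only the JDK. The class `ImageLoader` and its methods are hypothetical helpers of my own, not part of tess4j:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of tess4j): loads an image or fails with a clear message.
final class ImageLoader {

    private ImageLoader() {
    }

    // Decodes an image from a stream; rejects a missing stream or an undecodable format.
    static BufferedImage load(InputStream in, String name) throws IOException {
        if (in == null) {
            throw new IOException("Resource not found: " + name);
        }
        try (InputStream stream = in) {
            BufferedImage image = ImageIO.read(stream);
            if (image == null) {
                // ImageIO.read returns null when no registered reader can decode the data
                throw new IOException("Unsupported or corrupt image: " + name);
            }
            return image;
        }
    }

    // Convenience wrapper for classpath resources such as "/2.jpg".
    static BufferedImage loadResource(Class<?> owner, String path) throws IOException {
        return load(owner.getResourceAsStream(path), path);
    }
}
```

In `main`, the read would then become `BufferedImage image = ImageLoader.loadResource(TestTextOcr.class, "/2.jpg");`.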
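Fifth: configure the trained language data
Set the environment variable TESSDATA_PREFIX=F:\tessdata. The variable name is fixed; the value is the directory holding the trained data files (for this example, chi_sim.traineddata) downloaded from github.com/tesseract-o…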
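A missing or misplaced language file is a common reason the first run fails. As a rough sketch, the check below looks for `<language>.traineddata` (for example `chi_sim.traineddata`) inside a given directory before any OCR is attempted. `TessdataCheck` and `findTrainedData` are hypothetical names, and the sketch assumes the variable points directly at the folder that contains the `.traineddata` files, as in the `F:\tessdata` example:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper (not part of tess4j): verifies that a language file is present.
final class TessdataCheck {

    private TessdataCheck() {
    }

    // Returns the path to <language>.traineddata inside dir, or empty if it is missing.
    static Optional<Path> findTrainedData(String dir, String language) {
        if (dir == null || dir.isEmpty()) {
            return Optional.empty();
        }
        Path file = Paths.get(dir).resolve(language + ".traineddata");
        return Files.isRegularFile(file) ? Optional.of(file) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: TESSDATA_PREFIX names the folder that holds the .traineddata files
        String prefix = System.getenv("TESSDATA_PREFIX");
        Optional<Path> data = findTrainedData(prefix, "chi_sim");
        System.out.println(data.isPresent()
                ? "Found " + data.get()
                : "chi_sim.traineddata not found under TESSDATA_PREFIX=" + prefix);
    }
}
```

If setting an environment variable is inconvenient, tess4j's `Tesseract` class also offers `setDatapath(String)`, which can be given the same directory in code.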
Sixth: run
Here are the results:
Perhaps the engine mode isn't quite right, so let's switch it:
instance.setOcrEngineMode(0);
Looks much better, doesn't it? Haha. The recognition rate goes up immediately.
You can try it yourself.
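For reference, the integers passed to `setOcrEngineMode` are Tesseract's OCR engine mode (OEM) values: 0 selects the legacy engine, 1 the LSTM neural-network engine, 2 both combined, and 3 the default based on what is available. Here is a small sketch that gives them names. The `EngineMode` enum is a hypothetical helper of my own, not a tess4j type:

```java
// Hypothetical enum (not part of tess4j) naming Tesseract's OCR engine mode values.
enum EngineMode {
    LEGACY_ONLY(0),      // original (legacy) Tesseract engine
    LSTM_ONLY(1),        // neural-network (LSTM) engine
    LEGACY_AND_LSTM(2),  // both engines combined
    DEFAULT(3);          // chosen based on what is available

    private final int code;

    EngineMode(int code) {
        this.code = code;
    }

    // The integer value expected by setOcrEngineMode.
    int code() {
        return code;
    }

    // Looks up the named mode for an integer value; rejects unknown values.
    static EngineMode fromCode(int code) {
        for (EngineMode mode : values()) {
            if (mode.code == code) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unknown engine mode: " + code);
    }

    public static void main(String[] args) {
        for (EngineMode mode : values()) {
            System.out.println(mode.code() + " = " + mode);
        }
    }
}
```

With it, the switch above reads `instance.setOcrEngineMode(EngineMode.LEGACY_ONLY.code());`. One caveat: the legacy engine needs trained data that still contains the legacy model. As far as I know, the `tessdata_fast` and `tessdata_best` sets are LSTM-only, so mode 0 would need the files from the main `tessdata` repository.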
Conclusion
Well, that's all for today. Technology is all about tinkering. Learn more, arm yourself, and grow stronger.