Introduction: In this tutorial, you will learn how to use Tesseract to read and process text extracted from images using OCR.
Xcode 10.2, Swift 5, iOS 12.1 and TesseractOCRiOS (5.0.1)
OCR is the process of electronically extracting text from an image. You’ve no doubt seen it before – it’s used for everything from scanned documents and handwritten doodles on tablets to the Word Lens technology in the Google Translate app.
In this tutorial, you’ll learn how to use Tesseract, an open source OCR engine maintained by Google, to take text from love poems and make it your own. Prepare to impress!
Getting Started
Download the materials for this tutorial from here, then unzip the folder into a convenient location.
The Love In A Snap directory contains:
- Love In A Snap Starter: the starter project for this tutorial.
- Resources: the images you’ll process with OCR and a directory containing the Tesseract language data.
Open Love In A Snap Starter/Love In A Snap.xcodeproj in Xcode.
Back in Xcode, take a look at ViewController.swift. It already contains a couple of @IBOutlets and empty @IBAction methods connected to the Main.storyboard interface. It also contains performImageRecognition(_:), where Tesseract will eventually do its work.
Scroll down to see:
// 1
// MARK: - UINavigationControllerDelegate
extension ViewController: UINavigationControllerDelegate {
}

// 2
// MARK: - UIImagePickerControllerDelegate
extension ViewController: UIImagePickerControllerDelegate {
  // 3
  func imagePickerController(_ picker: UIImagePickerController,
    didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
    // TODO: Add more code here...
  }
}
Here’s what’s going on:
- UIImagePickerController, which helps you easily add pictures to the app, requires ViewController to be a UINavigationControllerDelegate in order to access the image picker controller’s delegate methods.
- The image picker also requires ViewController to be a UIImagePickerControllerDelegate.
- The imagePickerController(_:didFinishPickingMediaWithInfo:) delegate method returns the selected image.
Now it’s your turn to take over and make this application a reality!
The Limitations of Tesseract
Tesseract OCR is quite powerful, but it does have the following limitations:
- Unlike some OCR engines (such as those used by the U.S. Postal Service to sort mail), Tesseract isn’t trained to recognize handwriting, and it’s limited to about 100 fonts in total.
- Tesseract requires a bit of preprocessing to improve OCR results: images need to be scaled appropriately, have as much image contrast as possible, and the text must be horizontally aligned.
- Finally, Tesseract OCR is only available for Linux, Windows and Mac OS X.
Oh no! How can you use it in iOS, then? Luckily, Nexor Technology has created a compatible Swift wrapper for Tesseract OCR.
Add the Tesseract framework
First, you’ll install Tesseract OCR iOS through CocoaPods, a widely used dependency manager for iOS projects.
If you don’t already have CocoaPods installed on your computer, open your terminal and run the following command:
sudo gem install cocoapods
Enter your computer’s password when requested to complete the CocoaPods installation.
Next, go to the Love In A Snap Starter directory. If you have it on your desktop, you can use the following command
cd ~/Desktop/"Love In A Snap/Love In A Snap Starter"
Then type:
pod init
This will create a Podfile for your project. Replace the contents of the Podfile with the following:
platform :ios, '12.1'
target 'Love In A Snap' do
use_frameworks!
pod 'TesseractOCRiOS'
end
This tells CocoaPods that you want TesseractOCRiOS included as a dependency of your project.
Go back to the terminal and type:
pod install
This installs the pod into your project.
Once the terminal output instructs you, “Please close any current Xcode sessions and use Love In A Snap.xcworkspace for this project from now on,” open Love In A Snap.xcworkspace in Xcode.
How Does Tesseract OCR Work?
In general, OCR uses artificial intelligence to find and recognize text in an image.
Some OCR engines rely on a type of artificial intelligence called machine learning. Machine learning allows systems to learn and adapt to data by recognizing and predicting patterns.
The Tesseract OCR iOS engine uses a specific type of machine learning model called a neural network.
Neural networks are loosely modeled on the human brain. Our brains contain around 86 billion connected neurons, grouped into networks capable of learning specific functions through repetition. Similarly, on a much simpler scale, an artificial neural network takes in a variety of sample inputs and learns from its successes and failures over time to produce increasingly accurate output. Those sample inputs are called “training data.”
When training the system, this data:
- Enters through the neural network’s input nodes.
- Travels through connections between nodes called “edges,” each weighted with the perceived probability that the input should travel along that path.
- Passes through one or more layers of “hidden” (that is, internal) nodes, which process the data using predetermined heuristics.
- Returns a prediction through the output nodes.
That output is then compared with the desired output, and the edge weights are adjusted accordingly so that subsequent training data passed into the neural network returns increasingly accurate results.
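To make that weight-adjustment idea concrete, here’s a minimal, purely illustrative Swift sketch of a single artificial neuron trained with a perceptron-style update rule. It has nothing to do with Tesseract’s actual network; the AND-gate training data and the learning rate are made up for this example.

// A toy single-neuron "network": not Tesseract's model, just an illustration
// of how edge weights get nudged toward better predictions over time.
struct ToyNeuron {
    var weights: [Double] = [0.0, 0.0]   // one weight per input "edge"
    var bias = 0.0
    let learningRate = 0.1

    // Forward pass: weighted sum of the inputs, thresholded to a 0-or-1 prediction.
    func predict(_ inputs: [Double]) -> Double {
        let sum = zip(weights, inputs).reduce(bias) { $0 + $1.0 * $1.1 }
        return sum > 0 ? 1 : 0
    }

    // One training step: compare the prediction with the expected output and
    // nudge each weight in proportion to its input and the error.
    mutating func train(inputs: [Double], expected: Double) {
        let error = expected - predict(inputs)
        for i in weights.indices {
            weights[i] += learningRate * error * inputs[i]
        }
        bias += learningRate * error
    }
}

// Repeatedly show the neuron sample inputs (here, a logical AND) so its
// weights drift toward producing the expected outputs.
var neuron = ToyNeuron()
let samples: [([Double], Double)] = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
for _ in 0..<20 {
    for (inputs, expected) in samples {
        neuron.train(inputs: inputs, expected: expected)
    }
}
print(neuron.predict([1, 1])) // prints 1.0 once the weights have converged

Real OCR networks are vastly larger and use more sophisticated training, but the feedback loop is the same: predict, compare with the expected output, adjust the weights, repeat.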
Tesseract looks for patterns in pixels, letters, words and sentences. It uses a two-pass approach called adaptive recognition: one pass over the data to recognize characters, then a second pass to fill in any letters it was unsure about with letters that most likely fit the given word or sentence context.
Adding Training Data
To better hone its predictions over the range of a given language, Tesseract needs language-specific training data to perform its OCR.
Navigate to Love In A Snap/Resources in Finder. The tessdata folder contains a bunch of English and French training files. The love poems you’ll process in this tutorial are mainly in English, but also contain a bit of French. Très romantique!
Now you’ll add tessdata to your project. TesseractOCRiOS requires you to add tessdata as a referenced folder.
- Drag the tessdata folder from Finder into the Love In A Snap folder in the project navigator on the left side of Xcode.
- Select Copy items if needed.
- Set the Added folders option to Create folder references.
- Make sure the target is checked, then click Finish.
You should now see a blue tessdata folder in the navigator. The blue color indicates that the folder is referenced rather than being an Xcode group.
Now that you’ve added the Tesseract framework and language data, it’s time to get started with some fun coding!
Loading the Image
First, you’ll create a way to access an image from the device’s camera or photo library.
Open ViewController.swift and add the following code to takePhoto(_:):
// 1
let imagePickerActionSheet =
UIAlertController(title: "Snap/Upload Image",
message: nil,
preferredStyle: .actionSheet)
// 2
if UIImagePickerController.isSourceTypeAvailable(.camera) {
let cameraButton = UIAlertAction(
title: "Take Photo",
style: .default) { (alert) -> Void in
// TODO: Add more code here...
}
imagePickerActionSheet.addAction(cameraButton)
}
// 3
let libraryButton = UIAlertAction(
title: "Choose Existing",
style: .default) { (alert) -> Void in
// TODO: Add more code here...
}
imagePickerActionSheet.addAction(libraryButton)
// 4
let cancelButton = UIAlertAction(title: "Cancel", style: .cancel)
imagePickerActionSheet.addAction(cancelButton)
// 5
present(imagePickerActionSheet, animated: true)
Next, add the following below import UIKit at the top of the file:
import MobileCoreServices
This gives ViewController access to the kUTTypeImage abstract image identifier, which you’ll use to limit the image picker’s media type.
Now, within the cameraButton UIAlertAction’s closure, replace the // TODO comment with the following:
// 1
self.activityIndicator.startAnimating()
// 2
let imagePicker = UIImagePickerController()
// 3
imagePicker.delegate = self
// 4
imagePicker.sourceType = .camera
// 5
imagePicker.mediaTypes = [kUTTypeImage as String]
// 6
self.present(imagePicker, animated: true, completion: {
// 7
self.activityIndicator.stopAnimating()
})
Also add the following to the libraryButton closure:
self.activityIndicator.startAnimating()
let imagePicker = UIImagePickerController()
imagePicker.delegate = self
imagePicker.sourceType = .photoLibrary
imagePicker.mediaTypes = [kUTTypeImage as String]
self.present(imagePicker, animated: true, completion: {
self.activityIndicator.stopAnimating()
})
This is the same code you just added to cameraButton’s closure, except for imagePicker.sourceType = .photoLibrary. Here, you set the image picker to present the device’s photo library rather than the camera.
Next, to handle the captured or selected image, insert the following into imagePickerController(_:didFinishPickingMediaWithInfo:):
// 1
guard let selectedPhoto =
info[.originalImage] as? UIImage else {
dismiss(animated: true)
return
}
// 2
activityIndicator.startAnimating()
// 3
dismiss(animated: true) {
self.performImageRecognition(selectedPhoto)
}
You’ll write the performImageRecognition(_:) code in the next section of this tutorial. For now, open Info.plist. Hover your cursor over the top cell, Information Property List, then click the + button twice when it appears.
In the Key fields of the two new entries, add Privacy - Camera Usage Description and Privacy - Photo Library Usage Description respectively. Select type String for each. Then, in the Value column, enter whatever text you’d like to display to the user when asking for permission to access their camera and photo library, respectively.
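If you prefer to edit Info.plist as source code, those two entries correspond to the raw keys NSCameraUsageDescription and NSPhotoLibraryUsageDescription. The value strings below are only placeholders; use whatever wording suits your app:

<key>NSCameraUsageDescription</key>
<string>Love In A Snap uses the camera to photograph poems for OCR.</string>
<key>NSPhotoLibraryUsageDescription</key>
<string>Love In A Snap reads poem images from your photo library for OCR.</string>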
Build and run your project. Click the Snap/Upload Image button and you will see the UIAlertController you just created.
Test out the action sheet options and grant the app access to your camera and/or photo library when prompted. Confirm that the photo library and camera display as expected.
Note: If you’re running on the Simulator, there’s no physical camera available, so you won’t see the “Take Photo” option.
Implementing Tesseract OCR
First, add the following import below import MobileCoreServices so the ViewController can use the Tesseract framework:
import TesseractOCR
Now, in performImageRecognition(_:), replace the // TODO comment with the following:
// 1
if let tesseract = G8Tesseract(language: "eng+fra") {
// 2
tesseract.engineMode = .tesseractCubeCombined
// 3
tesseract.pageSegmentationMode = .auto
// 4
tesseract.image = image
// 5
tesseract.recognize()
// 6
textView.text = tesseract.recognizedText
}
// 7
activityIndicator.stopAnimating()
Since this is the meat of this tutorial, here’s a detailed line-by-line explanation:
- Initialize tesseract with a new G8Tesseract object that uses both English ("eng") and French ("fra") trained language data. Note that the poem’s French accented characters aren’t in the English character set, so the French training data must be included for those accents to appear.
- Tesseract offers three different OCR engine modes: .tesseractOnly, which is the fastest but least accurate; .cubeOnly, which is slower but more accurate since it employs more artificial intelligence; and .tesseractCubeCombined, which runs both .tesseractOnly and .cubeOnly. .tesseractCubeCombined is the slowest but, since it’s the most accurate, you’ll use it in this tutorial.
- By default, Tesseract assumes it’s processing a uniform block of text, but your sample image has multiple paragraphs. Tesseract’s pageSegmentationMode lets the Tesseract engine know how the text is divided. In this case, set pageSegmentationMode to .auto to allow fully automatic page segmentation and thus the ability to recognize paragraph breaks.
- Assign the selected image to the tesseract instance.
- Tell Tesseract to start recognizing your text.
- Put Tesseract’s recognized text output into your textView.
- Hide the activity indicator since the OCR is complete.
Now, it’s time to test the first batch of new code!
Process your first image
In Finder, navigate to Love In A Snap/Resources/lenore.png to find the sample image.
lenore.png is an image of a love poem written to a “Lenore,” but with a few edits you can turn it into a poem that’s sure to get the attention of the one you desire!
While you could print a copy of the image, then use the app to take a photo of it to perform the OCR, you’ll make things easy on yourself and add the image directly to your device’s camera roll. This eliminates the potential for human error, further lighting inconsistencies, skewed text and flawed printing, among other things. After all, the image is already dark and blurry.
Note: If you’re using the Simulator, simply drag and drop the image file onto the Simulator to add it to its photo library.
Build and run your app. Tap Snap/Upload Image, tap Choose Existing, then choose the sample image from the photo library to run it through the OCR.
Note: You can safely ignore the hundreds of compile warnings generated by the TesseractOCR library.
Uh oh! Nothing appears! That’s because the image is too big for Tesseract to handle at its current size. Time to change that!
Scale the image while maintaining aspect ratio
An image’s aspect ratio is the proportional relationship between its width and its height. Mathematically speaking, to reduce the size of the original image without distorting it, you must keep the width-to-height ratio constant.
When you know both the height and width of the original image, and you know either the desired height or width of the final image, you can rearrange the aspect ratio equation, Height1 / Width1 = Height2 / Width2, into two formulas.
Formula 1, for when the image’s width is greater than its height:
Height1 / Width1 * Width2 = Height2
Formula 2, for when the image’s height is greater than its width:
Width1 / Height1 * Height2 = Width2
For example, a 2000 x 1000 image scaled down to a maximum dimension of 1000 becomes 1000 x 500, keeping its 2:1 ratio intact.
Now add the following extension and method to the bottom of ViewController.swift:
// MARK: - UIImage extension
//1
extension UIImage {
// 2
func scaledImage(_ maxDimension: CGFloat) -> UIImage? {
// 3
var scaledSize = CGSize(width: maxDimension, height: maxDimension)
// 4
if size.width > size.height {
scaledSize.height = size.height / size.width * scaledSize.width
} else {
scaledSize.width = size.width / size.height * scaledSize.height
}
// 5
UIGraphicsBeginImageContext(scaledSize)
draw(in: CGRect(origin: .zero, size: scaledSize))
let scaledImage = UIGraphicsGetImageFromCurrentImageContext()
UIGraphicsEndImageContext()
// 6
return scaledImage
}
}
Now, at the top of performImageRecognition(_:), add:
let scaledImage = image.scaledImage(1000) ?? image
This attempts to scale the image so that it’s no bigger than 1,000 points wide or tall. If scaledImage() fails to return a scaled image, the constant falls back to the original image.
Then replace tesseract.image = image with the following code
tesseract.image = scaledImage
This assigns the scaled image to the Tesseract object.
Build and run your app again, then run the sample image from the photo library through the OCR once more:
But chances are, your results won’t be perfect. There’s still room for improvement…
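One direction for that improvement, hinted at in the limitations listed earlier, is preprocessing: boosting the image’s contrast before handing it to Tesseract. The snippet below is only a sketch of that idea using Core Image; it isn’t part of this tutorial’s starter project, and the extension name and contrast value are simply illustrative starting points to experiment with.

// A sketch only: boost contrast with Core Image before running OCR.
// The 1.4 contrast value is an arbitrary starting point; tweak as needed.
import CoreImage
import UIKit

extension UIImage {
  func contrastBoosted(by contrast: CGFloat = 1.4) -> UIImage? {
    guard let ciImage = CIImage(image: self),
      let filter = CIFilter(name: "CIColorControls") else { return nil }
    // Feed the image into the CIColorControls filter and raise its contrast.
    filter.setValue(ciImage, forKey: kCIInputImageKey)
    filter.setValue(contrast, forKey: kCIInputContrastKey)
    guard let output = filter.outputImage,
      let cgImage = CIContext().createCGImage(output, from: output.extent) else { return nil }
    return UIImage(cgImage: cgImage)
  }
}

// Possible usage inside performImageRecognition(_:), before assigning tesseract.image:
// let processedImage = scaledImage.contrastBoosted() ?? scaledImage
// tesseract.image = processedImage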