Optical Character Recognition (OCR) commonly refers to recognizing the text in a picture, such as the ID number, name, and address on an ID card, or the card number on a bank card.
Evil
Github repo
Evil is a simple recognition framework for iOS and macOS. It supports installation through CocoaPods, Carthage, and Swift Package Manager, and the underlying recognition model can easily be migrated to other platforms.
The basic flow of OCR recognition
- Locate the region to be recognized in the whole image
E.g., find the rectangular area where the ID card sits in the full picture
- Crop out the text area
E.g., extract the ID-number strip with some algorithm
- Apply a series of preprocessing steps to the text area to ease the next operations
E.g., Gaussian blur, dilation, and so on
- Split the text, i.e., divide the text area into single characters
- Feed each single character into the neural network
Evil uses the latest Vision framework to do this. For the first four steps, Apple provides convenient system APIs such as VNDetectTextRectanglesRequest, so we won't discuss their implementation details here; if you want to learn how to use those APIs, you can see it here.
How to use neural network to recognize a single word
I personally think that recognizing a small set of printed characters can be handled as an image-classification problem; if you have a better solution, feel free to get in touch. This is easy to follow if you already tune CNNs, but if you don't know the basics of neural networks it might be a little hard to follow, because I don't know much about them either. If you have no relevant background, you can look at Turi Create, provided by Apple, which saves you from designing your own network.
0x00 Design the network
First of all, we need to design a CNN that takes our single-character images as input for recognition. Because our recognition task is very simple, the network architecture can also be very simple. Here is the Keras (2.0.6) code:
```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(128, (1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
```
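To sanity-check the architecture, you can trace the feature-map sizes layer by layer. The short helper below is illustrative (not from the original article) and assumes Keras defaults: 'valid' padding, stride-1 convolutions, and pooling stride equal to the pool size.

```python
def conv_out(size, kernel):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output size of max pooling (stride == pool size, floor division)."""
    return size // pool

# Trace a 28x28x1 input through the conv/pool stack above.
size, channels = 28, 1
for kernel, filters in [(5, 32), (3, 64), (1, 128)]:
    size = conv_out(size, kernel)  # convolution shrinks the map
    size = pool_out(size, 2)       # 2x2 max pooling halves it
    channels = filters
    print("after conv%dx%d + pool: %dx%dx%d" % (kernel, kernel, size, size, channels))

flat = size * size * channels
print("flattened vector length:", flat)  # → 512
```

So the Flatten layer emits a 512-dimensional vector (2 x 2 x 128), which the Dense(128) layer then consumes.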
0x01 Generate training data
We know that training a network requires a lot of raw data, so what do we do when we don't have it? Some training resources can be found online, but for a task like recognizing the ID number here, what then?
Write a script to generate it, of course: for example, generate many, many ID-number region images, applying some random perturbations to increase the diversity of the data.
```python
import random

from PIL import Image, ImageDraw, ImageEnhance, ImageFont

o_image = Image.open(BACKGROUND)
draw_brush = ImageDraw.Draw(o_image)
font_size = random.randint(-5, 5) + 35
draw_brush.text((10 + random.randint(-10, 10), 15 + random.randint(-2, 2)), LABELS,
                fill='black',
                font=ImageFont.truetype(FONT, font_size))
o_image = ImageEnhance.Color(o_image).enhance(
    random.uniform(0.5, 1.5))  # coloring
o_image = ImageEnhance.Brightness(o_image).enhance(
    random.uniform(0.5, 1.5))  # brightness
o_image = ImageEnhance.Contrast(o_image).enhance(
    random.uniform(0.5, 1.5))  # contrast
o_image = ImageEnhance.Sharpness(o_image).enhance(
    random.uniform(0.5, 1.5))  # sharpness
o_image = o_image.rotate(random.randint(-2, 2))
o_image.save(output + '/%d.png' % idx)
```
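The script above renders whatever string is in LABELS; for realistic training labels you also want plausible card numbers. A Chinese 18-digit ID ends in a check digit computed by ISO 7064 MOD 11-2 (per GB 11643). Here is a stdlib-only sketch of generating checksum-valid label strings; the function names are mine, not from the article, and real IDs also encode region and birth-date fields that this random generator ignores.

```python
import random

# ISO 7064 MOD 11-2 weights for the first 17 digits, and the check characters.
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_DIGITS = "10X98765432"

def check_digit(first17: str) -> str:
    """Check character for the first 17 digits of a Chinese ID number."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_DIGITS[total % 11]

def random_id_number() -> str:
    """A random, checksum-valid 18-character ID string for synthetic labels."""
    first17 = "".join(random.choice("0123456789") for _ in range(17))
    return first17 + check_digit(first17)

print(random_id_number())
```

Strings from `random_id_number()` can be fed in as LABELS so the classifier sees the check character 'X' as well as the ten digits.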
Once we have the text area, we need to split it into single characters before training the network. Because this task is generic, I wrote a small tool called PrepareBot. The code is here; feel free to take a look.
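PrepareBot's actual algorithm lives in its repo; a common baseline for splitting a binarized number strip into single glyphs is vertical projection: sum the ink in each pixel column and cut at the empty columns. A stdlib-only sketch of that idea (not the PrepareBot code itself):

```python
def split_columns(image):
    """image: 2D list of 0/1 pixels (1 = ink).
    Return (start, end) column ranges, one per glyph."""
    width = len(image[0])
    # Vertical projection: ink count per column.
    projection = [sum(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(projection):
        if ink and start is None:
            start = x                     # a glyph begins
        elif not ink and start is not None:
            segments.append((start, x))   # the glyph ends
            start = None
    if start is not None:
        segments.append((start, width))   # glyph touching the right edge
    return segments

# Tiny strip with two "glyphs" separated by a blank column.
strip = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
]
print(split_columns(strip))  # → [(0, 2), (3, 4)]
```

Each returned range can then be cropped and resized to the 28x28 input the network expects.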
0x02 Train the network
With the data and the network model in hand, training the network is very simple; it looks like this:
```python
model.fit_generator(generator=train_data_generator)
```
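`fit_generator` expects a generator that yields `(inputs, labels)` batches indefinitely. A minimal sketch of what such a `train_data_generator` could look like; the names are illustrative, and file loading is assumed to have happened already (the samples sit in memory as `(image, label)` pairs):

```python
import random

def batch_generator(samples, batch_size=32):
    """Yield (images, labels) batches forever, reshuffling each epoch.
    `samples` is a list of (image, label) pairs already loaded in memory."""
    while True:
        random.shuffle(samples)
        for i in range(0, len(samples), batch_size):
            batch = samples[i:i + batch_size]
            images = [img for img, _ in batch]
            labels = [lab for _, lab in batch]
            yield images, labels

# Keras-side usage sketch:
# model.fit_generator(generator=batch_generator(train_samples),
#                     steps_per_epoch=len(train_samples) // 32)
```

In a real pipeline the images would be normalized arrays of shape (28, 28, 1) and the labels one-hot vectors; the batching logic stays the same.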
At this point, watch the network's convergence and recognition accuracy; if they are not too bad, save the model for the recognition tasks ahead. Note that the Keras model generated in this step is cross-platform, meaning it can be used on Windows, Linux, and even Android.
0x03 Convert the network
In the previous steps we generated a Keras network model. How do we use it in Evil? First, we need to use coremltools, provided by Apple, to convert the Keras model into a Core ML model:
```python
import keras
import coremltools

# Prepare the model for inference: strip the Dropout layers
# (iterate over a copy so removal doesn't skip layers).
for k in model.layers[:]:
    if type(k) is keras.layers.Dropout:
        model.layers.remove(k)
model.save("./temp.model")

core_ml_model = coremltools.converters.keras.convert("./temp.model",
                                                     input_names='image',
                                                     image_input_names='image',
                                                     output_names='output',
                                                     class_labels=list(labels),
                                                     image_scale=1 / 255.)
core_ml_model.author = 'gix.evil'
core_ml_model.license = 'MIT license'
core_ml_model.short_description = 'model to classify chinese IDCard numbers'
core_ml_model.input_description['image'] = 'Grayscale image of card number'
core_ml_model.output_description['output'] = 'Predicted digit'
core_ml_model.save('demo.mlmodel')
```
Save the demo.mlmodel file for later use.
0x04 Import the network
Now that we have the model file, how do we import it into the Evil framework? There are two options.
Just drag it into Xcode
The significant disadvantage of this method is that it increases the size of the app, so we don't recommend it. However, it is always the easiest and most straightforward option while debugging.
Runtime download
This has no effect on the size of your app, but you need to download the model files at run time, and the code gets more complex. The good news is that Evil provides very friendly support for this: just save the model file on your server or CDN and configure the download path in the Info.plist file, and Evil will configure your network model automatically.
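As a rough illustration, such an Info.plist entry might look like the fragment below. The key names and URL are placeholders I made up for this sketch; check the Evil README for the exact keys it actually reads.

```xml
<!-- Hypothetical Info.plist fragment; key names are illustrative only -->
<key>EvilModelURLs</key>
<dict>
    <key>ChineseIDCard</key>
    <string>https://your-cdn.example.com/models/demo.mlmodel</string>
</dict>
```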
0x05 Use the network
With everything in place, how do we use it? Simply invoke the interfaces Evil provides for the steps above. For example:
```swift
// 1. Use Evil's built-in model to recognize ID numbers
lazy var evil = try? Evil(recognizer: .chineseIDCard)

let image: Recognizable = ...
let cardNumber = self.evil?.recognize(image)
print(cardNumber)
```
```swift
// 2. Use a custom model
let url: URL = ...
let evil = try? Evil(contentsOf: url, name: "demo")

let ciimage = CIImage(cvPixelBuffer: pixelBuffer).oriented(orientation)
if let numbers = ciimage.preprocessor
    // Perspective correction
    .perspectiveCorrection(boundingBox: observation.boundingBox,
                           topLeft: observation.topLeft,
                           topRight: observation.topRight,
                           bottomLeft: observation.bottomLeft,
                           bottomRight: observation.bottomRight)
    .mapValue({ Value($0.image.oriented(orientation), $0.bounds) })
    // Make sure the ID card faces up
    .correctionByFace()
    // Crop the number area
    .cropChineseIDCardNumberArea()
    // Preprocessing: Gaussian blur, etc.
    .process()
    // Split the text
    .divideText()
    // Simple verification
    .value?.map({ $0.image }), numbers.count == 18 {
    if let result = try? self.evil?.prediction(numbers) {
        if let cardnumber = result?.flatMap({ $0 }).joined() {
            DispatchQueue.main.async {
                self.tipLabel.text = cardnumber
            }
        }
    }
}
```
Conclusion
That's the end of the pitch. Teasing is welcome, as are stars, forks, and pull requests, and contributions of your own trained models are welcome too. This is my first post on Juejin; thanks for your support.