1. First, the result

Recently the online education industry hit a small setback: some question-search and intelligent correction features are being taken offline.

That said, why not build an automatic correction feature yourself? The kids might need it one day!

I had a dream last night where I implemented this feature, as shown below:

What it does: a correct answer gets a check mark, a wrong answer gets a cross, and an unanswered question gets its answer filled in.

When I woke up, I looked around and quickly lay down again, hoping the dream would pick up where it left off.

2. Implementation steps

The basic idea

There are really only two things to do: recognize the numbers, and segment the numbers.

First, you have to be able to recognize that a 5 is a 5. That is the prerequisite. Then you have to be able to locate the regions of the image that contain the numbers 5, 6, 7, 8.

The former is image recognition and the latter is image segmentation.

  • For image recognition, the general routine (a CNN, convolutional neural network) is:
    character image dataset -> training -> training result (the trained model)
    character image -> (input) -> trained model -> (recognition output) -> character value
  • For image segmentation, the general routine is the horizontal and vertical projection method.
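As a rough illustration of the projection idea, here is a minimal sketch. It assumes the image has already been binarized into a 2D list of 0/1 pixels (1 = ink), and the helper names `row_profile`, `column_profile` and `find_runs` are made up for this example. Summing the pixels along each row or column gives a projection profile, and runs of non-zero values mark where lines of text or single characters sit.

```python
def row_profile(pixels):
    """Horizontal projection: number of ink pixels in each row."""
    return [sum(row) for row in pixels]


def column_profile(pixels):
    """Vertical projection: number of ink pixels in each column."""
    return [sum(col) for col in zip(*pixels)]


def find_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero values."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Toy 5x9 binary image: two "characters" separated by blank columns
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

row_runs = find_runs(row_profile(image))      # where the text line is
col_runs = find_runs(column_profile(image))   # where each character is
print(row_runs, col_runs)
```

On a real photo the profile would be noisy, so a threshold larger than zero would probably be needed.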

Now that the idea makes sense, let’s start with image recognition.

To do image recognition yourself: prepare the data -> train on the data and save the model -> use the trained model to predict results.

2.1 Preparing Data

When looking for a boyfriend, rather than a smooth-talking playboy, find a dull IT guy and personally train him to be what you expect.

Let's not use the official MNIST dataset: it is official, not yours, and it has no samples for + - × ÷ if you want to add them.

Some generic datasets, while powerful and convenient, don't work as well as you'd like once you apply them to your own scenario.

Train only on the data you have on hand, then use the result yourself. More importantly, we get to enjoy the creative process.

Assuming we only need to recognize mental arithmetic problems, the characters we need images for are as follows:

Index:     0  1  2  3  4  5  6  7  8  9  10  11  12  13  14
Character: 0  1  2  3  4  5  6  7  8  9  =   +   -   ×   ÷

If you can recognize these, you can basically add, subtract, multiply and divide integers.

All right, but where do the pictures come from?!

Yeah, where do the pictures come from?

The question nearly scared me awake. I was already planning how to spend the 5 million, and I hadn't even picked my numbers for the Double Color Ball lottery!

In my dream, an old man told me to generate the pictures myself. I asked him how, and he laughed and disappeared into the mist…

Thinking it over, it is not hard. We can all type, right? Generating number images is nothing more than using code to write characters onto a picture.

Characters can be displayed mainly because fonts support them.

If you are running Windows, open the folder C:\Windows\Fonts and you will find many fonts.

We write code that loads these fonts and draws characters onto pictures, and then we have data.

And this data is completely under our control: more or less of it, digits, letters, Chinese characters, symbols, whatever we want. Once you get digit recognition working today, you effectively have recognition for all of them. A little exciting, isn't it?

See, that's the difference between working for someone and starting a business. Using other people's data is like being an employee: you have nothing to worry about, but you only get what they give you. Creating your own data is like starting a business: it is hard in the early stage, but you fully control the pace. Add data when you need it, and take it out when it is useless.

2.1.1 Preparing fonts

Create a fonts folder and copy some fonts from the font library. Here I have copied 13 font files.
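A fonts folder can easily pick up stray non-font files, so it may be worth filtering before handing paths to the font loader. A minimal sketch, where the helper name `list_font_files` and the extension list are assumptions for illustration rather than part of the original code:

```python
import os

# Assumed set of font file extensions; adjust to the fonts you actually copied
FONT_EXTENSIONS = (".ttf", ".ttc", ".otf")


def list_font_files(font_dir):
    """Return sorted paths of files in font_dir whose extension looks like a font."""
    paths = []
    for name in sorted(os.listdir(font_dir)):
        path = os.path.join(font_dir, name)
        # Skip sub-folders and anything without a known font extension
        if os.path.isfile(path) and name.lower().endswith(FONT_EXTENSIONS):
            paths.append(path)
    return paths
```

Calling `list_font_files("./fonts")` would then give only the font files in that folder.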

All right, you’re ready. You must be tired. Take a break, take a break.

2.1.2 Generate pictures

The code is as follows and can be run directly.

from __future__ import print_function
from PIL import Image
from PIL import ImageFont
from PIL import ImageDraw
import os
import shutil
import time

# %% The characters to generate (index -> character)
label_dict = {0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7', 8: '8', 9: '9', 10: '=', 11: '+', 12: '-', 13: '×', 14: '÷'}

# Create a folder for each category (an existing folder is deleted and recreated)
for value,char in label_dict.items():
    train_images_dir = "dataset"+"/"+str(value)
    if os.path.isdir(train_images_dir):
        shutil.rmtree(train_images_dir)
    os.makedirs(train_images_dir)

# %% Generate images
def makeImage(label_dict, font_path, width=24, height=24, rotate=0):

    # Fetch key-value pairs from the dictionary
    for value,char in label_dict.items():
        # Create an image with a black background, 24 by 24 by default
        img = Image.new("RGB", (width, height), "black")
        draw = ImageDraw.Draw(img)
        # Load the font at a size of 90% of the image width
        font = ImageFont.truetype(font_path, int(width*0.9))
        # Get the width and height of the rendered character
        # (textsize() and getoffset() were removed in Pillow 10; newer versions use draw.textbbox())
        font_width, font_height = draw.textsize(char, font)
        # Calculate the x and y coordinates so that the character is drawn in the center of the image
        x = (width - font_width-font.getoffset(char)[0]) / 2
        y = (height - font_height-font.getoffset(char)[1]) / 2
        # Draw: where, what text, what color, which font
        draw.text((x,y), char, (255, 255, 255), font)
        # Rotate the image by the given angle
        img = img.rotate(rotate)
        # Save path: dataset/<id>/img-<id>_r-<rotation angle>_<timestamp>.png
        time_value = int(round(time.time() * 1000))
        img_path = "dataset/{}/img-{}_r-{}_{}.png".format(value,value,rotate,time_value)
        img.save(img_path)

# %% The path to the fonts
font_dir = "./fonts"
for font_name in os.listdir(font_dir):
    # Take each font and generate a batch of images with it
    path_font_file = os.path.join(font_dir, font_name)
    # Tilt from -10 to 10 degrees (10 excluded); each angle generates a batch of images
    for k in range(-10, 10, 1):
        # Each character generates one picture
        makeImage(label_dict, path_font_file, rotate=k)

Not counting comments, the code above is under 30 lines, so I trust everyone can follow it! Anyone who can't is no reader of mine.

The core code is to draw text.

draw.text((x,y), char, (255, 255, 255), font)

Using the given font, this writes char in white at position (x, y) on the black-background image.
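On recent Pillow releases, text is measured with `ImageDraw.textbbox`, which returns a (left, top, right, bottom) box, and the centering arithmetic can be done from that box. Below is a minimal sketch of just the arithmetic. `centered_origin` is a hypothetical helper, and the box values in the example are made up rather than measured from a real font.

```python
def centered_origin(canvas_size, bbox):
    """Return the (x, y) to draw at so that the text's bounding box is centered.

    bbox is (left, top, right, bottom), measured with the text anchored at (0, 0).
    """
    width, height = canvas_size
    left, top, right, bottom = bbox
    # Center the box, then shift back by the box's own offset from the anchor
    x = (width - (right - left)) / 2 - left
    y = (height - (bottom - top)) / 2 - top
    return x, y


# A glyph whose ink spans x 2..14 and y 5..21 when drawn at (0, 0) on a 24x24 canvas
print(centered_origin((24, 24), (2, 5, 14, 21)))  # (4.0, -1.0)
```

With Pillow, the box would come from something like `draw.textbbox((0, 0), char, font=font)`, and the returned pair would be passed to `draw.text`.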

The core logic is a three-level nested loop.

font library -> (take a font) -> angle library -> (take an angle) -> character library -> (take a character) -> write to picture -> (once again) -> back to the font, angle and character libraries
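The nesting can be sketched without drawing anything. Assuming 13 placeholder font names (hypothetical, standing in for real font files), the 20 angles from `range(-10, 10)` and the 15 characters, `itertools.product` enumerates the same combinations, in the same order, as three nested for-loops:

```python
from itertools import product

fonts = ["font_{}.ttf".format(i) for i in range(13)]  # placeholder names, not real files
angles = range(-10, 10)                                # 20 angles: -10 .. 9
chars = "0123456789=+-×÷"                              # 15 characters

# Outer loop: fonts, middle loop: angles, inner loop: characters
jobs = list(product(fonts, angles, chars))

print(len(jobs))   # 13 * 20 * 15 = 3900 combinations
print(jobs[0])     # ('font_0.ttf', -10, '0')
```

Each tuple corresponds to one picture that the generation script would write.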

If you run the code correctly, it will eventually produce the following result:

All right, the data is ready. There are 15 folders in total, one per character, and each folder holds 260 pictures (13 fonts × 20 angles), which makes 3900 character pictures overall (15 characters × 13 fonts × 20 angles). Each picture is 24×24 pixels.

With the data, we can move on to the next step, which is training and using the data.