1. First, the result
Recently the online education industry hit a small setback: some question-search and automatic homework-grading features had to be taken offline.
That said, would you like to build an automatic grading feature yourself? Just in case your kid needs it one day!
I had a dream last night where I implemented this feature, as shown below:
What it does: correct answers get a check mark; wrong answers get a cross; unanswered questions get the answer filled in.
When I woke up, I looked around and quickly lay down again, hoping the dream would pick up where it left off.
2. Implementation steps
Today we focus on training on the data and using the trained model.
Previously covered:
- 2.1 Preparing Data
- 2.1.1 Preparing fonts
- 2.1.2 Generate pictures
2.2 Training data
2.2.1 Model building
Look at the code first. A layman will find it abstruse; an expert will quietly chuckle.
# Import the necessary packages
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import pathlib
import cv2

# %% Build the model
def create_model():
    model = Sequential([
        # Scale pixel values from 0-255 to 0-1 (newer TensorFlow versions: layers.Rescaling)
        layers.experimental.preprocessing.Rescaling(1./255, input_shape=(24, 24, 1)),
        layers.Conv2D(24, 3, activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(15)
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model
The structure of this model is as follows. Its job is to take image data as input, knead it through the various layers, and finally predict which category the image belongs to.
graph TD
    I(Input: 24*24 pixel image) --> A[Convolution layer 1: Conv2D]
    A --> A1[Pooling layer 1: MaxPooling2D]
    A1 --> B[Convolution layer 2: Conv2D]
    B --> B1[Pooling layer 2: MaxPooling2D]
    B1 --> C0[Fully connected layer 1: Dense 128]
    C0 --> C[Fully connected layer 2: Dense 15]
    C --> O1(0: 30%)
    C --> O2(1: 20%)
    C --> O3(2: 0.5%)
    C --> O4(...)
    C --> O5(=: 1%)
What are all these layers for? What’s the point? Underwear, shirts, sweaters, cotton-padded coats all have their uses.
2.2.2 Convolution layer Conv2D
Think of it as surveyors from various departments, each collecting and collating one specific kind of data within a given area.
Our input is an image, which is made up of pixels. In Rescaling(1./255, input_shape=(24,24,1)), input_shape describes a 24*24 pixel image with 1 channel (a color image would have 3 RGB channels).
In the convolution layer code, the definition is Conv2D(24,3), which means extracting 24 features with a 3*3 pixel convolution kernel.
To make this easier to grasp, let me swap the picture for a map. Take the central district of Jinan as an example.
Convolution is like gathering several sets of specific information from the map, one unit region at a time. For example, taking the residential community as the unit, we extract 24 dimensions of information: number of homes, parking spaces, schools, population, annual income, education level, age, and so on. The community plays the role of the convolution kernel.
So once you’ve extracted it, it looks like this.
After the first convolution, we get data for N communities from the district.
Convolution can be applied more than once.
For example, after convolving by community, we can convolve again on top of the community data, this time with the street as the unit.
By convolving the community data again in terms of streets, we get data for N streets from the district.
That’s what convolution does.
Through successive convolutions, a large image is condensed in a specific way until only a few groups of purposeful data remain, which makes the later decision easier. This is the data for one district; if you take all of Jinan, or even Shandong province, the same convolution applies. It resembles how civilized cities and economically strong provinces are selected in real life.
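To make the sliding-window idea concrete, here is a minimal NumPy sketch of a single 3*3 kernel passing over a 24*24 image with no padding. It is an illustration only, not how TensorFlow implements Conv2D internally; the function name conv2d_valid, the stand-in image, and the averaging kernel are all assumptions made for this example. Like Conv2D, it does not flip the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch under the kernel element-wise, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Stand-in for a 24*24 grayscale image (hypothetical data).
image = np.arange(24 * 24, dtype=float).reshape(24, 24)
# Hypothetical kernel: a plain 3*3 average. A trained layer learns its own values.
kernel = np.ones((3, 3)) / 9.0

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (22, 22)
```

One kernel yields one 22*22 feature map. Conv2D(24,3) learns 24 such kernels, which is where the 24 features come from.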
2.2.3 Pooling Layer MaxPooling2D
In plain terms, it is rounding off.
Computers are powerful and far faster than you or me, but that power is not free. We want things to run as fast as possible, and if a method takes half the time, we will gladly use it.
That’s what the pooling layer does. The code definition for pooling looks like this: MaxPooling2D((2,2)), which is max pooling. (2,2) is the size of the pooling window: within each 2 by 2 region, we decide the values can be merged into one unit.
Take the map again. Say the data in the 16 boxes below is the number of schools in each of 16 streets.
To improve computational efficiency and cut down the amount of data to process, we pool with a 2*2 pooling layer.
Each pooling window combines 4 streets. The number of schools for the new unit is the largest value among its members (there are other kinds of pooling too, such as taking the minimum or the average). After pooling, the 16 cells become 4 cells, reducing the data.
That’s where the pooling layer comes in.
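The 16-cells-to-4 step can be sketched in a few lines of NumPy. The school counts below are made up for illustration, and max_pool_2x2 is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def max_pool_2x2(grid):
    """Keep the largest value in each non-overlapping 2*2 block."""
    h, w = grid.shape
    # Drop a trailing odd row or column, so a 9*9 input becomes 4*4.
    grid = grid[:h - h % 2, :w - w % 2]
    blocks = grid.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical number of schools in each of 16 streets.
schools = np.array([[1, 3, 2, 0],
                    [4, 2, 1, 5],
                    [0, 1, 3, 3],
                    [2, 2, 6, 1]])

pooled = max_pool_2x2(schools)
print(pooled)
# [[4 5]
#  [2 6]]
```

Each output cell is the maximum of one 2*2 block, so the amount of data drops to a quarter.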
2.2.4 Fully connected layer Dense
Of three thousand rivers, I take just one ladleful.
In this case, it’s actually a classifier.
In the model code, it appears as Dense.
What it does: no matter how many dimensions the earlier layers produce, once they reach this layer it forces them into a fixed number of channels.
For example, to recognize the letters A to Z, I may have 500 neurons involved, but the final output has 26 channels (A, B, C, ..., Y, Z).
We have a total of 15 classes of characters here, so 15 channels. Given an input, the output is one value per category, indicating how likely each category is.
graph TD
    I[Other layers] --> A[Fully connected layer 1: Dense 128]
    A --> O1(1) --> C
    A --> O2(2) --> C
    A --> O3(3) --> C
    A --> O6(4) --> C
    A --> O7(5) --> C
    A --> O4(...) --> C
    A --> O8(127) --> C
    A --> O5(128) --> C[Fully connected layer 2: Dense 15]
    C --> 1O1(1: 30%)
    C --> 1O2(2: 20%)
    C --> 1O3(3: 0.5%)
    C --> 1O4(...)
    C --> 1O5(15: 1%)
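One detail is worth spelling out. In this model the last Dense(15) layer has no activation and the loss is set up with from_logits=True, so the raw outputs are unnormalized scores rather than percentages. A softmax turns such scores into probabilities that sum to 1. The sketch below uses three made-up scores; the softmax helper is written by hand for illustration.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores for 3 classes (the real model outputs 15).
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

print(probs)
print(probs.sum())
```

The largest score always maps to the largest probability, which is why taking the index of the maximum on the raw scores gives the same answer as taking it on the probabilities.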
Note: the layers above work on two-dimensional inputs, such as 24×24, but a fully connected layer takes one-dimensional input, so the code uses layers.Flatten() to flatten the two-dimensional data into one dimension ([[11,12],[21,22]] -> [11,12,21,22]).
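The same flattening can be tried directly in NumPy. This is only an analogy for what layers.Flatten() does to each sample; the zero-filled array stands in for the output of the second pooling layer.

```python
import numpy as np

grid = np.array([[11, 12],
                 [21, 22]])
flat = grid.flatten()
print(flat)  # [11 12 21 22]

# Stand-in for one sample coming out of the last pooling layer: 4*4 with 64 channels.
pooled = np.zeros((4, 4, 64))
flat_pooled = pooled.reshape(-1)
print(flat_pooled.shape)  # (1024,)
```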
For the overall model, calling model.summary() prints the network structure of the sequence as follows:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
rescaling_2 (Rescaling)      (None, 24, 24, 1)         0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 22, 22, 24)        240
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 11, 11, 24)        0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 9, 9, 64)          13888
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 4, 4, 64)          0
_________________________________________________________________
flatten_2 (Flatten)          (None, 1024)              0
_________________________________________________________________
dense_4 (Dense)              (None, 128)               131200
_________________________________________________________________
dense_5 (Dense)              (None, 15)                1935
=================================================================
Total params: 147263
Trainable params: 147263
Non-trainable params: 0
_________________________________________________________________
We see that conv2d_5 (Conv2D) (None, 9, 9, 64) becomes max_pooling2d_5 (MaxPooling2D) (None, 4, 4, 64) after 2*2 pooling. The (None, 4, 4, 64) output is then flattened into one dimension (None, 1024), passes through a fully connected layer (None, 128), then another fully connected layer (None, 15), and those 15 values are our final classification. All of this is what we designed.
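The shapes and parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for a convolution without padding and for a fully connected layer with a bias term; the helper names are made up for this example.

```python
def conv_output(size, kernel):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Weights of each kernel plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

size = conv_output(24, 3)    # 22 after the first Conv2D
size = size // 2             # 11 after 2*2 pooling
size = conv_output(size, 3)  # 9 after the second Conv2D
size = size // 2             # 4 after 2*2 pooling (the odd row and column are dropped)
flat = size * size * 64      # 1024 values after Flatten

params = [
    conv_params(3, 1, 24),    # conv2d_4
    conv_params(3, 24, 64),   # conv2d_5
    dense_params(flat, 128),  # dense_4
    dense_params(128, 15),    # dense_5
]
print(params, sum(params))  # [240, 13888, 131200, 1935] 147263
```

Rescaling, pooling, and Flatten have no trainable weights, which is why their Param # column is 0.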
model.compile configures several parameters for running the model (here the optimizer, the loss function, and the metric); at this stage it is enough to remember that it exists.
2.2.5 Training data
Just run it and you are done.
# Folder that holds the training images
data_dir = pathlib.Path('dataset')
# Generate the dataset by reading images from the folder
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,                # Folder to read the data from
    color_mode="grayscale",  # Load the images as grayscale
    image_size=(24, 24),     # Size of each image
    batch_size=32            # Number of images per batch
)
# Class names of the dataset, one per image subfolder under the dataset folder
class_names = train_ds.class_names
# Save the class names
np.save("class_name.npy", class_names)
# Cache, shuffle, and prefetch the dataset
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
# Create the model
model = create_model()
# Train the model; epochs=10 means the whole dataset is trained on 10 times
model.fit(train_ds, epochs=10)
# Save the weights after training
model.save_weights('checkpoint/char_checkpoint')
After the command is executed, the following information is displayed:
Found 3900 files belonging to 15 classes.
Epoch 1/10
122/122 [=========] - 2s 19ms/step - loss: 0.5795 - accuracy: 0.8615
Epoch 2/10
122/122 [=========] - 2s 18ms/step - loss: 0.0100 - accuracy:
Epoch 3/10
122/122 [=========] - 2s 19ms/step - loss: 0.0027 - accuracy: 1.0000
Epoch 4/10
122/122 [=========] - 2s 19ms/step - loss: 0.0013 - accuracy: 1.0000
Epoch 5/10
122/122 [=========] - 2s 20ms/step - loss: 8.4216e-04 - accuracy: 1.0000
Epoch 6/10
122/122 [=========] - 2s 18ms/step - loss: 5.5273e-04 - accuracy: 1.0000
Epoch 7/10
122/122 [=========] - 3s 21ms/step - loss: 4.0966e-04 - accuracy: 1.0000
Epoch 8/10
122/122 [=========] - 2s 20ms/step - loss: 3.0308e-04 - accuracy: 1.0000
Epoch 9/10
122/122 [=========] - 3s 23ms/step - loss: 2.3446e-04 - accuracy: 1.0000
Epoch 10/10
122/122 [=========] - 3s 21ms/step - loss: 1.8971e-04 - accuracy: 1.0000
We can see that by the third epoch the accuracy had reached 100%. When training finished, several new files had appeared in the checkpoint folder:
char_checkpoint.data-00000-of-00001
char_checkpoint.index
checkpoint
The files above are the results of training. Once they are saved, leave them as they are; later you can load them directly to make predictions.
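As a side note on the training log, the 122/122 counter is the number of batches per epoch. It follows from the 3900 images and batch_size=32, as this small calculation shows (the variable names are chosen for this example):

```python
import math

num_images = 3900  # "Found 3900 files" in the training log
batch_size = 32    # batch_size passed when building the dataset

# The last batch may be smaller than the rest, so round up.
steps_per_epoch = math.ceil(num_images / batch_size)
last_batch_size = num_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 122
print(last_batch_size)  # 28
```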
2.3 Predicting data
Finally, it’s time to enjoy the results.
# Load the images to be recognized (the 0 flag reads them as grayscale)
img1 = cv2.imread('img1.png', 0)
img2 = cv2.imread('img2.png', 0)
imgs = np.array([img1, img2])
# Build the model
model = create_model()
# Load the weights from the previous training
model.load_weights('checkpoint/char_checkpoint')
# Read the class names
class_name = np.load('class_name.npy')
# Predict on the images to get the predicted values
predicts = model.predict(imgs)
results = []  # List that stores the results
for predict in predicts:  # Iterate over each prediction
    index = np.argmax(predict)  # Index of the maximum value
    result = class_name[index]  # Look up the character
    results.append(result)
print(results)
Let’s find two images, img1.png and img2.png: one is the digit 6 and the other is the digit 8. Put the two images in the same directory as the code to check how well the recognition works.
cv2.imread('img1.png',0) reads the image into a two-dimensional array. The 0 argument means the image is read as grayscale. After this step, the image becomes an array with the structure (24,24), as shown below:
We need to check two images at the same time, so we stack them together to form imgs. The structure of imgs is (2,24,24).
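The stacking step can be tried without any image files. The sketch below uses two blank arrays as stand-ins for the results of cv2.imread. It also shows how to add an explicit channel axis so that the shape matches the model's input_shape=(24,24,1); the original code passes the (2,24,24) array directly, so treat the extra axis as an optional, defensive step rather than part of the author's method.

```python
import numpy as np

# Stand-ins for cv2.imread('img1.png', 0) and cv2.imread('img2.png', 0).
img1 = np.zeros((24, 24), dtype=np.uint8)
img2 = np.full((24, 24), 255, dtype=np.uint8)

# Stack the two images into one batch.
imgs = np.array([img1, img2])
print(imgs.shape)  # (2, 24, 24)

# Optional: add a trailing channel axis for the single grayscale channel.
imgs_with_channel = np.expand_dims(imgs, axis=-1)
print(imgs_with_channel.shape)  # (2, 24, 24, 1)
```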
The next step is to build the model and load the weights. Calling predicts = model.predict(imgs) then runs the prediction on imgs.
The structure of predicts is (2,15), with the values shown below:
[[ 16.134243  -12.10675    -1.1994154 -27.766754  -43.4324     -9.633694  -12.214878    1.6287893   2.562174    3.2222707  13.834648   28.254173   -6.102874   16.76582     7.2586184]
 [  5.022571   -8.762314   -6.7466817 -23.494259  -30.170597    2.4392672 -14.676962    5.8255725   8.855118   -2.0998626   6.820853    7.6578817   1.5132296  24.4664      2.4192357]]
That means there are 2 predictions, and each image has one score for each of the 15 possible classes.
Then index = np.argmax(predict) finds the index of the largest score.
Looking up the characters by those indexes gives the result ['6', '8'].
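The index lookup can be checked in isolation. The sketch below takes its numbers from the printed output above, reading the values as 15 signed scores per image (an assumption about how the printout should be parsed). The class-name file is not reproduced here, so only the indexes are shown.

```python
import numpy as np

# Scores taken from the printed prediction output, 15 per image.
predicts = np.array([
    [16.134243, -12.10675, -1.1994154, -27.766754, -43.4324,
     -9.633694, -12.214878, 1.6287893, 2.562174, 3.2222707,
     13.834648, 28.254173, -6.102874, 16.76582, 7.2586184],
    [5.022571, -8.762314, -6.7466817, -23.494259, -30.170597,
     2.4392672, -14.676962, 5.8255725, 8.855118, -2.0998626,
     6.820853, 7.6578817, 1.5132296, 24.4664, 2.4192357],
])

# axis=1 picks the best class for every image in one call.
indices = np.argmax(predicts, axis=1)
print(indices)  # [11 13]
```

The two indexes are two apart, which is consistent with the classes '6' and '8' sitting two positions apart in the saved class list.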
Here is a view of the data in memory:
So our prediction is accurate.
Next, we will cut the digits out of a picture for recognition.