Following on from the previous article, this time an entry-level CNN (convolutional neural network) is used to complete the price recognition. (To echo the previous article, one last clickbait title 🥺)
1 Analysis
The original images have already been obtained; next they are processed and cut into raw material for machine learning.
Since the images are in PNG format, they usually have 4 channels (RGB + transparency).
General processing flow:
1. Get the original image: 4 channels (RGB + transparency).
2. Convert to a grayscale image: single channel, pixel values 0-255. Grayscale conversion formula: L = R * 299/1000 + G * 587/1000 + B * 114/1000.
3. Binarize the grayscale image: each pixel value is converted to 0 or 1, e.g. 0 if pixel < 200 else 1.
If the data were more complex, this would also involve border removal, edge detection, tilt correction, cutting, noise reduction (erosion, dilation), and so on. This data is relatively simple: once converted to binary it can be used directly.
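A minimal sketch of steps 2 and 3 with PIL (the input file name is hypothetical; the same pattern appears in the full code below):

from PIL import Image

img = Image.open('price.png')  # 'price.png' is a hypothetical input file

# Step 2: grayscale. convert('L') applies L = R * 299/1000 + G * 587/1000 + B * 114/1000,
# e.g. (R, G, B) = (200, 150, 100) -> 59.8 + 88.05 + 11.4 ≈ 159
img_gray = img.convert('L')

# Step 3: binarization with the threshold of 200 from step 3 above
img_bin = img_gray.point([0 if v < 200 else 1 for v in range(256)], '1')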
2 Recognition
2.1 Cutting the images
Key cutting code:
import copy
import os

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image, ImageDraw

# Horizontal offsets and their mapping to x positions (in pixels) of each digit
lines = [-281.16, -249.92, -218.68, -187.44, -156.2, -124.96, -93.72, -62.48, -31.24, -0.0]
lines_step = 22
lines_map = {
    '281.16': 336, '249.92': 299, '218.68': 261, '187.44': 223, '156.2': 187,
    '124.96': 149, '93.72': 112, '62.48': 74, '31.24': 38, '0.0': 1,
}

idx = 1


def process_img(imgpath: str):
    global idx
    # Original image
    img = Image.open(imgpath)
    width, height = img.size
    img2 = copy.deepcopy(img)
    img_arr = np.array(img)
    print(img_arr.shape)

    # Convert to grayscale: L = R * 299/1000 + G * 587/1000 + B * 114/1000
    img_gray = img.convert('L')
    img_gray_arr = np.array(img_gray)
    print(img_gray_arr.shape)
    for data in img_gray_arr:
        pass
        # print(''.join(['{:03}'.format(_) for _ in data]))
        # print(''.join(['{:03}'.format(_) if _ != 0 else '...' for _ in data]))

    # Binarization
    img_bin = img_gray.point([0 if _ < 128 else 1 for _ in range(256)], '1')
    img_bin_arr = np.array(img_bin)
    print(img_bin_arr.shape)
    for data in img_bin_arr:
        pass
        # print(''.join(['1' if _ else '0' for _ in data]))
        # print(''.join(['X' if _ else '.' for _ in data]))

    # Mark and cut the digits
    img_draw = ImageDraw.Draw(img2)
    for line in lines:
        # The map keys carry no minus sign, so negate the (negative) offset first
        new_line = lines_map.get(str(-line))
        p1 = (new_line, 1)
        p2 = (new_line + lines_step, height - 1)
        # Outline the digit area in red
        img_draw.rectangle((p1, p2), outline='red')
        # Crop the digit and save it
        img_crop = img_bin.crop((new_line, 0, new_line + lines_step, height))
        img_crop.save(os.path.join('imgs_crop', '{:03}.png'.format(idx)))
        idx += 1
    plt.imshow(img2)
    plt.show()
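A hypothetical invocation, assuming the original screenshots sit in an imgs/ folder (the folder name is an assumption):

import os

for name in sorted(os.listdir('imgs')):  # 'imgs' is a hypothetical input folder
    if name.endswith('.png'):
        process_img(os.path.join('imgs', name))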
The images after cutting:
The cut images are then manually sorted into folders named by digit, which completes the manual annotation.
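For example, the annotated data might be laid out like this (folder and file names are illustrative; the digit folder name serves as the label):

imgs_train/
    0/
        001.png
        002.png
    1/
        003.png
    ...
    9/
        010.png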
2.2 Recognition training
Python 3 with Keras + TensorFlow is used for this part.
Model code example:
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential


def gen_model():
    """ Build model :return: model """
    _model = Sequential([
        # Convolution layer
        # 36 is the output dimension, i.e. the number of convolution kernels
        # kernel_size is the size of each convolution kernel
        Conv2D(36, kernel_size=3, padding='same', activation='relu', input_shape=(36, 22, 1)),
        # Max pooling layer
        MaxPooling2D(pool_size=(2, 2)),
        # Dropout randomly sets a fraction of the input units to 0 at each update
        # during training, which helps prevent overfitting
        Dropout(0.25),
        # Convolution layer (input shape is inferred from the previous layer)
        Conv2D(64, kernel_size=3, padding='same', activation='relu'),
        # Max pooling layer
        MaxPooling2D(pool_size=(2, 2)),
        Dropout(0.25),
        # Flatten the input: turn multidimensional data into one-dimensional data
        Flatten(),
        # Fully connected layers
        Dense(512, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='softmax'),
    ])
    return _model
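For reference, the shapes flowing through this model for a 36×22×1 input (these follow directly from the layer definitions above):

model = gen_model()
model.summary()
# Conv2D(36, padding='same') -> (36, 22, 36)
# MaxPooling2D(2, 2)         -> (18, 11, 36)
# Conv2D(64, padding='same') -> (18, 11, 64)
# MaxPooling2D(2, 2)         -> (9, 5, 64)
# Flatten                    -> 9 * 5 * 64 = 2880
# Dense                      -> 512, then 10 (one probability per digit)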
Training code example:
def train():
    model = gen_model()
    model.summary()

    # Model compilation
    # optimizer: the optimizer
    # loss: the name of the loss (objective) function
    # metrics: the metrics the model evaluates during training and testing
    model.compile(optimizer='adam',  # or e.g. keras.optimizers.Adadelta()
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    x_train, y_train = load_data()
    x_train = x_train.reshape(-1, 36, 22, 1)
    x_test, y_test = load_test_data()
    x_test = x_test.reshape(-1, 36, 22, 1)

    # Train and evaluate (add callbacks=[...] for e.g. TensorBoard monitoring)
    # x_train: input data
    # y_train: labels
    # batch_size: number of samples per gradient update; each batch of samples
    #   triggers one gradient-descent step that moves the objective function
    #   one step toward its optimum
    # epochs: integer, the number of training rounds; each epoch runs through
    #   the whole training set once, in groups of batch_size samples, adjusting
    #   the weights as it goes
    # verbose: 0 = no logs in the standard output stream, 1 = progress bar,
    #   2 = one line per epoch
    # validation_data: the validation dataset
    history = model.fit(x_train, y_train, batch_size=32, epochs=20, verbose=1,
                        validation_data=(x_test, y_test))

    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])

    # Plot training and validation accuracy over the epochs
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Test'], loc='upper left')
    plt.show()

    # Plot training and validation loss over the epochs
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Test'], loc='upper left')
    plt.show()

    model.save('model/ziru.h5')
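A note on the loss: sparse_categorical_crossentropy expects plain integer labels, which is exactly what gen_train_data below produces. One-hot encoded labels would need categorical_crossentropy instead. A small illustration (the label values are made up):

import numpy as np
from tensorflow.keras.utils import to_categorical

y_train = np.array([3, 0, 7, 1])                          # works with sparse_categorical_crossentropy
y_train_onehot = to_categorical(y_train, num_classes=10)  # would be needed for categorical_crossentropy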
Sample code for training data generation:
Main points: train_label and train_data. The label is the value the corresponding sample should be recognized as; the data holds that sample's actual pixel values.
def gen_train_data(parent_path: str):
    train_data = []
    train_label = []
    for idx in range(10):
        cur_path = os.path.join(parent_path, str(idx))
        for dirpath, dirnames, filenames in os.walk(cur_path):
            for filename in filenames:
                if filename.endswith('png'):
                    imgpath = os.path.join(cur_path, filename)
                    # The digit folder name is the label, e.g. 'imgs_train/3/001.png' -> '3'
                    label = imgpath.split('/')[1]
                    data = np.array(Image.open(imgpath))
                    train_label.append(int(label))
                    train_data.append(data)
    return np.array(train_data), np.array(train_label)
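The train() function above calls load_data() and load_test_data(). A minimal sketch of how they might wrap gen_train_data (the directory names are hypothetical):

def load_data():
    # 'imgs_train' is a hypothetical directory: folders 0-9 holding the annotated crops
    return gen_train_data('imgs_train')


def load_test_data():
    # 'imgs_test' is a hypothetical directory with the same layout
    return gen_train_data('imgs_test')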
The training process is as follows:
Because the images are relatively simple, even this simple training basically reaches 100% recognition.
Epoch 1/20
7/7 [==============================] - 1s 68ms/step - loss: 2.0173 - accuracy: 0.3350 - val_loss: 1.3893 - val_accuracy: 0.7950
Epoch 2/20
7/7 [==============================] - 0s 43ms/step - loss: 1.1314 - accuracy: 0.6900 - val_loss: 0.5309 - val_accuracy: 1.0000
Epoch 3/20
7/7 [==============================] - 0s 36ms/step - loss: 0.5474 - accuracy: 0.8100 - val_loss: 0.1853 - val_accuracy: 1.0000
Epoch 4/20
7/7 [==============================] - 0s 36ms/step - loss: 0.2606 - accuracy: 0.9250 - val_loss: 0.0842 - val_accuracy: 1.0000
Epoch 5/20
7/7 [==============================] - 0s 34ms/step - loss: 0.2730 - accuracy: 0.9250 - val_loss: 0.1025 - val_accuracy: 0.9700
Epoch 6/20
7/7 [==============================] - 0s 37ms/step - loss: 0.1857 - accuracy: 0.9300 - val_loss: 0.0365 - val_accuracy: 1.0000
Epoch 7/20
7/7 [==============================] - 0s 35ms/step - loss: 0.0952 - accuracy: 0.9800 - val_loss: 0.0165 - val_accuracy: 1.0000
Epoch 8/20
7/7 [==============================] - 0s 35ms/step - loss: 0.0560 - accuracy: 0.9900 - val_loss: 0.0076 - val_accuracy: 1.0000
Epoch 9/20
7/7 [==============================] - 0s 35ms/step - loss: 0.0125 - accuracy: 1.0000 - val_loss: 0.0066 - val_accuracy: 1.0000
Epoch 10/20
7/7 [==============================] - 0s 36ms/step - loss: 0.0173 - accuracy: 1.0000 - val_loss: 0.0024 - val_accuracy: 1.0000
Epoch 11/20
7/7 [==============================] - 0s 34ms/step - loss: 0.0086 - accuracy: 1.0000 - val_loss: 0.0014 - val_accuracy: 1.0000
Epoch 12/20
7/7 [==============================] - 0s 37ms/step - loss: 0.0061 - accuracy: 1.0000 - val_loss: 8.3420e-04 - val_accuracy: 1.0000
Epoch 13/20
7/7 [==============================] - 0s 33ms/step - loss: 0.0051 - accuracy: 1.0000 - val_loss: 4.9917e-04 - val_accuracy: 1.0000
Epoch 14/20
7/7 [==============================] - 0s 35ms/step - loss: 0.0020 - accuracy: 1.0000 - val_loss: 3.4299e-04 - val_accuracy: 1.0000
Epoch 15/20
7/7 [==============================] - 0s 35ms/step - loss: 0.0037 - accuracy: 1.0000 - val_loss: 2.3839e-04 - val_accuracy: 1.0000
Epoch 16/20
7/7 [==============================] - 0s 34ms/step - loss: 0.0028 - accuracy: 1.0000 - val_loss: 2.0110e-04 - val_accuracy: 1.0000
Epoch 17/20
7/7 [==============================] - 0s 36ms/step - loss: 0.0012 - accuracy: 1.0000 - val_loss: 1.8016e-04 - val_accuracy: 1.0000
Epoch 18/20
7/7 [==============================] - 0s 35ms/step - loss: 0.0015 - accuracy: 1.0000 - val_loss: 1.5284e-04 - val_accuracy: 1.0000
Epoch 19/20
7/7 [==============================] - 0s 38ms/step - loss: 8.4545e-04 - accuracy: 1.0000 - val_loss: 1.3383e-04 - val_accuracy: 1.0000
Epoch 20/20
7/7 [==============================] - 0s 36ms/step - loss: 7.2767e-04 - accuracy: 1.0000 - val_loss: 1.2135e-04 - val_accuracy: 1.0000
Test loss: 0.00012135423457948491
Test accuracy: 1.0
Training loss and accuracy chart:
2.3 Recognition and verification
Load the model, pass in the data, and get the recognition result.
Sample code:
from tensorflow.keras.models import load_model


def __recognize_img(img_data):
    model = load_model('model/ziru.h5')
    img_arr = np.array(img_data)
    img_arr = img_arr.reshape((-1, 36, 22, 1))
    result = model.predict(img_arr)
    predict_val = __parse_result(result)
    return predict_val


def __parse_result(result):
    result = result[0]
    max_val = max(result)
    for i in range(10):
        if max_val == result[i]:
            return i
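__parse_result simply returns the index of the highest probability; an equivalent NumPy one-liner would be:

predict_val = int(np.argmax(result[0]))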
3 Packaging
After completing the recognition process, all that remains is to encapsulate it and expose it as a service.
For convenience, it has been made into an interface service. Test interface ==> https://lemon.lpe234.xyz/common/ziru/
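For illustration only, a minimal sketch of such a wrapper (Flask, the route, and the request format are assumptions, not the actual service behind the URL above):

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model('model/ziru.h5')


@app.route('/recognize', methods=['POST'])
def recognize():
    # Expects an uploaded, already binarized 36x22 digit image
    img = Image.open(request.files['img'])
    img_arr = np.array(img).reshape((-1, 36, 22, 1))
    result = model.predict(img_arr)
    return jsonify({'value': int(np.argmax(result[0]))})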
4 Summary
The use of CNN in this article is basically entry level. In fact, digits this simple could also be recognized from a few key pixels: the images for, say, 1 and 3 are guaranteed to differ at fixed positions, and those differences alone are basically enough to tell them apart.
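A sketch of that key-pixel idea, assuming one stored binary template per digit (the template data is hypothetical):

import numpy as np

def recognize_by_pixels(img_bin_arr, templates):
    # templates: dict mapping digit -> 36x22 binary reference array
    # Pick the digit whose template differs from the input at the fewest pixels
    return min(templates, key=lambda d: int(np.sum(templates[d] != img_bin_arr)))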