Gesture recognition system based on OpenCV + Keras
Technology stack
Python 3.6 + OpenCV + Keras + NumPy + PIL
Image processing
Through erosion, grayscale conversion, adaptive threshold segmentation, and similar operations, the image is made easy for the machine to recognize. Erosion first removes the white noise around the edges of the image; the three-channel image is then converted to grayscale; finally, adaptive thresholding separates the content from the background.
# Image edge processing -- erosion
fgmask = cv2.erode(bg, self.skinkernel, iterations=1)
# "And" the original frame with the eroded mask
bitwise_and = cv2.bitwise_and(frame, frame, mask=fgmask)
# Convert to grayscale
gray = cv2.cvtColor(bitwise_and, cv2.COLOR_BGR2GRAY)
# Gaussian filtering
blur = cv2.GaussianBlur(gray, (self.blurValue, self.blurValue), 2)
cv2.imshow('GaussianBlur', blur)
# Adaptive thresholding: separate the target region from the background in the grayscale image
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
cv2.imshow('thresh', thresh)
Ges = cv2.resize(thresh, (100, 100))
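The snippet above references self.skinkernel and self.blurValue, which the original class defines elsewhere. A minimal sketch of plausible definitions (the kernel shape and blur size here are assumptions, not values from the original):

import cv2

class GestureRecognizer:
    def __init__(self):
        # Assumed values -- the original defines these elsewhere
        self.skinkernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))  # small erosion kernel
        self.blurValue = 5  # Gaussian kernel size; must be odd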
The processed image:
Data augmentation
Data augmentation applies slight perturbations or transformations to the training data. It increases the model's generalization ability by effectively enlarging the training set, and improves robustness by adding noisy samples. Common augmentation methods include flipping, random cropping, color jittering, shift, scale, contrast adjustment, noise, and rotation/reflection transforms.
Typical augmentation operations include the following:
Image cropping: generate a rectangular box smaller than the image, crop the image at a random position, and use the contents of the box as training data.
Image flipping: flip the image horizontally.
Image whitening: normalize the image to a Gaussian(0, 1) distribution (see the sketch below).
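Whitening in this per-image sense can be implemented by hand (or, in Keras, approximated with ImageDataGenerator's samplewise_center and samplewise_std_normalization options). A minimal NumPy sketch:

import numpy as np

def whiten(img):
    # Normalize pixel values to zero mean and unit variance, i.e. Gaussian(0, 1)
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-7)

The augmentation pipeline used in this project is set up as follows: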
from keras.preprocessing.image import ImageDataGenerator

# Validation/test images are only rescaled; no augmentation is applied
test_datagen = ImageDataGenerator(rescale=1. / 255)
# Training images get random rotations, shifts, shears, zooms and flips
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# Directories of the raw images (unused by the array-based flow below)
train_dir = r'Gesture_predict'
validation_dir = r'Gesture_train'
# train_img_list, test_img_list, train_lable_list and test_lable_list
# are image/label arrays prepared elsewhere
train_datagen.fit(train_img_list)
train_generator = train_datagen.flow(train_img_list, train_lable_list, batch_size=10)
validation_generator = test_datagen.flow(test_img_list, test_lable_list, batch_size=10)
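Each generator yields batches of (images, labels). A quick sanity check of the augmented output; the shapes shown are assumptions based on the 100*100 inputs and five gesture classes used in this project:

# Pull one augmented batch and inspect it
images_batch, labels_batch = next(train_generator)
print(images_batch.shape)  # e.g. (10, 100, 100, 1) at batch_size=10
print(labels_batch.shape)  # (10, 5) for one-hot labels over five classes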
Feature extraction
def extract_features(flag, sample_count):
    batch_size = 20  # must match the generator's batch size below
    # Feature maps from the convolutional base: 3*3*512 per sample
    features = np.zeros(shape=(sample_count, 3, 3, 512))
    labels = np.zeros(shape=(sample_count, 5))
    if flag == "train":
        generator = datagen.flow(train_img_list, train_lable_list, batch_size=batch_size)
    else:
        generator = datagen.flow(test_img_list, test_lable_list, batch_size=batch_size)
    i = 0
    for inputs_batch, labels_batch in generator:
        if i * batch_size >= sample_count:
            break
        # Run the pretrained convolutional base over the batch
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size: (i + 1) * batch_size] = features_batch
        labels[i * batch_size: (i + 1) * batch_size] = labels_batch
        i += 1
    return features, labels
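conv_base here is a pretrained convolutional base whose output fills the features array. Its 3*3*512 output shape matches VGG16 applied to 100*100 RGB inputs, so a minimal sketch could be (the choice of VGG16 is an assumption; the original does not name the base):

from keras.applications import VGG16

# Assumed convolutional base: for 100*100*3 inputs its output is 3*3*512
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(100, 100, 3))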
Model improvements
The next steps are improvements made from the model's perspective, such as batch normalization and weight decay. I experimented with three improvements here, described in turn below.
Weight decay: adding a regularization term to the objective function to penalize large weights is a method of preventing overfitting. It is in fact the L2 regularization method from machine learning; old wine in a new bottle, it goes by the name weight decay in neural networks.
Dropout: during each training pass, a fraction of the feature detectors are switched off, i.e. neurons are deactivated at a set rate, which prevents overfitting and improves generalization.
Batch normalization: normalizing the input data of each layer of the network makes the data distribution more uniform, so the data neither all activate a neuron nor all fail to activate it. As a data standardization method, it improves the model's fitting ability.
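Of the three, weight decay is the only one left commented out in the model code below, so here is a minimal sketch of enabling it on a Keras layer (the 0.01 coefficient mirrors the commented-out line and is only an illustrative value):

from keras import regularizers
from keras.layers import Dense

# L2 weight decay: penalizes the squared magnitude of the layer's weights
layer = Dense(5,
              kernel_regularizer=regularizers.l2(0.01),
              name='dense_2')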
Structure diagram of the model
# Batch normalization
model.add(BatchNormalization())
# ReLU activation
model.add(Activation('relu', name='activation_1'))
# Convolution layer 2: 32 filters of size 3*3, padding 'valid', default stride 1*1
model.add(Convolution2D(
    filters=32,
    kernel_size=(3, 3),
    name='conv2d_2'
))
model.add(BatchNormalization())
model.add(Activation('relu', name='activation_2'))
# Pooling layer: size 2*2, stride 2*2, padding 'valid'
model.add(MaxPool2D(
    pool_size=(2, 2),
    strides=(2, 2),
    padding='valid',
    name='max_pooling2d_1'
))
# Dropout layer, drop rate 0.5
model.add(Dropout(0.5, name='dropout_1'))
# Flatten to a one-dimensional vector
model.add(Flatten(name='flatten_1'))
# Fully connected layer, 128 neurons
model.add(Dense(128, name='dense_1'))
model.add(BatchNormalization())
model.add(Activation('relu', name='activation_3'))
# model.add(Dropout(0.5, name='dropout_2'))
# Classification layer, with optional L2 regularization
model.add(Dense(self.categories,
                # kernel_regularizer=regularizers.l2(0.01),
                name='dense_2'))
# Softmax activation for the classification layer
model.add(Activation('softmax', name='activation_4'))
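The compile and training step is not shown here; a minimal sketch, assuming one-hot labels and the augmented generators defined earlier (the optimizer, loss, and step counts are typical choices, not taken from the original):

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit_generator(train_generator,
                              steps_per_epoch=100,  # illustrative value
                              epochs=30,            # illustrative value
                              validation_data=validation_generator,
                              validation_steps=50)  # illustrative value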
Results analysis:
Observing the training and validation curves, the improvements clearly work. Data augmentation improved the model's generalization ability and robustness by enlarging the training set, raising accuracy by roughly 10% to around 80%; not satisfied with that, we applied the model-level improvements above, which made the training loss more stable and lifted the validation accuracy above 90%.
Summary
- Image processing: extract the main content of the image and remove noise, etc.
- Data augmentation: improved the model's generalization ability