This is the third day of my participation in the August Text Challenge.

Gesture recognition system based on OpenCV + Keras

Technology

Python 3.6 + OpenCV + Keras + NumPy + PIL

Image processing

Erosion, grayscale conversion, and adaptive threshold segmentation turn the raw frame into an image that the model can recognize more easily. Erosion removes small white noise along the edges of the foreground mask. The three-channel image is then converted to grayscale and smoothed with a Gaussian filter. Finally, adaptive threshold segmentation separates the content from the background.

   # Image edge processing: erode the foreground mask to remove small white noise
   fgmask = cv2.erode(bg, self.skinkernel, iterations=1)
   # AND the original frame with itself, keeping only pixels inside the eroded mask
   bitwise_and = cv2.bitwise_and(frame, frame, mask=fgmask)
   # Grayscale conversion
   gray = cv2.cvtColor(bitwise_and, cv2.COLOR_BGR2GRAY)
   # Gaussian filtering (self.blurValue must be odd)
   blur = cv2.GaussianBlur(gray, (self.blurValue, self.blurValue), 2)
   cv2.imshow('GaussianBlur', blur)
   # Adaptive thresholding with block size 11 and constant C = 2.
   # Thresholding separates the target region from the background region of the grayscale image.
   thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
   cv2.imshow('thresh', thresh)
   # Resize to 100x100
   Ges = cv2.resize(thresh, (100, 100))

The processed picture:
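
To make the thresholding step more concrete, here is a minimal NumPy sketch of adaptive thresholding. It is not OpenCV's implementation: it uses a plain local mean rather than the Gaussian-weighted mean selected by `cv2.ADAPTIVE_THRESH_GAUSSIAN_C`, and the function name `adaptive_mean_threshold` and the test image are made up for illustration.

```python
import numpy as np


def adaptive_mean_threshold(gray, block_size=11, c=2, max_value=255):
    """Set a pixel to max_value if it is brighter than (local mean - c), else 0."""
    pad = block_size // 2
    # Replicate border pixels so every pixel has a full neighbourhood.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Summed-area table with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    h, w = gray.shape
    b = block_size
    # Sum of each block_size x block_size window, one window per output pixel.
    window_sum = (integral[b:b + h, b:b + w] - integral[:h, b:b + w]
                  - integral[b:b + h, :w] + integral[:h, :w])
    local_mean = window_sum / (b * b)
    return np.where(gray > local_mean - c, max_value, 0).astype(np.uint8)


# Hypothetical input: a bright image with a single dark pixel.
image = np.full((20, 20), 200, dtype=np.uint8)
image[10, 10] = 0
binary = adaptive_mean_threshold(image)
```

Because the threshold is computed per neighbourhood, the dark pixel is mapped to 0 while its bright surroundings stay at 255, even though no global threshold was chosen.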

Data augmentation

Data augmentation applies slight perturbations or changes to the training data. Adding such samples increases the model's generalization capability, and adding noisy samples increases its robustness. The main augmentation methods are flipping, random cropping, color jittering, shifting, scaling, contrast changes, noise, and rotation or reflection.

Typical data augmentation operations include:

Image cropping: generate a rectangle smaller than the image, place it at a random position, and use the image content inside the rectangle as training data.

Image flipping: flip the image left to right.

Image whitening: standardize the image to zero mean and unit variance.
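
The three operations above can be sketched in a few lines of NumPy. This is only an illustration under assumed names (`random_crop`, `horizontal_flip`, `whiten`) and a random stand-in image; it is not the code used in this project, which relies on Keras for augmentation.

```python
import numpy as np


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return image[:, ::-1]


def whiten(image, eps=1e-8):
    """Standardize the image to zero mean and unit variance."""
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + eps)


rng = np.random.default_rng(0)
# Hypothetical 100x100 grayscale image standing in for a gesture picture.
sample = rng.integers(0, 256, size=(100, 100)).astype(np.uint8)
cropped = random_crop(sample, 80, 80, rng)
flipped = horizontal_flip(sample)
whitened = whiten(sample)
```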

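        from keras.preprocessing.image import ImageDataGenerator

        # Validation data is only rescaled to [0, 1], never augmented
        test_datagen = ImageDataGenerator(rescale=1. / 255)
        # Training data is rescaled and randomly rotated, shifted, sheared, zoomed, and flipped
        train_datagen = ImageDataGenerator(
            rescale=1. / 255,
            rotation_range=40,
            width_shift_range=0.2,
            height_shift_range=0.2,
            shear_range=0.2,
            zoom_range=0.2,
            horizontal_flip=True)

        train_dir = r'Gesture_predict'
        validation_dir = r'Gesture_train'
        # train_img_list, test_img_list, train_lable_list, test_lable_list
        # fit() is only required for feature-wise statistics such as featurewise_center
        train_datagen.fit(train_img_list)
        train_generator = train_datagen.flow(train_img_list, train_lable_list, batch_size=10)
        validation_generator = test_datagen.flow(test_img_list, test_lable_list, batch_size=10)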


Feature extraction

        import numpy as np

        # Assumes datagen, conv_base, and the image/label arrays are defined earlier
        batch_size = 20

        def extract_features(flag, sample_count):
            # Pre-allocate the outputs: 3x3x512 feature maps and 5 label columns per sample
            features = np.zeros(shape=(sample_count, 3, 3, 512))
            labels = np.zeros(shape=(sample_count, 5))
            if flag == "train":
                generator = datagen.flow(train_img_list, train_lable_list, batch_size=batch_size)
            else:
                generator = datagen.flow(test_img_list, test_lable_list, batch_size=batch_size)
            i = 0
            for inputs_batch, labels_batch in generator:
                # The generator loops forever, so stop once sample_count rows are filled
                if i * batch_size >= sample_count:
                    break
                features_batch = conv_base.predict(inputs_batch)
                features[i * batch_size: (i + 1) * batch_size] = features_batch
                labels[i * batch_size: (i + 1) * batch_size] = labels_batch
                i += 1
            return features, labels

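The batch bookkeeping in the extraction loop can be tried out without Keras. In the sketch below, `endless_batches` imitates a generator that never stops, and `fake_conv_base` is a hypothetical stand-in for the convolutional base; both names and the toy data are assumptions made for illustration only.

```python
import numpy as np


def endless_batches(data, labels, batch_size):
    """Yield (data, labels) batches forever, wrapping around at the end."""
    while True:
        for start in range(0, len(data), batch_size):
            yield data[start:start + batch_size], labels[start:start + batch_size]


def fake_conv_base(batch):
    """Hypothetical stand-in for conv_base.predict: just doubles the input."""
    return batch * 2.0


def collect_features(data, labels, batch_size, sample_count):
    features = np.zeros((sample_count,) + data.shape[1:])
    out_labels = np.zeros((sample_count,) + labels.shape[1:])
    i = 0
    for inputs_batch, labels_batch in endless_batches(data, labels, batch_size):
        # Stop as soon as every pre-allocated row has been written.
        if i * batch_size >= sample_count:
            break
        features[i * batch_size:(i + 1) * batch_size] = fake_conv_base(inputs_batch)
        out_labels[i * batch_size:(i + 1) * batch_size] = labels_batch
        i += 1
    return features, out_labels


toy_data = np.arange(50 * 4, dtype=np.float64).reshape(50, 2, 2, 1)
toy_labels = np.eye(5)[np.arange(50) % 5]
toy_features, toy_out_labels = collect_features(toy_data, toy_labels, 20, 50)
```

With 50 samples and a batch size of 20, the last batch holds only 10 samples; the slice on the pre-allocated array is clipped to the same 10 rows, so the assignment still lines up.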

Model improvements

The next steps improve the model itself, using techniques such as batch normalization and weight decay. I experimented with three improvements, described in turn below.

Weight decay: adding a regularization term to the objective function limits the magnitude of the weights, which helps prevent overfitting. This is simply the L2 regularization known from machine learning; in neural networks the same idea, old wine in a new bottle, goes by the name weight decay.
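
As a rough illustration of weight decay, the sketch below adds an L2 penalty `lam * sum(w ** 2)` to the loss and shows its effect on one gradient descent step. The function names and the values of `lr` and `lam` are assumptions chosen for the example, not settings from this project.

```python
import numpy as np


def l2_penalty(weights, lam):
    """Regularization term added to the objective: lam * sum(w^2)."""
    return lam * np.sum(weights ** 2)


def sgd_step_with_weight_decay(weights, grad, lr, lam):
    """One gradient step on loss + lam * sum(w^2); the penalty's gradient is 2 * lam * w."""
    return weights - lr * (grad + 2.0 * lam * weights)


w = np.array([1.0, -2.0, 3.0])
# With a zero data gradient, only the penalty acts and shrinks every weight.
w_next = sgd_step_with_weight_decay(w, np.zeros_like(w), lr=0.1, lam=0.01)
penalty = l2_penalty(w, lam=0.01)
```

Each step multiplies the weights by `1 - 2 * lr * lam`, which is where the name "decay" comes from.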

Dropout: at each training step, some of the feature detectors are switched off, that is, neurons are deactivated at a given rate. This prevents overfitting and improves generalization.
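
A minimal sketch of the idea, assuming the common "inverted dropout" formulation in which the surviving activations are scaled up during training so that nothing needs to change at inference time. The function name `dropout` and the toy input are illustrative.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
x = np.ones((1000, 128))
y_train = dropout(x, rate=0.5, rng=rng)
y_infer = dropout(x, rate=0.5, rng=rng, training=False)
```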

Batch normalization: the inputs to each layer of the network are normalized over the mini-batch, which keeps the data distribution more uniform. This avoids the situation where all of the data activates a neuron, or none of it does. As a form of data standardization, it can improve the fitting ability of the model.
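
The training-time computation can be sketched as follows. This simplified version omits the running averages that a real layer keeps for inference; `gamma` and `beta` stand for the learnable scale and shift, and the input batch is random data invented for the example.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch, then apply scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
# Hypothetical activations: 64 samples, 128 features, far from zero mean and unit variance.
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 128))
normalized = batch_norm(batch, gamma=np.ones(128), beta=np.zeros(128))
```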

Structure diagram of the model

        # Batch normalization
        model.add(BatchNormalization())
        # ReLU activation
        model.add(Activation('relu', name='activation_1'))
        # Convolution layer 2: 32 filters, 3x3 kernel, 'valid' padding, default 1x1 stride
        model.add(Convolution2D(
            filters=32,
            kernel_size=(3, 3),
            name='conv2d_2'
        ))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='activation_2'))

        # Pooling layer: 2x2 window, 2x2 stride, 'valid' padding
        model.add(MaxPool2D(
            pool_size=(2, 2),
            strides=(2, 2),
            padding='valid',
            name='max_pooling2d_1'
        ))
        # Dropout layer, drop rate 0.5
        model.add(Dropout(0.5, name='dropout_1'))
        # Flatten to a one-dimensional vector
        model.add(Flatten(name='flatten_1'))
        # Fully connected layer, 128 neurons
        model.add(Dense(128, name='dense_1'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='activation_3'))
        # model.add(Dropout(0.5, name='dropout_2'))

        # Classification layer, L2 regularization (currently commented out)
        model.add(Dense(self.categories,
                        # kernel_regularizer=regularizers.l2(0.01),
                        name='dense_2'))
        # Softmax activation for the classification layer
        model.add(Activation('softmax', name='activation_4'))
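
To see how 'valid' padding and the pooling stride shape the tensor that reaches the flatten layer, the sketch below traces the spatial size through the layers. The first convolution layer is not shown in the snippet above, so this assumes a 100x100 input and a first layer with the same 3x3 'valid' configuration as layer 2; the helper `output_size` is a hypothetical name.

```python
def output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer along one axis."""
    if padding == "valid":
        return (size - kernel) // stride + 1
    if padding == "same":
        return -(-size // stride)  # ceiling division
    raise ValueError("unknown padding: " + padding)


size = 100                                  # assumed input height and width
size = output_size(size, 3)                 # assumed convolution layer 1
size = output_size(size, 3)                 # convolution layer 2
size = output_size(size, 2, stride=2)       # 2x2 max pooling, stride 2
flattened_units = size * size * 32          # 32 filters feed the flatten layer
```

Under these assumptions the flatten layer receives a 48x48x32 tensor, which is why the following 128-unit dense layer holds most of the model's parameters.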

Results analysis

The training and validation curves show that the improvements clearly help: the loss during training is more stable, and the accuracy on the validation set rises above 90%. Image augmentation improves the model's generalization ability and robustness by increasing the amount of training data, and it also improves accuracy by about 10%, so data augmentation plays a large role. However, we are not satisfied with a recognition accuracy of around 80%.

draft

  1. Image processing: extract the main content of the photo and remove the noise
  2. Data augmentation: improves the model's generalization capability