0. Datasets and tools

The dataset is the same set of gesture recognition images used previously to build the DNN. Extraction code for the download link: PCOI

1. Principle and framework

The convolutional network in this post is based on LeNet-5: $input \to conv1 \to relu \to maxpool1 \to conv2 \to relu \to maxpool2 \to fc3 \to fc4 \to output(softmax)$. Writing the forward propagation by hand and letting the framework handle backpropagation is now the mainstream approach: it is fast and less error-prone. I use the TensorFlow deep learning framework to build the CNN (the code uses the TensorFlow 1.x API; tf.contrib is not available in TensorFlow 2.x). First, a brief look at the tf.nn and tf.contrib.layers functions used.

Convolution layer: tf.nn.conv2d(x, w, strides=[1, stride, stride, 1], padding="SAME") (or padding="VALID")
Activation: tf.nn.relu(x)
Pooling layer: tf.nn.max_pool(x, ksize=[1, f, f, 1], strides=[1, stride, stride, 1], padding="SAME") (or padding="VALID")
Flatten: tf.contrib.layers.flatten(x)
Fully connected layer: tf.contrib.layers.fully_connected(x, num_neuron, activation_fn=None) (or activation_fn=tf.nn.relu)
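To make the effect of the padding argument concrete, here is a minimal sketch of the output-size rules TensorFlow documents for "SAME" and "VALID" padding, applied to the layer settings used later in this post. The helper name conv_output_size is hypothetical and is not part of TensorFlow.

```python
import math

def conv_output_size(n, f, stride, padding):
    """Spatial output size of a conv or pooling layer along one dimension.

    n: input size, f: filter (or pooling window) size.
    Hypothetical helper, not a TensorFlow function.
    """
    if padding == "SAME":
        return math.ceil(n / stride)
    if padding == "VALID":
        return math.ceil((n - f + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# Trace the height/width of a 64x64 image through the network in this post.
size = 64
size = conv_output_size(size, f=4, stride=1, padding="SAME")  # conv1
size = conv_output_size(size, f=8, stride=8, padding="SAME")  # maxpool1
size = conv_output_size(size, f=2, stride=1, padding="SAME")  # conv2
size = conv_output_size(size, f=4, stride=4, padding="SAME")  # maxpool2
print(size)  # 2, so the flattened vector has 2 * 2 * 16 = 64 features
```

With "SAME" padding the filter size does not affect the output size, only the stride does; with "VALID" a 4x4 filter at stride 1 would shrink 64 to 61.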

2. Image preprocessing (normalization and one-hot encoding)

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import h5py
import cnn_utils

train_x, train_y, test_x, test_y, classes = cnn_utils.load_dataset()
index = 5
plt.imshow(train_x[index])
print("y =", np.squeeze(train_y[:, index]))
print("train_x.shape:", train_x.shape)

# Image normalization: scale pixel values to [0, 1]
train_x1 = train_x / 255
test_x1 = test_x / 255

# One-hot encode the labels (6 classes)
train_y1 = cnn_utils.convert_to_one_hot(train_y, 6).T
test_y1 = cnn_utils.convert_to_one_hot(test_y, 6).T

print("Number of training samples:", train_x1.shape[0])
print("Number of test samples:", test_x1.shape[0])
print("Training set images:", train_x1.shape)
print("Training set labels:", train_y1.shape)
print("Test set images:", test_x1.shape)
print("Test set labels:", test_y1.shape)

Number of training samples: 1080
Number of test samples: 120
Training set images: (1080, 64, 64, 3)
Training set labels: (1080, 6)
Test set images: (120, 64, 64, 3)
Test set labels: (120, 6)

Note: unlike the fully connected DNN, there is no need to flatten each three-channel RGB image into a column vector and stack the columns. Only image normalization and one-hot encoding of the labels are required.
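The implementation of cnn_utils.convert_to_one_hot is not shown in this post. As a rough illustration of what such a helper might do, the sketch below one-hot encodes integer labels stored as a (1, m) array, which is the layout the indexing train_y[:, index] suggests. The function name one_hot is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (num_classes, m) one-hot matrix for integer labels of shape (1, m).

    Hypothetical stand-in for cnn_utils.convert_to_one_hot.
    """
    return np.eye(num_classes)[labels.reshape(-1)].T

labels = np.array([[0, 2, 5]])   # shape (1, 3), assumed layout of train_y
encoded = one_hot(labels, 6).T   # transpose to (m, num_classes), as in the post
print(encoded.shape)             # (3, 6)
print(encoded[1])                # [0. 0. 1. 0. 0. 0.]
```

Each row now has a single 1 at the index of its class, which is the label format the softmax cross-entropy cost expects.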

3. CNN building-block functions

# Create placeholders for the input images and the labels
def create_placeholder(n_H0, n_W0, n_C0, n_y):
    X = tf.placeholder(tf.float32, [None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, [None, n_y])
    
    return X, Y

# Initialize the filters
def initialize_parameters():
    # Filter shape: [filter_height, filter_width, in_channels, num_filters]
    W1 = tf.Variable(tf.random_normal([4, 4, 3, 8]))
    W2 = tf.Variable(tf.random_normal([2, 2, 8, 16]))
    parameters = {
        "W1": W1,
        "W2": W2
    }
    return parameters

# CNN forward propagation
def forward_propagation(X, parameters):
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    # conv1 -> relu -> maxpool1
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding="SAME")
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding="SAME")
    # conv2 -> relu -> maxpool2
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding="SAME")
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding="SAME")
    # FC
    P = tf.contrib.layers.flatten(P2)
    Z3 = tf.contrib.layers.fully_connected(P, 120, activation_fn=tf.nn.relu)
    Z4 = tf.contrib.layers.fully_connected(Z3, 6, activation_fn=None)
    
    return Z4

# Calculate the cost
def compute_cost(Z4, Y):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=Z4, labels=Y))
    
    return cost
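For intuition about what compute_cost evaluates, here is a NumPy sketch of the mean softmax cross-entropy over a batch. It assumes logits of shape (m, num_classes), like Z4, and one-hot labels of the same shape. The function name softmax_cross_entropy is hypothetical, and this is an illustration of the formula rather than TensorFlow's actual implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch (illustrative sketch).

    logits: (m, num_classes) raw scores.
    labels: (m, num_classes) one-hot rows.
    """
    # Subtract the row maximum for numerical stability; softmax is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the true class and average over the batch.
    return float(-(labels * log_probs).sum(axis=1).mean())

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 0.5]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
print(softmax_cross_entropy(logits, labels))
```

A row of identical logits gives a uniform softmax, so its loss is log(num_classes); confident, correct predictions push the loss toward zero.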

4. Assemble the CNN model and run it

# Model: build the graph, train with mini-batches, and report accuracy
def model(train_x, train_y, test_x, test_y, lr=0.01, num_epoch=1000, minibatch_size=64, print_cost=True, isPlot=True):
    seed = 3
    costs = []
    (m, n_H0, n_W0, n_C0) = train_x.shape
    n_y = train_y.shape[1]
    X, Y = create_placeholder(n_H0, n_W0, n_C0, n_y)
    parameters = initialize_parameters()
    Z4 = forward_propagation(X, parameters)
    cost = compute_cost(Z4, Y)
    optimizer = tf.train.AdamOptimizer(lr).minimize(cost)
    init = tf.global_variables_initializer()
    with tf.Session() as session:
        session.run(init)
        for epoch in range(num_epoch):
            epoch_cost = 0
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            minibatches = cnn_utils.random_mini_batches(train_x, train_y, minibatch_size, seed)
            for minibatch in minibatches:
                (minibatch_x, minibatch_y) = minibatch
                _ , minibatch_cost = session.run([optimizer,cost], feed_dict={X:minibatch_x, Y:minibatch_y})
                epoch_cost += minibatch_cost / num_minibatches
            costs.append(epoch_cost)
            if print_cost:
                if epoch % 10 == 0:
                    print("epoch =", epoch, "epoch_cost =", epoch_cost)
        if isPlot:
            plt.plot(np.squeeze(costs))
            plt.title("learning_rate =" + str(lr))
            plt.xlabel("epoch")
            plt.ylabel("cost")
            plt.show()
        parameters = session.run(parameters)
        
        correct_prediction = tf.equal(tf.argmax(Z4, axis=1), tf.argmax(Y, axis=1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Training set accuracy:", accuracy.eval({X:train_x, Y:train_y}))
        print("Test set Accuracy:", accuracy.eval({X:test_x, Y:test_y}))
        
        return parameters
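The helper cnn_utils.random_mini_batches is not shown in this post. A minimal sketch of what such a helper might look like, assuming it shuffles the samples with a seed and keeps the final, smaller batch, is given below. The name make_mini_batches and its exact behavior are assumptions.

```python
import numpy as np

def make_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle (X, Y) together and split them into mini-batches.

    X: (m, H, W, C) images, Y: (m, num_classes) one-hot labels.
    The last batch is smaller when m is not a multiple of batch_size.
    Hypothetical stand-in for cnn_utils.random_mini_batches.
    """
    m = X.shape[0]
    rng = np.random.RandomState(seed)
    perm = rng.permutation(m)
    X_shuffled, Y_shuffled = X[perm], Y[perm]
    return [
        (X_shuffled[start:start + batch_size], Y_shuffled[start:start + batch_size])
        for start in range(0, m, batch_size)
    ]

# Small dummy images keep the demo light; only the sample count matters here.
X_demo = np.zeros((1080, 8, 8, 3), dtype=np.float32)
Y_demo = np.zeros((1080, 6), dtype=np.float32)
batches = make_mini_batches(X_demo, Y_demo, batch_size=64, seed=4)
print(len(batches), batches[-1][0].shape[0])
```

Changing the seed every epoch, as the model function does, gives a different shuffle each time. If the real helper also returns a final partial batch, 1080 samples yield 17 batches while the model divides by int(1080 / 64) = 16, so epoch_cost would slightly overestimate the mean batch cost.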

Run the network

parameters = model(train_x1, train_y1, test_x1, test_y1, lr=0.01, num_epoch=1000)

The results

epoch = 0 epoch_cost = 6.046239018440247
epoch = 10 epoch_cost = 0.8401443846523762
epoch = 20 epoch_cost = 0.5430787391960621
epoch = 30 epoch_cost = 0.37734642811119556
epoch = 40 epoch_cost = 0.1890146671794355
epoch = 50 epoch_cost = 0.08683486701920629
epoch = 60 epoch_cost = 0.08065505273407325
epoch = 70 epoch_cost = 0.025464061880484223
epoch = 80 epoch_cost = 0.010602715017739683
epoch = 90 epoch_cost = 0.0036894527947879396
epoch = 100 epoch_cost = 0.0024794578494038433
epoch = 110 epoch_cost = 0.0021597172526526265
epoch = 120 epoch_cost = 0.0015165473596425727
epoch = 130 epoch_cost = 0.0012093357254343573
epoch = 140 epoch_cost = 0.0010065358073916286
epoch = 150 epoch_cost = 0.0008580533994972939
epoch = 160 epoch_cost = 0.0006731234116159612
epoch = 170 epoch_cost = 0.000575631856918335
epoch = 180 epoch_cost = 0.0005104850497446023
epoch = 190 epoch_cost = 0.00043178290434298106
epoch = 200 epoch_cost = 0.0003564941771401209
epoch = 210 epoch_cost = 0.0003232461576772039
epoch = 220 epoch_cost = 0.0002740809841270675
epoch = 230 epoch_cost = 0.00025000690584420227
epoch = 240 epoch_cost = 0.00021687703610950848
epoch = 250 epoch_cost = 0.0001856840840446239
epoch = 260 epoch_cost = 0.00016450545490442892
epoch = 270 epoch_cost = 0.00014440709355767467
epoch = 280 epoch_cost = 0.00012826649935959722
epoch = 290 epoch_cost = 0.00011342956258886261
epoch = 300 epoch_cost = 0.0001002715393951803
epoch = 310 epoch_cost = 9.134015908784932e-05
epoch = 320 epoch_cost = 8.081822375061165e-05
epoch = 330 epoch_cost = 7.289223390216648e-05
epoch = 340 epoch_cost = 6.377049851380434e-05
epoch = 350 epoch_cost = 5.675839747709688e-05
epoch = 360 epoch_cost = 5.180321227271634e-05
epoch = 370 epoch_cost = 4.63823556628995e-05
epoch = 380 epoch_cost = 4.153193617639772e-05
epoch = 390 epoch_cost = 3.71186773691079e-05
epoch = 400 epoch_cost = 3.3484795721960836e-05
epoch = 410 epoch_cost = 3.103026369899453e-05
epoch = 420 epoch_cost = 2.7161293019162258e-05
epoch = 430 epoch_cost = 2.424556680580281e-05
epoch = 440 epoch_cost = 2.1611446754832286e-05
epoch = 450 epoch_cost = 1.9845915744554077e-05
epoch = 460 epoch_cost = 1.8085945555412763e-05
epoch = 470 epoch_cost = 1.5999822608137038e-05
epoch = 480 epoch_cost = 1.4649548518264055e-05
epoch = 490 epoch_cost = 1.3234309221843432e-05
epoch = 500 epoch_cost = 1.2474629897951672e-05
epoch = 510 epoch_cost = 1.0769706705104909e-05
epoch = 520 epoch_cost = 9.782952247405774e-06
epoch = 530 epoch_cost = 8.88926402353718e-06
epoch = 540 epoch_cost = 7.897974711568168e-06
epoch = 550 epoch_cost = 7.140903875324511e-06
epoch = 560 epoch_cost = 6.437216967469794e-06
epoch = 570 epoch_cost = 5.833738470073513e-06
epoch = 580 epoch_cost = 5.1959334257389855e-06
epoch = 590 epoch_cost = 4.821698055934576e-06
epoch = 600 epoch_cost = 4.539523985158667e-06
epoch = 610 epoch_cost = 3.980700640227042e-06
epoch = 620 epoch_cost = 3.481204629451895e-06
epoch = 630 epoch_cost = 3.211789177726132e-06
epoch = 640 epoch_cost = 2.829953558602938e-06
epoch = 650 epoch_cost = 2.5407657346931956e-06
epoch = 660 epoch_cost = 2.3277445251324025e-06
epoch = 670 epoch_cost = 2.1203611453302074e-06
epoch = 680 epoch_cost = 1.9698040283344653e-06
epoch = 690 epoch_cost = 1.773695679219145e-06
epoch = 700 epoch_cost = 1.5777209014800064e-06
epoch = 710 epoch_cost = 1.4154058298743166e-06
epoch = 720 epoch_cost = 1.2963970377199985e-06
epoch = 730 epoch_cost = 1.183275156080299e-06
epoch = 740 epoch_cost = 1.0764728379797361e-06
epoch = 750 epoch_cost = 9.585948674839528e-07
epoch = 760 epoch_cost = 8.798984154623213e-07
epoch = 770 epoch_cost = 7.984745433731177e-07
epoch = 780 epoch_cost = 7.096832543851406e-07
epoch = 790 epoch_cost = 6.668258656361559e-07
epoch = 800 epoch_cost = 5.947650230098134e-07
epoch = 810 epoch_cost = 5.503942883677837e-07
epoch = 820 epoch_cost = 4.884115956116375e-07
epoch = 830 epoch_cost = 4.476663146846249e-07
epoch = 840 epoch_cost = 4.0949880464324906e-07
epoch = 850 epoch_cost = 3.703167985591449e-07
epoch = 860 epoch_cost = 3.326482138632514e-07
epoch = 870 epoch_cost = 3.079016703821935e-07
epoch = 880 epoch_cost = 2.7876461317077883e-07
epoch = 890 epoch_cost = 2.5039257511849655e-07
epoch = 900 epoch_cost = 2.313004827669829e-07
epoch = 910 epoch_cost = 2.083001673369722e-07
epoch = 920 epoch_cost = 1.8619790820295634e-07
epoch = 930 epoch_cost = 1.7819851949596455e-07
epoch = 940 epoch_cost = 1.5779258788484185e-07
epoch = 950 epoch_cost = 1.4289144134593812e-07
epoch = 960 epoch_cost = 1.29137815108038e-07
epoch = 970 epoch_cost = 1.1575005887110024e-07
epoch = 980 epoch_cost = 1.0703554642610413e-07
epoch = 990 epoch_cost = 9.53940206827042e-08
Training set accuracy: 1.0
Test set Accuracy: 0.875

Accuracy is 100% on the training set and 87.5% on the test set, which is already a good result. The 12.5-point gap between training and test accuracy points to high variance, that is, overfitting. Methods such as L2 regularization, dropout, and data augmentation are therefore worth considering for further improvement.
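As one illustration of the first option, the sketch below adds an L2 penalty on the convolution filters to a data cost. It is written in NumPy for clarity and is not part of the original model; the function name l2_penalty, the value of lam, and the placeholder data_cost are all assumptions. In TensorFlow 1.x, tf.nn.l2_loss computes the same per-tensor quantity, sum(t ** 2) / 2.

```python
import numpy as np

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum over all weight arrays of 0.5 * ||W||^2."""
    return lam * sum(0.5 * np.sum(np.square(W)) for W in weights)

rng = np.random.RandomState(0)
W1 = rng.randn(4, 4, 3, 8)    # same shapes as the filters in this post
W2 = rng.randn(2, 2, 8, 16)

data_cost = 0.25              # placeholder standing in for the cross-entropy cost
total_cost = data_cost + l2_penalty([W1, W2], lam=1e-3)
print(total_cost)
```

Minimizing total_cost instead of data_cost discourages large weights; lam would need tuning against held-out accuracy.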

5. Prediction

Prediction works the same way as for the DNN: reshape the input image to the dimensions the network expects, run forward propagation to obtain the softmax output, use argmax to locate the class with the highest probability, and compare it with the true label to compute accuracy.
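The argmax step can be sketched independently of TensorFlow. The example below assumes a batch of logits with one row per image and integer true labels; the function name predict_labels and the sample values are made up for illustration.

```python
import numpy as np

def predict_labels(logits):
    """Return the predicted class index for each row of logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # argmax along axis=1 picks the most probable class for each sample.
    return np.argmax(probs, axis=1)

logits = np.array([[0.1, 2.3, -1.0, 0.0, 0.5, 0.2],
                   [1.5, 0.2, 0.1, 3.0, -0.5, 0.0]])
true_labels = np.array([1, 2])

predicted = predict_labels(logits)
accuracy = float(np.mean(predicted == true_labels))
print(predicted, accuracy)
```

Because softmax preserves the ordering of the scores, taking argmax of the raw logits gives the same labels. The axis matters: for a (1, 6) output, argmax along axis 0 would return six zeros instead of one class index.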

index = 999

x = tf.placeholder(tf.float32, [None, 64, 64, 3])
plt.imshow(train_x[index])
# Use the normalized image, matching the data the model was trained on
image = train_x1[index].reshape(1, 64, 64, 3)
print(image.shape)
init = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(init)
    image_prediction = session.run(forward_propagation(x, parameters), feed_dict={x: image})
    # argmax along the class axis gives the index of the largest score, i.e. the most probable class
    prediction_label = np.squeeze(session.run(tf.argmax(image_prediction, axis=1)))
    print("prediction_label:", prediction_label)
    true_label = np.squeeze(train_y[:, index])
    print("true_label:", true_label)
    if prediction_label == true_label:
        print("Correct prediction!")
    else:
        print("Wrong prediction!")

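[The prediction code still has a bug that needs fixing. Calling forward_propagation again creates new fully connected layers after init was defined, so their variables are never initialized, and they are not the trained weights either:]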

FailedPreconditionError: Attempting to use uninitialized value fully_connected_48/biases
	 [[{{node fully_connected_48/biases/read}}]]
