Definition:
In short, convolutional neural networks (CNNs) are a class of deep learning models, a variant of the multi-layer perceptron, that are commonly used to analyze visual images. They were pioneered by the renowned computer scientist Yann LeCun, now working at Facebook, who was the first person to use convolutional neural networks to recognize handwritten digits on the MNIST data set.
Yann LeCun
Convolutional neural networks are inspired by biological processes: the pattern of connections between their neurons resembles the organization of the animal visual cortex.
The visual structure of the human brain
Individual cortical neurons respond to stimuli only in a restricted region of the visual field called the receptive field. The receptive fields of different neurons partially overlap, so that together they cover the entire visual field.
Computer vision and human vision
As the figure above suggests, it is hard to talk about any kind of neural network without touching on a little neuroscience and the workings of the human body (especially the brain), which have been a major source of inspiration for many deep learning models.
Architecture of convolutional neural network:
Convolutional neural network architecture
As shown in the figure above, the architecture of a convolutional neural network is very similar to that of a conventional artificial neural network, especially in the last layers of the network, which are fully connected. Note also that a convolutional neural network accepts multi-channel feature maps as input rather than flat vectors.
Next, let’s explore the basic components of a convolutional neural network and the mathematical operations behind them, and then visualize and classify images based on the features learned during training.
Input Layer:
The input is typically an N × M × 3 RGB image, unlike a conventional artificial neural network, whose input is an n × 1 vector.
RGB image
Convolution Layer:
In the convolution layer, we compute the dot product between a region of the input image and the weight matrix of the filter, and take the result as the layer’s output. The filter slides across the entire image, repeating the same dot product at each position. Two things to note here:
- The filter must have the same number of channels as the input image;
- The deeper the network, the more filters are used; the more filters there are, the more edges and features the network can detect.
Forward convolution
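The sliding dot product described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: no padding, a single filter, and nested lists indexed as `[row][col][channel]`. The function name `conv2d_single_filter` and the sample values are made up for this example. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def conv2d_single_filter(image, kernel, stride=1):
    """Slide one filter over a multi-channel image (no padding).

    image:  nested list indexed as image[row][col][channel]
    kernel: nested list indexed as kernel[row][col][channel]
    Returns a 2-D feature map as a list of rows.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    k_h, k_w, k_c = len(kernel), len(kernel[0]), len(kernel[0][0])
    # The filter must have the same number of channels as the input image.
    if k_c != channels:
        raise ValueError("filter and image channel counts differ")

    out_h = (height - k_h) // stride + 1
    out_w = (width - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the current image region and the filter.
            total = 0.0
            for di in range(k_h):
                for dj in range(k_w):
                    for c in range(channels):
                        pixel = image[i * stride + di][j * stride + dj][c]
                        total += pixel * kernel[di][dj][c]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x3 single-channel image and 2x2 single-channel filter.
image = [[[1], [2], [3]],
         [[4], [5], [6]],
         [[7], [8], [9]]]
kernel = [[[1], [0]],
          [[0], [-1]]]

feature_map = conv2d_single_filter(image, kernel)
print(feature_map)  # [[-4.0, -4.0], [-4.0, -4.0]]
```

Applying several filters this way and stacking the resulting feature maps gives one output channel per filter.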
Output size of convolution layer:
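Output width:
$$W_{out} = \left\lfloor \frac{W - F_w + 2P}{S} \right\rfloor + 1$$
Output height:
$$H_{out} = \left\lfloor \frac{H - F_h + 2P}{S} \right\rfloor + 1$$
where:
- W: width of the input image
- H: height of the input image
- Fw: width of the filter (kernel)
- Fh: height of the filter
- P: padding
- S: stride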
The number of channels output by the convolution layer is equal to the number of filters used during the convolution operation.
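As a quick check of the quantities defined above, here is a small sketch that assumes the commonly used relation output = floor((input − filter + 2 · padding) / stride) + 1 for each spatial dimension. The helper name `conv_output_shape` and the sample sizes are illustrative, not taken from any library.

```python
def conv_output_shape(w, h, f_w, f_h, p, s, num_filters):
    """Return (output width, output height, output channels) of a convolution layer.

    Assumes output = floor((input - filter + 2 * padding) / stride) + 1
    per spatial dimension, and one output channel per filter.
    """
    out_w = (w - f_w + 2 * p) // s + 1
    out_h = (h - f_h + 2 * p) // s + 1
    return out_w, out_h, num_filters


# A 28x28 input with 32 filters of size 5x5:
print(conv_output_shape(28, 28, 5, 5, p=2, s=1, num_filters=32))  # (28, 28, 32)
print(conv_output_shape(28, 28, 5, 5, p=0, s=1, num_filters=32))  # (24, 24, 32)
print(conv_output_shape(28, 28, 5, 5, p=0, s=2, num_filters=32))  # (12, 12, 32)
```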
Why convolution?
Sometimes you might ask yourself: why use convolution in the first place? Why not simply flatten the input image matrix from the start? The answer is that flattening would leave us with a huge number of parameters to train, and most people don’t have the computing power to handle such expensive tasks quickly. In addition, because convolutional neural networks have fewer parameters, they are less prone to over-fitting.
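To make the parameter argument concrete, the following back-of-the-envelope sketch compares a fully connected layer on a flattened image with a convolution layer. The sizes (a 28 × 28 × 1 image, 1024 dense units, 32 filters of size 5 × 5) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(f_h, f_w, in_channels, n_filters):
    """Weights plus biases of a convolution layer (filters are shared across positions)."""
    return f_h * f_w * in_channels * n_filters + n_filters


dense_count = dense_params(28 * 28 * 1, 1024)  # flattened image -> 1024 units
conv_count = conv_params(5, 5, 1, 32)          # 32 filters of size 5x5x1

print(dense_count)  # 803840
print(conv_count)   # 832
```

Because the same filter weights are reused at every image position, the convolution layer's parameter count does not depend on the image size at all.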
Pooling Layer:
Two pooling operations are in wide use today: average pooling and max pooling. Max pooling is the more common of the two and generally works better than average pooling. A pooling layer reduces the spatial dimensions (width and height) of the feature maps in a convolutional neural network, but it does not reduce their depth. Max pooling outputs the maximum value of each input region, while average pooling outputs the average of each input region.
Max pooling
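Both pooling operations can be sketched on a single 2-D feature map in a few lines of Python. The function `pool2d`, its default 2 × 2 window with stride 2, and the sample values are assumptions made for this illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2-D feature map (list of rows)."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    output = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values inside the current pooling window.
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        output.append(row)
    return output


feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 6]]

print(pool2d(feature_map, mode="max"))  # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="avg"))  # [[3.75, 2.25], [4.0, 5.25]]
```

In a real network the same operation is applied to every channel independently, which is why the depth stays unchanged.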
Why pool?
One of the core goals of the pooling layer is to provide spatial invariance, meaning that you or the machine can still recognize an object even if its appearance changes in some way. See Yann LeCun’s article for more on the pooling layer.
Non-linearity Layer:
In the non-linearity layer, the ReLU activation function is generally used instead of the traditional sigmoid or tanh activation functions. ReLU returns 0 for every negative value in its input and returns every positive value unchanged (see this article for a more in-depth explanation of activation functions).
ReLU activation function
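As a small illustration of the behaviour described above, ReLU can be written in one line and applied element-wise to a feature map. The names `relu` and `relu_map` and the sample values are made up for this sketch.

```python
def relu(x):
    """Return 0 for negative inputs and the input itself otherwise."""
    return max(0.0, x)


def relu_map(feature_map):
    """Apply ReLU to every element of a 2-D feature map."""
    return [[relu(value) for value in row] for row in feature_map]


activated = relu_map([[-1.0, 2.0],
                      [0.5, -3.0]])
print(activated)  # [[0.0, 2.0], [0.5, 0.0]]
```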
Fully Connected Layer:
In the fully connected layer, we flatten the output of the last convolution layer and connect every node in the current layer to every node in the next layer. A fully connected layer is just a conventional artificial neural network under another name, as shown in the figure below, and its operations are exactly the same as in a general artificial neural network:
Flattening the output of the convolution layer
Fully connected layer
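The two steps described above, flattening and then a dense matrix product, might look like the following in plain Python. The helper names `flatten` and `dense`, the layout `[row][col][channel]`, and all numbers are assumptions for illustration only.

```python
def flatten(volume):
    """Flatten a height x width x channels nested list into a 1-D list."""
    return [value for row in volume for pixel in row for value in pixel]


def dense(inputs, weights, biases):
    """Fully connected layer: weights[i][j] connects input i to output unit j."""
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]


# Hypothetical 1 x 2 x 2 output of the last convolution/pooling stage.
volume = [[[1.0, 2.0], [3.0, 4.0]]]
flat = flatten(volume)

weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 0.0],
           [0.0, 1.0]]
biases = [0.5, -0.5]

outputs = dense(flat, weights, biases)
print(flat)     # [1.0, 2.0, 3.0, 4.0]
print(outputs)  # [4.5, 5.5]
```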
The layers and operations discussed above are the core components of every convolutional neural network. Now that we have covered what a convolutional neural network does in forward propagation, let’s move on to what it does in backpropagation.
Backpropagation:
Fully connected layer:
In the fully connected layer, backpropagation works exactly as it does in any conventional artificial neural network. Backpropagation computes the partial derivative of the loss function with respect to each weight, and gradient descent, the optimization algorithm, uses that derivative to update the parameters. By the chain rule, the derivative is a product of three terms: the derivative of the loss function with respect to the activated output, the derivative of the activated output with respect to the unactivated output, and the derivative of the unactivated output with respect to the weight. The mathematical expression is as follows:
$$\frac{\partial J}{\partial w} = \frac{\partial J}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$$
where J is the loss function, w is a weight, z is the unactivated output and a is the activated output.
Illustration of backpropagation
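The chain-rule product described above can be checked numerically on a single neuron. This sketch assumes a sigmoid activation and a squared-error loss, neither of which is specified in the text, and all names and numbers are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_for_weight(w, x, b, target):
    """Squared-error loss of one sigmoid neuron (an assumed setup)."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - target) ** 2


def gradient_wrt_weight(w, x, b, target):
    """Chain rule: dLoss/dw = dLoss/da * da/dz * dz/dw."""
    z = w * x + b            # unactivated output
    a = sigmoid(z)           # activated output
    dloss_da = a - target    # derivative of the loss w.r.t. the activated output
    da_dz = a * (1.0 - a)    # derivative of the sigmoid
    dz_dw = x                # derivative of the unactivated output w.r.t. the weight
    return dloss_da * da_dz * dz_dw


w, x, b, target = 0.5, 2.0, 0.1, 1.0
analytic = gradient_wrt_weight(w, x, b, target)

# Central finite difference as an independent check.
eps = 1e-6
numeric = (loss_for_weight(w + eps, x, b, target)
           - loss_for_weight(w - eps, x, b, target)) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```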
After calculating the gradient, we scale it by the learning rate and subtract it from the current weights to get the updated weights:
$$\theta_{i+1} = \theta_i - \alpha \nabla J(\theta_i)$$
where:
- θ_{i+1}: updated weights
- θ_i: weights before the update
- α: learning rate
- ∇J(θ_i): gradient of the loss function
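The update described above (new weights = current weights minus the learning rate times the gradient) is sketched below on a toy loss. The quadratic loss J(θ) = (θ₀ − 3)² + (θ₁ + 1)², the learning rate, and the function names are assumptions for illustration.

```python
def gradient_descent_step(theta, grad, alpha):
    """One update: theta_next = theta - alpha * gradient."""
    return [t - alpha * g for t, g in zip(theta, grad)]


def grad_J(theta):
    """Gradient of the toy loss J(theta) = (theta_0 - 3)^2 + (theta_1 + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta = [0.0, 0.0]
for _ in range(100):
    theta = gradient_descent_step(theta, grad_J(theta), alpha=0.1)

print(theta)  # approaches the minimum at [3.0, -1.0]
```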
Gradient descent
The animation below shows the result of applying gradient descent to linear regression. It is clear from the figure that the smaller the cost function, the better the linear model fits the data.
Gradient descent is applied to linear regression
In addition, note that the learning rate must be chosen carefully. A learning rate that is too high may cause the updates to overshoot the minimum, while one that is too low makes the model converge slowly.
Small learning rate and large learning rate
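The effect of the learning rate can be seen on a one-dimensional toy problem. This sketch assumes the loss J(θ) = θ², whose gradient is 2θ. The three learning rates and the function name are chosen purely for illustration.

```python
def run_gradient_descent(alpha, theta=1.0, steps=20):
    """Minimize J(theta) = theta^2 (gradient 2 * theta) for a fixed number of steps."""
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta
    return theta


too_small = run_gradient_descent(alpha=0.01)  # converges, but slowly
moderate = run_gradient_descent(alpha=0.4)    # gets very close to the minimum at 0
too_large = run_gradient_descent(alpha=1.1)   # overshoots more on every step

print(too_small, moderate, too_large)
```

With the small rate, θ is still far from 0 after 20 steps. With the large rate, every update jumps past the minimum and lands farther away than before, so |θ| grows.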
Partial derivatives are used extensively in optimization tasks, whether in physics, economics or computer science. A partial derivative measures the rate of change of a dependent variable f(x, y, z) with respect to one of its independent variables while the others are held constant. For example, suppose you own shares in a company whose price rises or falls depending on a variety of factors (the securities market, politics, sales revenue, and so on). A partial derivative tells you how much the share price changes when one of these factors changes while all the other factors remain unchanged.
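A partial derivative can be approximated numerically by nudging one variable and keeping the others fixed. The sketch below uses a central finite difference on an arbitrary example function f(x, y, z) = x · y + z². The function, the evaluation point, and the helper name are all assumptions for illustration.

```python
def partial_derivative(f, point, index, eps=1e-6):
    """Approximate df/d(point[index]) while holding all other variables fixed."""
    up = list(point)
    down = list(point)
    up[index] += eps
    down[index] -= eps
    return (f(*up) - f(*down)) / (2 * eps)


def f(x, y, z):
    return x * y + z ** 2


point = (2.0, 3.0, 4.0)
df_dx = partial_derivative(f, point, 0)  # analytically: y = 3
df_dy = partial_derivative(f, point, 1)  # analytically: x = 2
df_dz = partial_derivative(f, point, 2)  # analytically: 2 * z = 8

print(df_dx, df_dy, df_dz)
```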
Pooling Layer:
In the max pooling layer, gradients propagate back only through the maximum values, because slightly changing the non-maximum values does not affect the output. In practice, we build a mask that holds 1 at the position of each maximum value before the max pooling operation and 0 at every non-maximum position, and then, following the chain rule, multiply this mask by the gradient coming from the next layer to get the gradient with respect to the layer’s input.
Backpropagation through the pooling layer
Unlike the max pooling layer, in the average pooling layer the gradient is propagated through all the inputs of the pooling window.
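The difference between the two backward passes can be sketched as follows. The code assumes non-overlapping windows (stride equal to the window size) and a single 2-D feature map. The function names and values are illustrative.

```python
def max_pool_backward(feature_map, upstream, size=2):
    """Route each upstream gradient to the position of its window's maximum."""
    grad = [[0.0] * len(feature_map[0]) for _ in feature_map]
    for i, row in enumerate(upstream):
        for j, g in enumerate(row):
            best = None
            for di in range(size):
                for dj in range(size):
                    r, c = i * size + di, j * size + dj
                    if best is None or feature_map[r][c] > feature_map[best[0]][best[1]]:
                        best = (r, c)
            grad[best[0]][best[1]] += g  # non-maximum positions keep a zero gradient
    return grad


def avg_pool_backward(feature_map, upstream, size=2):
    """Spread each upstream gradient evenly over all inputs of its window."""
    grad = [[0.0] * len(feature_map[0]) for _ in feature_map]
    for i, row in enumerate(upstream):
        for j, g in enumerate(row):
            for di in range(size):
                for dj in range(size):
                    grad[i * size + di][j * size + dj] += g / (size * size)
    return grad


feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2]]
upstream = [[1.0, 2.0]]  # gradient arriving from the next layer

print(max_pool_backward(feature_map, upstream))
# [[0.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0]]
print(avg_pool_backward(feature_map, upstream))
# [[0.25, 0.25, 0.5, 0.5], [0.25, 0.25, 0.5, 0.5]]
```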
Convolution Layer:
You might now ask yourself: if the forward propagation of the convolution layer is a convolution, what is its backpropagation? Fortunately, its backpropagation is also a convolution, so you don’t have to worry about learning new, hard-to-master math operations.
Backpropagation through the convolution layer
where:
- ∂h_ij: derivative of the loss function with respect to the output element h_ij
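To illustrate the claim that the backward pass is itself a convolution, here is a single-channel sketch under simplifying assumptions: stride 1, no padding, and a forward pass computed as a sliding dot product. The gradient with respect to the filter is obtained by sliding the upstream gradient over the input, and the gradient with respect to the input by sliding the 180-degree rotated filter over the zero-padded upstream gradient. All names and numbers are illustrative.

```python
def correlate2d(x, k):
    """Sliding dot product of 2-D lists x and k (stride 1, no padding)."""
    out_h = len(x) - len(k) + 1
    out_w = len(x[0]) - len(k[0]) + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(len(k)) for b in range(len(k[0])))
             for j in range(out_w)]
            for i in range(out_h)]


def conv_backward(x, w, dh):
    """Return (dLoss/dw, dLoss/dx) given the upstream gradient dh = dLoss/dh."""
    # Filter gradient: slide the upstream gradient over the input.
    dw = correlate2d(x, dh)

    # Input gradient: zero-pad dh and slide the rotated filter over it.
    pad_h, pad_w = len(w) - 1, len(w[0]) - 1
    padded = [[0.0] * (len(dh[0]) + 2 * pad_w) for _ in range(len(dh) + 2 * pad_h)]
    for i, row in enumerate(dh):
        for j, g in enumerate(row):
            padded[i + pad_h][j + pad_w] = g
    rotated = [row[::-1] for row in w[::-1]]
    dx = correlate2d(padded, rotated)
    return dw, dx


x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
w = [[1.0, 0.0],
     [0.0, -1.0]]
dh = [[1.0, 1.0],
      [1.0, 1.0]]  # upstream gradient when the loss is the sum of all outputs

dw, dx = conv_backward(x, w, dh)
print(dw)  # [[12.0, 16.0], [24.0, 28.0]]
print(dx)  # [[1.0, 1.0, 0.0], [1.0, 0.0, -1.0], [0.0, -1.0, -1.0]]

# Finite-difference check of dw[0][0] for the loss "sum of all outputs".
eps = 1e-3
w_shifted = [[w[0][0] + eps, w[0][1]], [w[1][0], w[1][1]]]
loss = sum(sum(row) for row in correlate2d(x, w))
loss_shifted = sum(sum(row) for row in correlate2d(x, w_shifted))
numeric_dw00 = (loss_shifted - loss) / eps
```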
In short, the figure above shows how backpropagation works in the convolution layer. Now that you have a solid theoretical understanding of convolutional neural networks, let’s build our first convolutional neural network using TensorFlow.
TensorFlow implements convolutional neural network:
What is TensorFlow?
TensorFlow is an open source software library that uses data flow graphs for numerical computation. It was originally developed by the Google Brain team, part of Google’s machine intelligence research organization, for machine learning and deep neural network research.
What is a tensor?
A tensor is an organized multidimensional array. The order (rank) of a tensor is the number of dimensions of the array needed to represent it.
Types of tensors
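The idea of a tensor's order can be illustrated with nested Python lists, where each level of nesting adds one dimension. The helper `tensor_order` is a made-up name, and it assumes non-empty, regularly shaped lists.

```python
def tensor_order(value):
    """Count the nesting depth of a non-empty, regularly shaped nested list."""
    order = 0
    while isinstance(value, list):
        order += 1
        value = value[0]
    return order


scalar = 5.0                                  # order 0
vector = [1.0, 2.0, 3.0]                      # order 1
matrix = [[1.0, 2.0], [3.0, 4.0]]             # order 2
cube = [[[1.0], [2.0]], [[3.0], [4.0]]]       # order 3

print([tensor_order(t) for t in (scalar, vector, matrix, cube)])  # [0, 1, 2, 3]
```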
What is a computational graph?
A computational graph is a basic tool from computational algebra that is very effective in neural networks and other machine learning models, both for deriving algorithms and for building software packages. The basic idea is to express a model, such as a feedforward neural network, as a directed graph that represents a sequence of computational steps. Each step in the sequence corresponds to a vertex in the graph and performs a simple operation that takes some inputs and produces an output from them. In the following diagram, we have two inputs, w1 = x and w2 = y. These inputs flow through the graph, where each node is a mathematical operation, giving us the following outputs:
- w3 = cos(x), cosine operation
- w4 = sin(x), sine operation
- w5 = w3 ∙ w4, multiplication
- w6 = w1 / w2, division
- w7 = w5 + w6, addition
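Before turning to a library, the graph listed above can be evaluated with a few lines of plain Python. This is only a sketch of the idea: each node stores an operation and the names of its input nodes, and a node is computed once its inputs are available. The dictionary layout, the `evaluate` helper, and the input values x = 5.0 and y = 3.0 are assumptions for illustration.

```python
import math

# Each node maps to (operation, names of its input nodes), following the list above.
graph = {
    "w3": (math.cos, ["w1"]),
    "w4": (math.sin, ["w1"]),
    "w5": (lambda a, b: a * b, ["w3", "w4"]),
    "w6": (lambda a, b: a / b, ["w1", "w2"]),
    "w7": (lambda a, b: a + b, ["w5", "w6"]),
}


def evaluate(graph, inputs, target):
    """Evaluate `target`, recursively evaluating the nodes it depends on."""
    values = dict(inputs)

    def visit(name):
        if name not in values:
            operation, dependencies = graph[name]
            values[name] = operation(*(visit(d) for d in dependencies))
        return values[name]

    return visit(target)


result = evaluate(graph, {"w1": 5.0, "w2": 3.0}, "w7")
print(result)  # cos(5) * sin(5) + 5 / 3
```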
Now that we know what a graph is, let’s build our own graph in TensorFlow.
Code:
# Import the deep learning library
import tensorflow as tf

# Define our computational graph
W1 = tf.constant(5.0, name="x")
W2 = tf.constant(3.0, name="y")
W3 = tf.cos(W1, name="cos")
W4 = tf.sin(W2, name="sin")
W5 = tf.multiply(W3, W4, name="mult")
W6 = tf.divide(W1, W2, name="div")
W7 = tf.add(W5, W6, name="add")

# Open the session (TensorFlow 1.x API)
with tf.Session() as sess:
    cos = sess.run(W3)
    sin = sess.run(W4)
    mult = sess.run(W5)
    div = sess.run(W6)
    add = sess.run(W7)

    # Before running TensorBoard, make sure you have generated summary data
    # in a log directory by creating a summary writer
    writer = tf.summary.FileWriter("./Desktop/ComputationGraph", sess.graph)

# Once you have event files, run TensorBoard and provide the log directory
# Command: tensorboard --logdir="path/to/logs"
Visualization using TensorBoard:
What is TensorBoard?
TensorBoard is a suite of web applications for inspecting and understanding TensorFlow runs and graphs, and it is one of the biggest advantages of Google’s TensorFlow over Facebook’s PyTorch.
The above code visualized in TensorBoard
With a solid understanding of convolutional neural networks, TensorFlow and TensorBoard, let’s build our first convolutional neural network to recognize handwritten digits from the MNIST data set.
MNIST data set
Our convolutional neural network model will be similar to the LeNet-5 architecture, consisting of convolution layers, max pooling layers and non-linearity layers.
Three-dimensional simulation of convolutional neural network
Code:
# Import the deep learning library
import tensorflow as tf
import time

# Import the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Network inputs and outputs
# The network's input is a 28×28 dimensional input
n = 28
m = 28
num_input = n * m  # MNIST data input
num_classes = 10  # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])

# Storing the parameters of our LeNet-5 inspired Convolutional Neural Network
weights = {
    "W_ij": tf.Variable(tf.random_normal([5, 5, 1, 32])),
    "W_jk": tf.Variable(tf.random_normal([5, 5, 32, 64])),
    "W_kl": tf.Variable(tf.random_normal([7 * 7 * 64, 1024])),
    "W_lm": tf.Variable(tf.random_normal([1024, num_classes]))
}
biases = {
    "b_ij": tf.Variable(tf.random_normal([32])),
    "b_jk": tf.Variable(tf.random_normal([64])),
    "b_kl": tf.Variable(tf.random_normal([1024])),
    "b_lm": tf.Variable(tf.random_normal([num_classes]))
}

# The hyper-parameters of our Convolutional Neural Network
learning_rate = 1e-3
num_steps = 500
batch_size = 128
display_step = 10

def ConvolutionLayer(x, W, b, strides=1):
    # Convolution Layer
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return x

def ReLU(x):
    # ReLU activation function
    return tf.nn.relu(x)

def PoolingLayer(x, k=2, strides=2):
    # Max Pooling layer
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, strides, strides, 1],
                          padding='SAME')

def Softmax(x):
    # Softmax activation function for the CNN's final output
    return tf.nn.softmax(x)

# Create model
def ConvolutionalNeuralNetwork(x, weights, biases):
    # MNIST data input is a 1-D row vector of 784 features (28×28 pixels)
    # Reshape to match picture format [Height x Width x Channel]
    # Tensor input becomes 4-D: [Batch Size, Height, Width, Channel]
    x = tf.reshape(x, shape=[-1, 28, 28, 1])

    # Convolution Layer
    Conv1 = ConvolutionLayer(x, weights["W_ij"], biases["b_ij"])
    # Non-Linearity
    ReLU1 = ReLU(Conv1)
    # Max Pooling (down-sampling)
    Pool1 = PoolingLayer(ReLU1, k=2)

    # Convolution Layer
    Conv2 = ConvolutionLayer(Pool1, weights["W_jk"], biases["b_jk"])
    # Non-Linearity
    ReLU2 = ReLU(Conv2)
    # Max Pooling (down-sampling)
    Pool2 = PoolingLayer(ReLU2, k=2)

    # Fully connected layer
    # Reshape Pool2 output to fit fully connected layer input
    FC = tf.reshape(Pool2, [-1, weights["W_kl"].get_shape().as_list()[0]])
    FC = tf.add(tf.matmul(FC, weights["W_kl"]), biases["b_kl"])
    FC = ReLU(FC)

    # Output, class prediction
    output = tf.add(tf.matmul(FC, weights["W_lm"]), biases["b_lm"])
    return output

# Construct model
logits = ConvolutionalNeuralNetwork(X, weights, biases)
prediction = Softmax(logits)

# Softmax cross entropy loss function
loss_function = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))

# Optimization using the Adam Gradient Descent optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_process = optimizer.minimize(loss_function)

# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Recording how the loss function varies over time during training
cost = tf.summary.scalar("cost", loss_function)
training_accuracy = tf.summary.scalar("accuracy", accuracy)
train_summary_op = tf.summary.merge([cost, training_accuracy])
train_writer = tf.summary.FileWriter("./Desktop/logs",
                                     graph=tf.get_default_graph())

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)
    start_time = time.time()

    for step in range(1, num_steps + 1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(training_process, feed_dict={X: batch_x, Y: batch_y})

        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc, summary = sess.run(
                [loss_function, accuracy, train_summary_op],
                feed_dict={X: batch_x, Y: batch_y})
            train_writer.add_summary(summary, step)
            print("Step " + str(step) + ", Minibatch Loss= " +
                  "{:.4f}".format(loss) + ", Training Accuracy= " +
                  "{:.3f}".format(acc))

    end_time = time.time()
    print("Time duration: " + str(int(end_time - start_time)) + " seconds")
    print("Optimization Finished!")

    # Calculate accuracy for 256 MNIST test images
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                        Y: mnist.test.labels[:256]}))
The above code is a bit verbose, but if you break it down section by section, it’s not too hard to read. After running the program, the output should look similar to the following (exact values vary between runs because the weights are randomly initialized):
Step 1, Minibatch Loss= 74470.4844, Training Accuracy= 0.117
Step 10, Minibatch Loss= 20529.4141, Training Accuracy= 0.250
Step 20, Minibatch Loss= 14074.7539, Training Accuracy= 0.531
Step 30, Minibatch Loss= 7168.9839, Training Accuracy= 0.586
Step 40, Minibatch Loss= 4781.1060, Training Accuracy= 0.703
Step 50, Minibatch Loss= 3281.0979, Training Accuracy= 0.766
Step 60, Minibatch Loss= 2701.2451, Training Accuracy= 0.781
Step 70, Minibatch Loss= 2478.7153, Training Accuracy= 0.773
Step 80, Minibatch Loss= 2312.8320, Training Accuracy= 0.820
Step 90, Minibatch Loss= 2143.0774, Training Accuracy= 0.852
Step 100, Minibatch Loss= 1373.9169, Training Accuracy= 0.852
Step 110, Minibatch Loss= 1852.9535, Training Accuracy= 0.852
Step 120, Minibatch Loss= 1845.3500, Training Accuracy= 0.891
Step 130, Minibatch Loss= 1677.2566, Training Accuracy= 0.844
Step 140, Minibatch Loss= 1683.3661, Training Accuracy= 0.875
Step 150, Minibatch Loss= 1859.3821, Training Accuracy= 0.836
Step 160, Minibatch Loss= 1495.4796, Training Accuracy= 0.859
Step 170, Minibatch Loss= 609.3800, Training Accuracy= 0.914
Step 180, Minibatch Loss= 1376.5054, Training Accuracy= 0.891
Step 190, Minibatch Loss= 1085.0363, Training Accuracy= 0.891
Step 200, Minibatch Loss= 1129.7145, Training Accuracy= 0.914
Step 210, Minibatch Loss= 1488.5452, Training Accuracy= 0.906
Step 220, Minibatch Loss= 584.5027, Training Accuracy= 0.930
Step 230, Minibatch Loss= 619.9744, Training Accuracy= 0.914
Step 240, Minibatch Loss= 1575.8933, Training Accuracy= 0.891
Step 250, Minibatch Loss= 1558.5853, Training Accuracy= 0.891
Step 260, Minibatch Loss= 375.0371, Training Accuracy= 0.922
Step 270, Minibatch Loss= 1568.0758, Training Accuracy= 0.859
Step 280, Minibatch Loss= 1172.9205, Training Accuracy= 0.914
Step 290, Minibatch Loss= 1023.5415, Training Accuracy= 0.914
Step 300, Minibatch Loss= 475.9756, Training Accuracy= 0.945
Step 310, Minibatch Loss= 488.8930, Training Accuracy= 0.961
Step 320, Minibatch Loss= 1105.7720, Training Accuracy= 0.914
Step 330, Minibatch Loss= 1111.8589, Training Accuracy= 0.906
Step 340, Minibatch Loss= 842.7805, Training Accuracy= 0.930
Step 350, Minibatch Loss= 1514.0153, Training Accuracy= 0.914
Step 360, Minibatch Loss= 1722.1812, Training Accuracy= 0.875
Step 370, Minibatch Loss= 681.6041, Training Accuracy= 0.891
Step 380, Minibatch Loss= 902.8599, Training Accuracy= 0.930
Step 390, Minibatch Loss= 714.1541, Training Accuracy= 0.930
Step 400, Minibatch Loss= 1654.8883, Training Accuracy= 0.914
Step 410, Minibatch Loss= 696.6915, Training Accuracy= 0.906
Step 420, Minibatch Loss= 536.7183, Training Accuracy= 0.914
Step 430, Minibatch Loss= 1405.9148, Training Accuracy= 0.891
Step 440, Minibatch Loss= 199.4781, Training Accuracy= 0.953
Step 450, Minibatch Loss= 438.3784, Training Accuracy= 0.938
Step 460, Minibatch Loss= 409.6419, Training Accuracy= 0.969
Step 470, Minibatch Loss= 503.1216, Training Accuracy= 0.930
Step 480, Minibatch Loss= 482.6476, Training Accuracy= 0.922
Step 490, Minibatch Loss= 767.3893, Training Accuracy= 0.922
Step 500, Minibatch Loss= 626.8249, Training Accuracy= 0.930
Time duration: 657 seconds
Optimization Finished!
Testing Accuracy: 0.9453125
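To see where the 7 * 7 * 64 in the weight shapes of the listing above comes from, the following sketch traces the feature-map shape through the two convolution and pooling stages. It relies on the fact that with 'SAME' padding the spatial output size is ceil(input / stride). The helper names are made up for this illustration.

```python
import math


def same_padding_output(size, stride):
    """With 'SAME' padding, the output size is ceil(input size / stride)."""
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28, channels=1, filters=(32, 64)):
    """Follow the (height, width, channels) shape through conv + pool stages."""
    shapes = [(height, width, channels)]
    for n_filters in filters:
        # 5x5 convolution, stride 1, 'SAME' padding: spatial size is unchanged.
        height = same_padding_output(height, 1)
        width = same_padding_output(width, 1)
        channels = n_filters
        shapes.append((height, width, channels))
        # 2x2 max pooling, stride 2, 'SAME' padding: spatial size is halved.
        height = same_padding_output(height, 2)
        width = same_padding_output(width, 2)
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes()
print(shapes)
# [(28, 28, 1), (28, 28, 32), (14, 14, 32), (14, 14, 64), (7, 7, 64)]

final_h, final_w, final_c = shapes[-1]
flattened_size = final_h * final_w * final_c
print(flattened_size)  # 3136, i.e. 7 * 7 * 64
```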
To sum up, we have just built our first convolutional neural network. As the results above show, the accuracy of the model improves greatly from the first step to the last, but there is still plenty of room for improvement. Now let’s visualize the convolutional neural network model in TensorBoard:
Visual convolutional neural network
Accuracy and loss assessment
Conclusion:
Convolutional neural networks are a powerful class of deep learning models with a wide range of applications and excellent performance. Using them only becomes more challenging as the data grows and the problems become more complex.
Author: Lightning Blade
This article comes from the cloud community partner “translation Group”. To reprint it, please contact the original author.