Andrew Ng’s deeplearning.ai released the final course of its Deep Learning Specialization on January 31. Recently, Wan Zhen from Chongqing University compiled a set of course notes that explain the key concepts and the accompanying code in detail across three of the courses: Neural Networks and Deep Learning, Improving Deep Neural Networks, and Convolutional Neural Networks. This article gives a general overview of the three courses and highlights interesting points from each topic.

In these notes, Wan Zhen not only covers the key points of each course but also explains the programming assignments in detail. For the first course, Neural Networks and Deep Learning, the notes provide basic Python and NumPy primers and then work up from simple logistic regression to general deep fully connected networks, introducing the necessary loss functions and the back-propagation method along the way. For the second course, the notes detail the skills and fundamentals needed to improve the performance of deep networks, such as initialization, regularization, and gradient checking, which greatly improve model performance in practice, as well as common optimization methods such as plain SGD, momentum, and adaptive learning rate methods. The second course closes with TensorFlow, covering the framework’s common functions and the actual process of building a network. The last chapter covers convolutional neural networks, including the basic convolution operation, residual networks, and object detection frameworks.

Below is a brief outline of the course notes and some details.


Neural networks and deep learning

This part corresponds to the first course of Ng’s deep learning specialization. It introduces the necessary programming language and tools, then builds up step by step from linear models to nonlinear models, networks with a hidden layer, and finally deep networks, with detailed explanations and complete code. Through this part you will understand the structure of a neural network and how data flows through it (forward and back propagation), see how nonlinear activation functions and hidden layers let a network learn complex functions, learn how to build a complete neural network with an arbitrary, custom structure, and experience the elegance of vectorization and modular programming.


1.1 Python Basics and Numpy

The first section of this chapter shows how to use Python’s NumPy package, IPython Notebook, and other basic programming tools. It then introduces how to build neural networks with these tools, in particular the idea of vectorizing neural network computations and the use of Python broadcasting.


1.2 Logistic regression

Section 2 describes how to build a logistic regression neural network classifier (an image recognition network) that identifies cats with an accuracy of 70%, how to improve that accuracy further, and how to update the parameters using the partial derivatives of the loss function. Particular emphasis is placed on using vectorized operations rather than explicit loops wherever possible, except where loops are unavoidable (for example, iterating over epochs).
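As an illustration of why vectorization matters, here is a minimal sketch (not taken from the notes themselves) comparing a Python loop with the equivalent single NumPy call:

import numpy as np
import time

x = np.random.rand(1000000)
w = np.random.rand(1000000)

# Loop version: accumulate the dot product element by element
tic = time.time()
z_loop = 0.0
for i in range(len(x)):
    z_loop += w[i] * x[i]
print("loop:       %.1f ms" % (1000 * (time.time() - tic)))

# Vectorized version: a single call to np.dot
tic = time.time()
z_vec = np.dot(w, x)
print("vectorized: %.1f ms" % (1000 * (time.time() - tic)))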

1.2.1 describes the required Python packages; 1.2.2 describes the structure of the data set; 1.2.3 introduces the overall architecture of the learning algorithm; 1.2.4 describes the basic steps of constructing the algorithm; 1.2.5 and 1.2.6 turn the above into code and analyze the results visually; 1.2.7 explains how to train the network on your own data set; 1.2.8 shows the complete code of the logistic regression neural network.

In 1.2.4, the basic steps of the algorithm construction are as follows:

  • Define model structure;

  • Initialize model parameters;

  • Loop iteration structure:

  • Calculate the current loss function value (forward propagation)

  • Calculate the current gradient value (back propagation)

  • Update parameters (gradient descent)

Typically, steps 1-3 are built separately and then integrated into a single function, model().
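As a minimal sketch of how these pieces fit together (the function and variable names here are illustrative and not necessarily those used in the notes):

import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

def model(X, Y, num_iterations=2000, learning_rate=0.5):
    """X: data of shape (n_x, m); Y: labels of shape (1, m)."""
    # Define the model structure and initialize the parameters
    w = np.zeros((X.shape[0], 1))
    b = 0.0
    m = X.shape[1]
    for i in range(num_iterations):
        # Forward propagation: current loss
        A = sigmoid(np.dot(w.T, X) + b)
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        # Back propagation: current gradients
        dw = np.dot(X, (A - Y).T) / m
        db = np.sum(A - Y) / m
        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b, cost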

1.2.5 implements model() in code and plots the loss function and the learning curves.

Figure 1.2.3: Loss function

Figure 1.2.4: Comparison of learning curves of three different learning rates


1.3 Use hidden layers to classify planar data points

Section 3 describes how to add a hidden layer to the neural network in order to classify planar data points. This section shows how back propagation works, how the hidden layer captures nonlinear relationships, and how to build auxiliary (helper) functions.

The key points include: implementing a binary classifier with a single hidden layer; using nonlinear activation functions; computing the cross-entropy loss; and implementing forward and back propagation.

1.3.1 describes the required packages; 1.3.2 introduces the data set (red and blue points in the plane); 1.3.3 shows how logistic regression without a hidden layer classifies this data set; 1.3.4 walks through the implementation of the complete model with a hidden layer and its classification of the data set; 1.3.5 shows the complete code.

The classification results of 1.3.3 are shown in the figure below:

Figure 1.3.3: Logistic regression

Architecture of neural network used in 1.3.4:


Figure 1.3.4: Neural network model

The method for constructing the neural network in 1.3.4 is basically the same as in 1.2.4, with emphasis on how to define the hidden-layer structure and how to use a nonlinear activation function. After the code is implemented, the results are as follows:

Figure 1.3.6: Decision boundaries with hidden layer classifiers

Note that once a hidden layer is added, a nonlinear activation function must be used: a stack of linear layers without nonlinearities is pointless, because it cannot increase the complexity or capacity of the model.
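To see why, note that two stacked linear layers collapse into a single linear map:

W^[2](W^[1]x + b^[1]) + b^[2] = (W^[2]W^[1])x + (W^[2]b^[1] + b^[2]),

which is again just a linear function of x, no matter how many such layers are stacked.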


1.4 Build a complete deep neural network step by step

Section 4 covers the complete architecture of deep neural networks and how to build custom models. After completing this section, you will know how to use the ReLU activation function to improve model performance, build deeper models (with more than one hidden layer), and implement an easy-to-use neural network (a modular design).

1.4.1 describes the required packages; 1.4.2 gives an overview of the task; 1.4.3 describes the initialization process, from a 2-layer network to an L-layer network; 1.4.4 introduces the forward propagation module, from linear forward propagation and linear + nonlinear-activation forward propagation up to the forward propagation of an L-layer network; 1.4.5 describes the loss function; 1.4.6 introduces the back propagation module, including linear back propagation, linear + nonlinear-activation back propagation, and back propagation through an L-layer network; 1.4.7 shows the complete code of the deep neural network.

Figure 1.4.1: Task overview

Figure 1.4.3: Forward propagation and back propagation linear – ReLU – linear – sigmoid process diagram. The top indicates forward propagation, and the bottom indicates back propagation.
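A minimal sketch of the LINEAR and LINEAR + ReLU forward steps described in 1.4.4 (the names follow the course convention, but this is an illustrative sketch rather than the notes' exact code):

import numpy as np

def linear_forward(A_prev, W, b):
    # LINEAR part: Z = W A_prev + b
    Z = np.dot(W, A_prev) + b
    cache = (A_prev, W, b)     # stored for back propagation
    return Z, cache

def linear_activation_forward(A_prev, W, b, activation):
    # LINEAR -> ACTIVATION forward step
    Z, linear_cache = linear_forward(A_prev, W, b)
    if activation == "relu":
        A = np.maximum(0, Z)
    elif activation == "sigmoid":
        A = 1. / (1. + np.exp(-Z))
    cache = (linear_cache, Z)  # Z is needed to back-propagate through the activation
    return A, cache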


1.5 Image classification application of deep neural network

In the previous four sections, you learned how to build a complete deep neural network step by step. Section 5 describes how to construct a cat-recognition classifier with a deep neural network. Previously, the logistic regression network reached an accuracy of only 68%, while the full deep network reaches 80%!

By the end of this section, you will have learned to: build a neural network of any structure using all the helper functions described earlier; test and analyze neural networks of different structures; and appreciate the benefit of building a network out of helper functions (compared to starting from scratch each time).

1.5.1 describes the required packages; 1.5.2 introduces the data set (cat vs. non-cat); 1.5.3 introduces the model architectures, building a 2-layer and an L-layer neural network respectively; 1.5.4 presents the training and test results of the 2-layer network; 1.5.5 presents the training and test results of the L-layer network; 1.5.6 analyzes the results; 1.5.7 explains how to train the classifier on your own images; 1.5.8 shows the complete code.

The results of the 2-layer neural network are as follows:

Figure 1.5.4: Loss function of 2-layer neural network

Running result:



Figure 1.5.5: Loss function of L-layer neural network

Running result:

By comparison, the deeper network improves recognition accuracy (0.72 vs. 0.80 for the 2-layer vs. the 5-layer network).

1.5.6 summarizes the factors that cause recognition errors:

  • The cat appears in an unusual position;

  • The cat’s color is similar to the background;

  • An unusual cat color or breed;

  • The shooting angle;

  • The brightness of the photo;

  • The cat is very small or very large in the image.

These recognition errors may be related to limitations of the fully connected network itself, such as the lack of parameter sharing, its tendency to overfit (because of the large number of parameters), and the lack of a hierarchy of features; these problems are addressed by convolutional neural networks.


2. Improve the performance of deep neural networks

This section corresponds to the second course of the specialization. It describes how to improve the performance of deep networks in practice, covering hyperparameter tuning, parameter initialization methods such as random and Xavier initialization, regularization methods such as Dropout and the L2 norm, and one-dimensional and N-dimensional gradient checking. The section contains not only the course material but also the quiz questions, answers, and programming assignments.

Optimization is also emphasized in this part of the course. Wan Zhen’s notes start from the most basic first-order methods, gradient descent and mini-batch stochastic gradient descent, and then move on to momentum and adaptive learning rate methods, which use historical gradients to obtain better descent directions. Notably, the notes describe the update rule and implementation code of the Adam optimizer in detail.

The last part of this course mainly covers the basic functions of TensorFlow and the actual process of constructing a neural network.

Within parameter initialization, regularization, and gradient checking, the mechanisms of He initialization and Dropout are of particular interest and are discussed in detail below.

He initialization (He et al., 2015) is named after the first author of the paper. If you know Xavier initialization, the two are very similar, except that Xavier initialization scales the weights W^[l] by sqrt(1./layers_dims[l-1]) whereas He initialization uses sqrt(2./layers_dims[l-1]). The following shows how to implement He initialization; this is the answer to the course assignment:

# GRADED FUNCTION: initialize_parameters_he
def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
        Wl -- weight matrix of shape (layers_dims[l], layers_dims[l-1])
        bl -- bias vector of shape (layers_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1  # integer representing the number of layers

    for l in range(1, L + 1):
        ### START CODE HERE ### (approx. 2 lines of code)
        # He initialization: scale the random weights by sqrt(2 / size of the previous layer)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters

Finally, the document summarizes the effects of the three types of initialization. They were all tested with the same number of iterations, the same hyperparameters, and the same network architecture, as shown below:


Dropout

Dropout is a very useful and successful regularization technique. Although it has recently been found to interact badly with batch normalization (BN) in some cases, it remains a powerful way to control overfitting. In essence, Dropout randomly drops some neurons so that a different network architecture is trained on each batch.

Bagging is a technique for reducing generalization error by combining several models: train several different models separately, then have all of them vote on the output for each test sample. Dropout can be viewed as a Bagging method that integrates a very large number of deep neural networks, so it provides an inexpensive way to train and evaluate such an ensemble.

Figure: Dropout applied to the first and third layers.

In each batch’s forward pass and backward update, each neuron is shut down with probability 1 - keep_prob, and a shut-down neuron takes part neither in the forward computation nor in the parameter update. Because some neurons are dropped each time, we are effectively modifying the structure of the original model, so each iteration trains a different architecture and the parameter updates concentrate on the neurons that remain active. This regularization approach can be seen as an ensemble method that integrates the different network architectures trained on each batch.
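A minimal sketch of dropout for one layer in the forward pass (illustrative, using the inverted-dropout scaling taught in the course):

import numpy as np

def dropout_forward(A, keep_prob):
    # Each neuron is kept with probability keep_prob and shut down with probability 1 - keep_prob
    D = (np.random.rand(*A.shape) < keep_prob)
    A = A * D             # shut down the dropped neurons
    A = A / keep_prob     # scale so the expected value of the activations is unchanged
    return A, D           # D is reused in back propagation to drop the same neurons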

Among regularization methods, the notes also compare the effects of L2 regularization and Dropout:

Among the optimization methods that follow, the Adam method is of particular interest, so it is described in more detail below.


Adam

The Adam algorithm differs from traditional stochastic gradient descent. Stochastic gradient descent maintains a single learning rate (alpha) for all weight updates, and the learning rate does not change during training. Adam instead computes an independent adaptive learning rate for each parameter from estimates of the first and second moments of the gradients.

The authors of Adam describe it as combining the advantages of two other extensions of stochastic gradient descent, namely:

  • The adaptive gradient algorithm (AdaGrad), which keeps a per-parameter learning rate and improves performance on problems with sparse gradients (e.g., natural language and computer vision problems).

  • Root mean square propagation (RMSProp), which also keeps per-parameter learning rates that are adapted based on the average of recent magnitudes of the gradients for each weight. This means the algorithm performs well on non-stationary and online problems.

Adam combines the advantages of AdaGrad and RMSProp. Like RMSProp, it adapts the per-parameter learning rates based on moving averages of the gradients, and it additionally makes full use of the average of the second moments of the gradients (i.e., the uncentered variance). Specifically, the algorithm computes exponential moving averages of the gradient and of the squared gradient, and the hyperparameters beta1 and beta2 control the decay rates of these moving averages.

The moving averages are initialized to zero and beta1 and beta2 are close to 1 (the recommended values), so the moment estimates are biased toward zero. This bias is overcome by first computing the biased estimates and then computing the bias-corrected estimates.

As summarized in this note, Adam’s computational update process can be divided into three parts:

1. Compute an exponentially weighted average of the past gradients and store it in the variable v (the biased estimate), then compute v^corrected (the bias-corrected estimate).

2. Compute an exponentially weighted average of the squares of the past gradients and store it in the variable s (the biased estimate), then compute s^corrected (the bias-corrected estimate).

3. Update parameters based on the information in the previous two steps.

This update process can be expressed as:
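For a weight matrix W (and likewise for a bias b) of layer l, the updates implemented in the code below are:

v_dW = β_1 * v_dW + (1 - β_1) * dW
v^corrected_dW = v_dW / (1 - (β_1)^t)
s_dW = β_2 * s_dW + (1 - β_2) * (dW)^2
s^corrected_dW = s_dW / (1 - (β_2)^t)
W = W - α * v^corrected_dW / (sqrt(s^corrected_dW) + ε)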

Here t counts the number of Adam update steps taken, L is the number of layers, β_1 and β_2 are the hyperparameters that control the exponentially weighted averages, α is the learning rate, and ε is a small constant that avoids division by zero.

Wan Zhen also gives the implementation code and assignment explanations for Adam:

# GRADED FUNCTION: initialize_adam
def initialize_adam(parameters):
    """
    Initializes v and s as two python dictionaries with:
    - keys: "dW1", "db1", ..., "dWL", "dbL"
    - values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.
    Arguments:
    parameters -- python dictionary containing your parameters:
        parameters["W" + str(l)] = Wl
        parameters["b" + str(l)] = bl
    Returns:
    v -- python dictionary that will contain the exponentially weighted average of the gradient.
    s -- python dictionary that will contain the exponentially weighted average of the squared gradient.
    """
    L = len(parameters) // 2  # number of layers in the neural network
    v = {}
    s = {}

    # Initialize v, s. Input: "parameters". Outputs: "v, s".
    for l in range(L):
        ### START CODE HERE ### (approx. 4 lines)
        v["dW" + str(l+1)] = np.zeros(parameters["W" + str(l+1)].shape)
        v["db" + str(l+1)] = np.zeros(parameters["b" + str(l+1)].shape)
        s["dW" + str(l+1)] = np.zeros(parameters["W" + str(l+1)].shape)
        s["db" + str(l+1)] = np.zeros(parameters["b" + str(l+1)].shape)
        ### END CODE HERE ###

    return v, s

The parameter update process described above is implemented as follows:

# GRADED FUNCTION: update_parameters_with_adam
def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01,
                                beta1=0.9, beta2=0.999, epsilon=1e-8):
    L = len(parameters) // 2   # number of layers in the neural network
    v_corrected = {}           # Initializing first moment estimate, python dictionary
    s_corrected = {}           # Initializing second moment estimate, python dictionary

    # Perform Adam update on all parameters
    for l in range(L):
        # Moving average of the gradients. Inputs: "v, grads, beta1". Output: "v".
        v["dW" + str(l+1)] = beta1*v["dW" + str(l+1)] + (1-beta1)*grads['dW' + str(l+1)]
        v["db" + str(l+1)] = beta1*v["db" + str(l+1)] + (1-beta1)*grads['db' + str(l+1)]
        # Compute bias-corrected first moment estimate. Inputs: "v, beta1, t". Output: "v_corrected".
        v_corrected["dW" + str(l+1)] = v["dW" + str(l+1)] / (1 - math.pow(beta1, t))
        v_corrected["db" + str(l+1)] = v["db" + str(l+1)] / (1 - math.pow(beta1, t))
        # Moving average of the squared gradients. Inputs: "s, grads, beta2". Output: "s".
        s["dW" + str(l+1)] = beta2*s["dW" + str(l+1)] + (1-beta2)*(grads['dW' + str(l+1)]**2)
        s["db" + str(l+1)] = beta2*s["db" + str(l+1)] + (1-beta2)*(grads['db' + str(l+1)]**2)
        # Compute bias-corrected second raw moment estimate. Inputs: "s, beta2, t". Output: "s_corrected".
        s_corrected["dW" + str(l+1)] = s["dW" + str(l+1)] / (1 - math.pow(beta2, t))
        s_corrected["db" + str(l+1)] = s["db" + str(l+1)] / (1 - math.pow(beta2, t))
        # Update parameters. Inputs: "parameters, learning_rate, v_corrected, s_corrected, epsilon". Output: "parameters".
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * v_corrected["dW" + str(l+1)] / (np.sqrt(s_corrected["dW" + str(l+1)]) + epsilon)
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * v_corrected["db" + str(l+1)] / (np.sqrt(s_corrected["db" + str(l+1)]) + epsilon)

    return parameters, v, s


This chapter also introduces and compares the advantages of these optimization methods:



TensorFlow

The last part focuses on the functions and practice of TensorFlow. TensorFlow is an open source software library that uses data flow graphs for numerical computation. Tensor means that the data passed around are tensors (multidimensional arrays), and Flow means that a computation graph is used. A data flow graph describes a mathematical computation as a directed graph of nodes and edges. Nodes usually represent the mathematical operations applied, but they can also represent points where data is fed in, results are pushed out, or persistent variables are read and written. Edges represent the input/output relationships between nodes; they carry dynamically sized multidimensional arrays, i.e., tensors.

The notes in this section focus on the testing and implementation code for the course, such as the following simple placeholders built:

### START CODE HERE ### (approx. 2 lines)
X = tf.placeholder(tf.float32, [n_x, None], name = 'X')
Y = tf.placeholder(tf.float32, [n_y, None], name = 'Y')
### END CODE HERE ###
return X, Y



Ways to define variables and constants:

a = tf.constant(2, tf.int16)
b = tf.constant(4, tf.float32)
g = tf.constant(np.zeros(shape=(2, 2), dtype=np.float32))

d = tf.Variable(2, tf.int16)
e = tf.Variable(4, tf.float32)

h = tf.zeros([11], tf.int16)
i = tf.ones([2, 2], tf.float32)

k = tf.Variable(tf.zeros([2, 2], tf.float32))
l = tf.Variable(tf.zeros([5, 6, 5], tf.float32))

To initialize parameters:

W1 = tf.get_variable("W1", [25, 12288], initializer=tf.contrib.layers.xavier_initializer(seed=1))
b1 = tf.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())



Running the computation graph:

graph = tf.Graph()
with graph.as_default():
    a = tf.Variable(8, tf.float32)
    b = tf.Variable(tf.zeros([2, 2], tf.float32))

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print(a)
    print(session.run(a))
    print(session.run(b))

# output:

>>> <tf.Variable 'Variable_2:0' shape=() dtype=int32_ref>
>>> 8
>>> [[ 0. 0.]
>>> [ 0. 0.]]




3. Convolutional neural networks

In Lesson 4 of Andrew Ng’s specialization we study convolutional neural networks (CNNs), and the problem set in this chapter is to implement a convolution layer and a pooling layer in NumPy, including both the forward and the backward pass. The packages you can use include:

  • Numpy: a basic package for doing scientific calculations in Python.

  • Matplotlib: for plotting.

Here’s a list of the functions you’ll learn to implement:

1. Convolution function, including:

  • Zero padding

  • Convolve Window

  • Convolution forward

  • Convolution backward

2. Pooling functions, including:

  • Pooling forward

  • Create mask

  • Distribute Value

  • Pooling backward

The first assignment asks you to implement these functions step by step in NumPy; the next assignment builds the same model using TensorFlow functions.

Chapter 3 discusses the convolutional layer, the pooling layer, and back propagation in convolutional neural networks. The convolution part covers zero padding, single-step convolution, and the forward pass of a convolutional network; the pooling part covers the forward pass of the pooling layer; and the back-propagation part covers back propagation through the convolutional layer and through the pooling layer.


3.1.3 Convolutional networks

Although programming frameworks make convolutions easy to use, they remain one of the most difficult parts of deep learning to understand. A convolutional network transforms an input volume into an output volume roughly as follows:




To help you further understand convolution, the editor restates the convolution operation here.

In this section we implement a single step of convolution: you apply a filter at a single position of the input, and the full convolution is then just sliding the filter over every position of the input. What we will do:

1: Take the input data;

2: Apply the filter at each position of the input data;

3: Output another array (generally not the same size as the input).

In computer vision, each value in the matrix is a single pixel value. To convolve a 3 x 3 filter with the image, at each position we multiply each value in the filter by the corresponding value of the original matrix, sum the products, add a bias, and then slide the filter to the next position. This is convolution, and the distance the filter moves each time is called the stride. In the first exercise you will implement a single step of convolution, i.e., applying the filter at one position of the input to produce a single real-valued output.

Exercise: implement the conv_single_step() function.

Code:

# GRADED FUNCTION: conv_single_step
def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation of the previous layer.
    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- bias parameters contained in a window - matrix of shape (1, 1, 1)
    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice of the input data
    """
    ### START CODE HERE ### (approx. 2 lines of code)
    # Element-wise product between a_slice_prev and W, plus the bias
    s = a_slice_prev * W + b
    # Sum over all entries of the volume s
    Z = np.sum(s)
    ### END CODE HERE ###
    return Z




Let’s talk a little about the forward pass of the convolutional neural network.

Earlier we discussed how to convolve with a single filter; in the forward pass, the input is convolved with several filters one by one and the results are stacked into a 3D volume.
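A minimal sketch of that forward pass, built on the conv_single_step function above (an illustration only, simplified to stride 1 and no padding rather than the notes' full conv_forward):

import numpy as np

def conv_forward_simple(A_prev, W, b):
    # A_prev: (m, n_H_prev, n_W_prev, n_C_prev); W: (f, f, n_C_prev, n_C); b: (1, 1, 1, n_C)
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    f = W.shape[0]
    n_C = W.shape[3]
    n_H = n_H_prev - f + 1
    n_W = n_W_prev - f + 1
    Z = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):                      # loop over the batch of examples
        for h in range(n_H):                # vertical position of the window
            for w in range(n_W):            # horizontal position of the window
                a_slice = A_prev[i, h:h+f, w:w+f, :]
                for c in range(n_C):        # one filter per output channel
                    Z[i, h, w, c] = conv_single_step(a_slice, W[:, :, :, c], b[:, :, :, c])
    return Z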


For more interesting information, follow Andrew Ng’s course.


3.1.4 Pooling layer

The pooling layer is interesting because it is designed to reduce the size of the input and thus the amount of computation, and it also helps make the feature detectors more invariant to position. There are two types of pooling layers:

  • Maximum pooling layer

Slide a window of size (f, f) over the input data, take the largest value inside the window as the output, and store that value in the output array.

  • Average pooling layer

As the name suggests, slide the same (f, f) window over the input data, average the values inside the window, and store that value in the output array.

These pooling layers have no parameters to train, but the window size f is a hyperparameter that you can tune yourself.
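A minimal sketch of the forward pass for both pooling modes (illustrative, with the stride set equal to the window size f):

import numpy as np

def pool_forward_simple(A_prev, f, mode="max"):
    # A_prev: (m, n_H_prev, n_W_prev, n_C); window (f, f), stride f
    m, n_H_prev, n_W_prev, n_C = A_prev.shape
    n_H, n_W = n_H_prev // f, n_W_prev // f
    A = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    window = A_prev[i, h*f:(h+1)*f, w*f:(w+1)*f, c]
                    A[i, h, w, c] = np.max(window) if mode == "max" else np.mean(window)
    return A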


3.2 Convolutional neural network applications

In this part, the notes cover the TensorFlow model: creating placeholders, initializing parameters, forward propagation, computing the cost, and assembling the full model.

Let’s look at how to initialize the parameters here. You need to initialize the weights/filters W1 and W2 with tf.contrib.layers.xavier_initializer(seed=0). You don’t need to worry about the bias values; you will soon see that the TensorFlow functions take care of them. Also note that you only initialize the weights/filters for the conv2d functions, as TensorFlow initializes the layers of the fully connected part automatically. We will see more about that in later assignments.

Exercise: implement initialize_parameters(). The dimensions of each set of filters/weights are provided below. As a reminder, to initialize a parameter W of shape [1,2,3,4] in TensorFlow, use:

W = tf.get_variable("W", [1, 2, 3, 4], initializer = ...)

More information:

# GRADED FUNCTION: initialize_parameters
def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
        W1: [4, 4, 3, 8]
        W2: [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """
    tf.set_random_seed(1)  # so that your "random" numbers match ours
    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    ### END CODE HERE ###
    parameters = {"W1": W1,
                  "W2": W2}
    return parameters


3.3 Keras Tutorial: Happy House

This section describes an assignment (the Happy House) that shows how to build a model with Keras, summarize it, test it on your own pictures, and use several other very useful Keras functions. Here we first explain what the Happy House is:


3.3.1 The Happy House

For your next vacation, you decide to spend a week with five of your friends from school. There is a very convenient house nearby where you can do many things. The most important rule of the house, however, is that everyone has to be happy while inside. So anyone who wants to enter the house has to prove that they are currently happy.

As a deep learning expert, you want to make sure the “happy” rule is strictly enforced, so you are going to build an algorithm that uses pictures from the front-door camera to check whether a person is happy.

You have collected photos of you and your friends taken at the front door, and the dataset has been labeled: 0 means not happy, 1 means happy.

Run the following code to normalize the dataset and inspect its shapes.

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
# Normalize image vectors
X_train = X_train_orig / 255.
X_test = X_test_orig / 255.
# Reshape
Y_train = Y_train_orig.T
Y_test = Y_test_orig.T
print("number of training examples = " + str(X_train.shape[0]))
print("number of test examples = " + str(X_test.shape[0]))
print("X_train shape: " + str(X_train.shape))
print("Y_train shape: " + str(Y_train.shape))
print("X_test shape: " + str(X_test.shape))
print("Y_test shape: " + str(Y_test.shape))

# output
number of training examples = 600
number of test examples = 150
X_train shape: (600, 64, 64, 3)
Y_train shape: (600, 1)
X_test shape: (150, 64, 64, 3)
Y_test shape: (150, 1)

Details of the Happy House dataset:

  • Image data size: (64, 64, 3)

  • Training data: 600

  • Test data: 150

Now go tackle the Happiness challenge!


3.4 Residual network

This part introduces the problems of very deep neural networks, how to build residual blocks, skip connections, and convolutional blocks, and how to combine them to build your first residual network model and test it on your own pictures.

Residual networks solve some of the problems of very deep neural networks; here we focus on what those problems are.

In recent years, neural networks have become deeper and deeper, with cutting-edge networks growing from just a few layers to more than a hundred.

The main advantage of a deep neural network is that it can represent very complex functions and learn features at many levels of abstraction, from edges (shallow layers) to complex features (deep layers). However, using a deeper network does not always help. One big drawback is vanishing gradients during training: in very deep networks the gradient signal often decays to zero very quickly, which makes gradient descent unbearably slow because each update changes the parameters only a little. More specifically, during back propagation the gradient is multiplied by a weight matrix at every layer on the way from the last layer back to the first, so it can shrink toward zero at an exponential rate (or, in rare cases, grow and explode exponentially).

During training, you can indeed observe that the magnitude of the gradient in the earlier layers drops to zero very quickly.

You can now solve this problem with a residual network.
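A minimal sketch of the core idea, a skip connection that adds the block's input back to its output, written with the Keras functional API (an illustration only; the course's identity_block uses three convolutional layers with specific filter sizes, simplified here, and this sketch assumes X already has the same number of channels as filters so the addition is valid):

from keras.layers import Conv2D, BatchNormalization, Activation, Add

def simple_identity_block(X, filters, kernel_size=3):
    X_shortcut = X                                     # the skip connection carries X unchanged
    X = Conv2D(filters, kernel_size, padding='same')(X)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = Conv2D(filters, kernel_size, padding='same')(X)
    X = BatchNormalization()(X)
    X = Add()([X, X_shortcut])                         # add the shortcut before the final activation
    X = Activation('relu')(X)
    return X

Because the shortcut lets gradients flow directly through the addition, the earlier layers still receive a useful gradient signal even in very deep networks.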


3.5 Detect vehicles with YOLOv2

This part covers: the problem description; YOLO; model details; filtering class scores with a threshold; non-max suppression; wrapping the filtering steps together; testing the YOLO model on images; defining the classes, anchors, and image size; loading a pre-trained model; converting the model output into usable bounding-box tensors; filtering the boxes; and running the computation graph on an image.

Here’s what the problem is:

You are working on a self-driving car, and as a critical component of it you want to build a vehicle detection system: a camera mounted on the car takes pictures of the road ahead every few seconds while you drive.

You have gathered all these pictures into a folder and labeled them by drawing a bounding box around every car you found. Here is an example:

Here p is how confident you are that the box contains an object, and c is the class of the object you think the box contains.

If you want YOLO to recognize 80 classes, you can either let c be an integer from 1 to 80 or let c be a vector of length 80. The videos use the latter representation; in these notes, we use both, depending on which is more convenient. In this exercise, you will learn how YOLO works and how to use it for object detection. Since training YOLO is very computationally expensive, we will use pre-trained weights.


3.6 Face recognition in the Happy House

In this part we cover: face recognition, encoding a face image as a 128-dimensional vector, computing the encoding with a ConvNet, the triplet loss, loading a pre-trained model, and applying the model.

Let’s briefly focus on face verification:

In face verification, you are given two pictures and have to tell whether they show the same person. The simplest approach is to compare the two images pixel by pixel: if the distance between the raw images is below a threshold, you declare them the same person.

Of course, this algorithm performs very poorly, because pixel values change dramatically with lighting, head orientation, and even small shifts of the head position. You will see that, instead of comparing raw images, you would rather compute an encoding f(image) and compare the encodings, which gives a much more accurate answer to whether two photos show the same person.
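A minimal sketch of this comparison, assuming encoding_1 and encoding_2 are the 128-dimensional outputs f(image) of the encoding network for the two photos (the threshold value here is illustrative):

import numpy as np

def same_person(encoding_1, encoding_2, threshold=0.7):
    dist = np.linalg.norm(encoding_1 - encoding_2)   # L2 distance between the two encodings
    return dist < threshold, dist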


3.7 Generative art through neural style transfer

In this chapter, we attempt to implement neural style transfer and use the algorithm to generate novel artistic images. The key point of neural style transfer is that we optimize a cost function with respect to the pixel values themselves. As shown below, we take the style of one image and transfer it to an image that needs that style:

Neural style transfer mainly uses a pre-trained convolutional neural network and builds new computations on top of it. This approach of taking a pre-trained model and applying it to a new task is called transfer learning. Transfer learning with convolutional networks is very simple: in general, we can randomly initialize the weights of the last few classification layers and retrain them on a new data set to quickly obtain excellent performance.

Following the original NST paper, we use the VGG-19 network pre-trained on ImageNet. In Wan Zhen’s notes, we can run the following command to load the model and its parameters:

model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
print(model)

Then use the assign method to take the image as input to the model:

model["input"].assign(image)

We can then get the activation values of a specific layer using the following code:

sess.run(model["conv4_2"])

In neural style transfer, the focus is on the following three steps:

  • Build the content loss function J_content(C, G)

  • Build the style loss function J_style(S, G)

  • Combine them into the final loss function J(G) =α J_content(C, G) +β J_style(S, G)

Using this loss function, neural style transfer can be realized by transfer learning.
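A minimal sketch of the final combination step (the content and style cost functions themselves are assumed to be implemented as in the notes, and the default weights here are illustrative):

def total_cost(J_content, J_style, alpha=10, beta=40):
    # J(G) = alpha * J_content(C, G) + beta * J_style(S, G)
    J = alpha * J_content + beta * J_style
    return J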