1. Deep learning basics
For newcomers to deep learning, there are three entry-level questions:
- What is the relationship between artificial intelligence, machine learning, and deep learning?
- What is the general approach to machine learning?
- Why are so many people optimistic about deep learning and what are its future trends?
The first lecture of this course addresses these questions in turn. For the first question, start with the relationship among artificial intelligence, machine learning, and deep learning: the scope of the three narrows layer by layer. Artificial intelligence is the broadest concept; machine learning is one way to realize artificial intelligence, and currently the most effective one; deep learning is the hottest branch of machine learning, which has made remarkable progress in recent years and replaced most traditional machine learning algorithms. The relationship among the three can therefore be written as: Artificial intelligence > Machine learning > Deep learning.
Next, the second question: what is the general approach to machine learning?
The course uses "learning machine learning from the Newton's second law experiment" as a case study, vividly explaining what kind of technical method machine learning (supervised learning) is.
By analogy, a machine is like a mechanical student: it can only acquire knowledge (the model parameters W) by trying to answer a large number of exercises (known samples) correctly (minimizing the loss), in the hope that the learned knowledge W forms a complete model that can answer questions whose answers are unknown (unknown samples). Loss minimization is the optimization objective of the model, and the method for achieving it is called the optimization algorithm, also known as the search algorithm (it searches for the parameter values that minimize the loss function). The basic structure of the formula relating the parameters to the input X is called the hypothesis.
In middle school, when measuring the acceleration of gravity with the inclined-plane experiment, we proposed a linear hypothesis based on observations of the object's force and motion data, namely that force and acceleration are linearly related. The verification process of Newton's second law is also the parameter-determination process of machine learning. Thus, the model hypothesis, the evaluation function (loss/optimization objective), and the optimization algorithm are the three elements of a model.
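As a toy illustration of this parameter-determination process (the numbers below are made up for illustration, not from the course), fitting the linear hypothesis F = w · a to observed force/acceleration pairs recovers the mass as the learned parameter:

```python
import numpy as np

# Hypothetical observations for an object of mass ~2 kg:
# accelerations (m/s^2) and the measured forces (N)
a = np.array([1.0, 2.0, 3.0, 4.0])
F = np.array([2.1, 3.9, 6.2, 7.8])

# Linear hypothesis F = w * a; least squares finds the w (the "knowledge")
# that minimizes the squared loss over the known samples
w, _, _, _ = np.linalg.lstsq(a[:, np.newaxis], F, rcond=None)
print(w)  # close to 2.0, the object's mass
```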
Finally, a brief introduction to the history of deep learning is given, told in the style of a history lesson.
Before the advent of deep learning frameworks, machine learning engineers lived in an era of workshop production. To complete modeling, an engineer needed a large amount of mathematical knowledge and a large amount of industry knowledge for feature engineering. Each model was highly personalized, and modelers, like craftsmen, left their own "personal signature" on their models. Now, "deep learning engineers" have entered the era of industrial production. With the necessary but minimal theoretical knowledge of deep learning, a mastery of Python programming makes it possible to implement highly efficient models within a deep learning framework, even on par with the most advanced models in the field. Modeling, long dominated by "old scientists", is facing disruption and an opportunity for new entrants.
2. Numpy implements neural network construction and gradient descent algorithm
Practice is the source of true knowledge, and theory is no substitute for writing a few lines of code. Even though most users have built a neural network with some deep learning framework, many still lack a deep understanding of neural networks and the gradient descent algorithm. In response to students' requests, this course adds a hands-on lesson on building a neural network and implementing gradient descent with Numpy. The experiment implements a regression model for Boston housing price prediction.
Deep learning models applied to different scenarios share a certain universality: model construction and training can be divided into five steps. The Numpy implementation of the neural network follows the same steps (a rough code skeleton follows the list):
- Data processing: Reads data from local files or network addresses and performs pre-processing operations, such as verifying the correctness of data.
- Model design: Complete the design of network structure (model element 1), which is equivalent to the hypothesis space of the model, that is, the set of relations that the model can express.
- Training configuration: Set the search algorithm (model element 2) adopted by the model, that is, the optimizer, and specify computing resources.
- Training process: The training loop is run repeatedly; each round includes three steps: forward computation, loss function evaluation (optimization objective, model element 3), and backpropagation.
- Save the model: Save the trained model for use in prediction.
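As a rough sketch (the names below mirror the code developed in the rest of this section, and the hyperparameter values are illustrative), the five steps map onto code roughly like this:

```python
import numpy as np

# 1. Data processing: load, split and normalize the data
training_data, test_data = load_data()
x, y = training_data[:, :-1], training_data[:, -1:]

# 2. Model design: the network structure defines the hypothesis space
net = Network(num_of_weights=13)

# 3. Training configuration: optimizer settings such as the learning rate
eta, iterations = 0.01, 1000

# 4. Training process: each round does forward pass, loss, backward pass
points, losses = net.train(x, y, iterations=iterations, eta=eta)

# 5. Save the model: persist the trained parameters for later prediction
np.save('w.npy', net.w)
```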
The following uses Python to write a Boston house-price prediction model that follows these same five steps. It is precisely because of the universality of this modeling and training process, with different models differing only in the three model elements while the rest of the five steps stay the same, that a deep learning framework is useful.
Data processing and reading
First comes data processing: loading the dataset, splitting it, normalizing it, and building a data-reading routine. The code is as follows:
```python
import numpy as np

def load_data():
    # Load the raw data from the local file
    datafile = './work/housing.data'
    data = np.fromfile(datafile, sep=' ')

    # Each record has 14 values: the first 13 are features,
    # the 14th is the corresponding median house price (MEDV)
    feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                     'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
    feature_num = len(feature_names)

    # Reshape the flat array into [N, 14]
    data = data.reshape([data.shape[0] // feature_num, feature_num])

    # Use 80% of the data for training, 20% for testing
    ratio = 0.8
    offset = int(data.shape[0] * ratio)
    training_data = data[:offset]

    # Compute the maximum, minimum and average of the training set
    maximums, minimums, avgs = training_data.max(axis=0), \
                               training_data.min(axis=0), \
                               training_data.sum(axis=0) / training_data.shape[0]

    # Normalize each column using the training-set statistics
    for i in range(feature_num):
        data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])

    # Split into training and test sets
    training_data = data[:offset]
    test_data = data[offset:]
    return training_data, test_data
```
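A usage sketch (assuming housing.data exists at the path above): the returned arrays split into features and labels like this:

```python
training_data, test_data = load_data()

# The first 13 columns are the features; the last column is the label (MEDV)
x = training_data[:, :-1]
y = training_data[:, -1:]
```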
Building a neural network
The forward computation of the Boston housing price prediction model is described in an object-oriented ("class and object") style; the implementation is shown below. The class has member variables for the parameters w and b, and a forward function (representing "forward computation") completes the calculation from features and parameters to the predicted output.
```python
class Network(object):
    def __init__(self, num_of_weights):
        # Fix the random seed so results are consistent
        # each time the program is run
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0.

    def forward(self, x):
        z = np.dot(x, self.w) + self.b
        return z
```
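A quick usage sketch (assuming x and y were prepared from load_data as above): the untrained network already produces a prediction, just not a good one:

```python
net = Network(13)       # 13 input features
x1 = x[0:1]             # first sample, kept 2-D for the matrix product
y1 = y[0:1]
z1 = net.forward(x1)    # prediction with randomly initialized weights
print('predict:', z1, ' ground truth:', y1)
```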
At this point the forward pass of the housing price prediction model is implemented, but how good are its predictions? Suppose the predicted value is $z$ and the real house price is $y$; we need some indicator to measure the gap between the predicted value $z$ and the real value $y$. For regression problems, the most commonly used measure is the mean squared error, defined as follows:

$$L = \frac{1}{N}\sum_{i=1}^{N}(y_i - z_i)^2$$
In the formula above, $L$ (short for $Loss$) is also known as the loss function; it is the indicator that measures the quality of the model. Mean squared error is the common choice for regression problems.
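A minimal sketch of this loss as a method of the Network class (the training code later calls self.loss, so something along these lines is assumed to be defined):

```python
    def loss(self, z, y):
        # Mean squared error between predictions z and labels y
        error = z - y
        cost = np.mean(error * error)
        return cost
```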
Since the weights of the housing price prediction model are randomly initialized, the probability that the initial weights happen to sit at the model's minimum is essentially zero. We need the gradient descent algorithm to keep updating the weights until they are near a local or global minimum of the loss.
Numpy implements the gradient descent algorithm
When using a deep learning framework, this part does not need to be implemented by hand. But that doesn't mean we don't need to understand it. This course uses a blind person walking downhill as an example to explain the basic principle of gradient descent and how to implement it with Numpy.
As mentioned above, having chosen a hypothesis space, the next step in building a machine learning model is to devise an algorithm that finds the optimal value within that hypothesis space. Here's an example.
The process of reaching the bottom of the slope (the optimal value) from a randomly initialized point is remarkably similar to a blind person walking from a peak down into a valley. He cannot see where the valley is (we cannot analytically solve for the parameter values where the derivative of the Loss is 0), but he can probe the slope around him with his foot (the derivative at the current point, also known as the gradient). Therefore, minimizing the Loss function can be achieved by "stepping downhill from the current parameter value, step by step, until reaching the lowest point".
Now we want to find a set of values $[w_5, w_9]$ that minimizes the loss function. The gradient descent scheme is as follows (a tiny numeric sketch follows the list):
- Pick a random set of initial values for $[w_5, w_9]$.
- Pick the next point $[w_5', w_9']$ such that $L(w_5', w_9') < L(w_5, w_9)$.
- Repeat step 2 above until the loss function hardly drops anymore.
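To make the three steps concrete, here is a tiny self-contained sketch (not part of the course code) applying the same scheme to the one-variable function $L(w) = (w - 5)^2$:

```python
# Gradient descent on L(w) = (w - 5)^2, whose minimum is at w = 5
# and whose derivative is dL/dw = 2 * (w - 5).
w = -10.0    # step 1: an arbitrary initial value
eta = 0.1    # learning rate: how far each downhill step goes
for i in range(100):
    grad = 2 * (w - 5)   # the slope at the current point
    w = w - eta * grad   # step 2: move to a point with lower loss
print(w)  # step 3 repeated: w ends up very close to 5.0
```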
We discussed how to compute the loss function above. To make the gradients cleaner, a factor of $\frac{1}{2}$ is introduced, and the loss function is defined as follows:

$$L = \frac{1}{2N}\sum_{i=1}^{N}(z_i - y_i)^2$$
where $z_i$ is the network's predicted value for the $i$-th sample:

$$z_i = \sum_{j=0}^{12} x_i^j \cdot w_j + b$$
From the formula, we can compute the partial derivatives of $L$ with respect to $w$ and $b$:

$$\frac{\partial L}{\partial w_j} = \frac{1}{N}\sum_{i=1}^{N}(z_i - y_i)\frac{\partial z_i}{\partial w_j} = \frac{1}{N}\sum_{i=1}^{N}(z_i - y_i)\, x_i^j$$

$$\frac{\partial L}{\partial b} = \frac{1}{N}\sum_{i=1}^{N}(z_i - y_i)\frac{\partial z_i}{\partial b} = \frac{1}{N}\sum_{i=1}^{N}(z_i - y_i)$$
As the derivation shows, the factor $\frac{1}{2}$ cancels out because differentiating the quadratic term produces a factor of 2. That is exactly why the loss function was rewritten with the $\frac{1}{2}$ in front.
The quantities we need here are $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$.
The following gradient calculation function can be defined in the Network class.
(Figure: gradient calculation formula)
With the help of Numpy's matrix operations, we can compute the gradients for all 13 parameters over all samples in a single vectorized step.
Don't worry if the formulas are hard to follow; this course mainly proceeds by combining theory with practice. See how the following code implements gradient calculation, parameter updating, and network training.
```python
    def gradient(self, x, y):
        z = self.forward(x)
        # dL/dw averaged over all samples, one value per feature
        gradient_w = (z - y) * x
        gradient_w = np.mean(gradient_w, axis=0)
        gradient_w = gradient_w[:, np.newaxis]
        # dL/db averaged over all samples
        gradient_b = (z - y)
        gradient_b = np.mean(gradient_b)
        return gradient_w, gradient_b

    def update(self, gradient_w5, gradient_w9, eta=0.01):
        # Move w5 and w9 one step against their gradients
        self.w[5] = self.w[5] - eta * gradient_w5
        self.w[9] = self.w[9] - eta * gradient_w9

    def train(self, x, y, iterations=100, eta=0.01):
        points = []
        losses = []
        for i in range(iterations):
            # Record the trajectory of (w5, w9) for plotting
            points.append([self.w[5][0], self.w[9][0]])
            z = self.forward(x)
            L = self.loss(z, y)
            gradient_w, gradient_b = self.gradient(x, y)
            gradient_w5 = gradient_w[5][0]
            gradient_w9 = gradient_w[9][0]
            self.update(gradient_w5, gradient_w9, eta)
            losses.append(L)
            if i % 50 == 0:
                print('iter {}, point {}, loss {}'.format(
                    i, [self.w[5][0], self.w[9][0]], L))
        return points, losses
```
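A usage sketch (assuming x and y were prepared with load_data as above; the iteration count is illustrative):

```python
import matplotlib.pyplot as plt

net = Network(13)
num_iterations = 1000  # illustrative value
points, losses = net.train(x, y, iterations=num_iterations, eta=0.01)

# Plot how the loss falls as training proceeds
plt.plot(range(num_iterations), losses)
plt.xlabel('iteration')
plt.ylabel('loss')
plt.show()
```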
After running the code, the steady decline of the loss function can be clearly seen in the resulting plot.
This section started with an overview of machine learning and deep learning, explained the basics of deep learning, implemented a housing price prediction model with Numpy, and covered in detail the five steps of building a deep learning model, the basic principle of gradient descent, and how to implement gradient descent with Numpy.
3. Principles and practice of the main directions in computer vision
4. Principles and practice of the main directions in natural language processing
5. Principles and practice of personalized recommendation algorithms