Front knowledge
There are three steps to machine learning:
- Define a model (a set of candidate functions)
- Define a way to evaluate how good each function in the set is
- Find the best function in the set
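The three steps above can be sketched in code. This is a minimal illustration, not an official recipe: the model family `f(w, b, x) = w*x + b`, the toy data, and the grid of candidates are all made up for the example.

```python
# Toy training data.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Step 1: define the model, i.e. a set of candidate functions indexed by (w, b).
def f(w, b, x):
    return w * x + b

# Step 2: define how to evaluate a function in the set (a squared-error loss L).
def loss(w, b):
    return sum((y - f(w, b, x)) ** 2 for x, y in zip(xs, ys))

# Step 3: find the best function; here, by brute force over a few candidates.
candidates = [(2.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
best_w, best_b = min(candidates, key=lambda p: loss(p[0], p[1]))
print(best_w, best_b)  # (2.0, 0.0) fits y = 2x exactly
```

Real training replaces the brute-force step with an optimizer such as gradient descent, covered later in these notes.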
Application scenarios
Regression applies to many practical scenarios, such as determining the steering-wheel rotation angle in a self-driving system, or choosing what content to show in a recommender system.
Regression model determination
1. Find the linear model
A linear model is a statistical model that can be written as Y = XB + U, where X holds the inputs, B the parameters, and U the error term.
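As an illustration of the matrix form Y = XB + U, the sketch below builds a small design matrix in NumPy; the data, noise scale, and parameter values are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])   # design matrix: a column of ones (intercept) and x
B = np.array([0.5, 2.0])               # parameters: intercept b = 0.5, slope w = 2.0
U = rng.normal(scale=0.01, size=n)     # small random error term
Y = X @ B + U                          # the linear model Y = XB + U
print(Y.shape)  # (5,)
```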
2. Eliminate functions that are obviously inconsistent with the data, to reduce the computation needed later
This filtering can be done with the training set. For example, if y is positive but a candidate function maps the sample point x to a negative output, that function is clearly not our target function.
3. Select the optimal function
Selecting the optimal function means comparing the performance of functions with different w and b parameters and keeping the one that performs best. The performance of a regression function is measured by the loss function (the estimation error), which can be denoted L.
L consists of two parts: the sum of squared errors and a regularization term. The sum of squared errors measures the estimation error, while regularization helps avoid overfitting. As λ increases, the regularization term carries more weight, so the training error increases, while the testing error typically first decreases and then increases.
Increasing λ makes the chosen model smoother, until it approaches a horizontal line, which is not what we want. The optimal λ is therefore the one that minimizes the error on the test set.
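The effect of λ can be seen in a small sketch: an illustrative regularized loss L(w, b) = Σ(y − (wx + b))² + λw² (the bias is not regularized, matching the note in this section), minimized over a grid of w values for several λ. The data and grid are made up for the example.

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.9, 4.2, 5.8]  # roughly y = 2x

def regularized_loss(w, b, lam):
    sq_err = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
    return sq_err + lam * w ** 2  # bias b is not penalized

# Larger λ favors smaller |w|, i.e. a flatter (smoother) model.
bests = {}
for lam in (0.0, 1.0, 100.0):
    grid = [w / 10 for w in range(0, 41)]  # w in 0.0, 0.1, ..., 4.0
    bests[lam] = min(grid, key=lambda w: regularized_loss(w, 0.0, lam))
    print(lam, bests[lam])
```

The chosen slope shrinks toward zero as λ grows, which is exactly the smoothing behavior described above.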
The reason bias is not included in regularization is that regularization relates to the smoothness of the model: w affects the slope and therefore the smoothness, while b only translates the model up or down without affecting smoothness, so it is left out.
Where does regularization come from? It was originally introduced to reduce the effect of noise that is independent of the inputs, thereby mitigating overfitting.
4. With so many combinations of w and b, how can L be used to find the optimal f more easily?
Using gradient descent:
Gradient Descent
1. The gradient
Gradient form: the vector of partial derivatives of a function is its gradient.
Gradient visualization:
The actual meaning of the gradient: the gradient is a vector pointing in the direction in which the directional derivative of the function at a point is largest; the function increases fastest along this direction. In short, it gives the direction and magnitude of the maximum derivative at a point.
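The definition above can be checked numerically: the gradient of L(w, b) is the vector of partial derivatives (∂L/∂w, ∂L/∂b), and each component can be approximated by a central finite difference. The loss surface here is an invented one-point example.

```python
def L(w, b):
    # Illustrative loss surface: squared error on a single point (x=1, y=2).
    return (2.0 - (w * 1.0 + b)) ** 2

def grad(w, b, eps=1e-6):
    # Central finite differences approximate the two partial derivatives.
    dw = (L(w + eps, b) - L(w - eps, b)) / (2 * eps)
    db = (L(w, b + eps) - L(w, b - eps)) / (2 * eps)
    return dw, db

g = grad(0.0, 0.0)
print(g)  # the analytic gradient at (0, 0) is (-4, -4)
```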
2. Steps of gradient descent method:
- Pick an initial w0 at random
- Compute the derivative (tangent slope) of L at w0
- Use the sign of the derivative to decide the direction: if the derivative is positive (slope > 0, L is increasing), w should decrease; if it is negative (slope < 0, L is decreasing), w should increase
- The size of each increase or decrease is set by the learning rate and the magnitude of the derivative
- Repeat the process, updating the parameter each time, until the derivative is zero
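The steps above can be written as a short loop. The data, starting point, learning rate, and iteration count are all illustrative choices, not prescribed values.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def dL_dw(w):
    # Derivative of L(w) = sum over samples of (y - w*x)^2 with respect to w.
    return sum(-2 * x * (y - w * x) for x, y in zip(xs, ys))

w = 0.0     # initial w0, picked arbitrarily
eta = 0.01  # learning rate (step size)
for _ in range(200):
    w = w - eta * dL_dw(w)  # derivative > 0 -> decrease w; < 0 -> increase w
print(round(w, 4))  # converges to w = 2.0, the minimizer
```

Note the minus sign in the update: stepping against the derivative is exactly the sign rule from the steps above.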
A diagram using one parameter as an example
3. Application conditions of gradient descent method
Gradient Descent applies whenever the function is differentiable; if the function is also convex, it is guaranteed to reach the global minimum.
The squared-error loss of a linear model is a convex function.
Unexpected situations: on non-convex functions, gradient descent can stall at local minima, saddle points, or plateaus, where the derivative is zero but the point is not the global minimum.
How can the model be optimized further?
Further optimization can start from increasing the model's complexity (e.g. its polynomial degree), adding relevant attribute variables, or increasing the amount of data, but attention should be paid to the overfitting that may arise in the process.
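One of these routes, raising the polynomial degree, can be sketched with NumPy's least-squares polynomial fit. The data is synthetic and the degrees are arbitrary; the point is that training error keeps falling as complexity grows, which is precisely why overfitting must be watched for.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + rng.normal(scale=0.05, size=x.size)  # noisy target

errors = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)    # least-squares polynomial fit
    preds = np.polyval(coeffs, x)
    errors[degree] = float(np.mean((y - preds) ** 2))  # training error
    print(degree, round(errors[degree], 5))

# Training error is non-increasing in the degree; test error may rise again.
```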