This article is a summary of the Machine Learning course by Andrew Ng.
Linear regression
Cost Function
Its goal is to select model parameters that minimize the sum of squared modeling errors.
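For reference, the squared-error cost function from the course is:

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$

where $h_\theta(x) = \theta^T x$ is the hypothesis and $m$ is the number of training examples.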
Batch Gradient Descent
Here $\alpha$ is the learning rate, which determines how large a step we take downhill in the direction of the steepest reduction of the cost function. In batch gradient descent, all parameters are updated simultaneously: each one is reduced by the learning rate times the corresponding partial derivative of the cost function.
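The corresponding update rule from the course is:

$$ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad \text{(simultaneously for all } j\text{)} $$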
For $\alpha$, values roughly tripling each time can be tried, e.g. 0.01, 0.03, 0.1, 0.3, 1, ...
To use gradient descent, the key lies in finding the derivative of the cost function, namely:
When $j = 0$:

$$ \frac{\partial}{\partial \theta_0} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) $$

When $j \geq 1$:

$$ \frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$
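As a minimal NumPy sketch of these updates (my own illustration; the course itself uses Octave):

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Minimize the squared-error cost by batch gradient descent.

    X: (m, n) design matrix whose first column is all ones (for theta_0).
    y: (m,) vector of targets.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        errors = X @ theta - y          # h_theta(x^(i)) - y^(i) for every example
        gradient = (X.T @ errors) / m   # partial derivatives for all theta_j at once
        theta -= alpha * gradient       # simultaneous update of every parameter
    return theta
```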
Feature Scaling
Ensuring that all features have similar scales helps the gradient descent algorithm converge faster.
A common choice is mean normalization:

$$ x_n := \frac{x_n - \mu_n}{s_n} $$

where $\mu_n$ is the average and $s_n$ is the standard deviation.
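A short NumPy sketch of this normalization (my own illustration):

```python
import numpy as np

def scale_features(X):
    """Mean-normalize each feature column: subtract its mean, divide by its std."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```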
The Normal Equation
For linear models, when the number of feature variables is not too large, the normal equation is a good alternative way to compute the parameters. Specifically, as long as the number of feature variables is less than 10,000, I usually use the normal equation method rather than gradient descent.
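The normal equation itself is $\theta = (X^T X)^{-1} X^T y$. A minimal NumPy sketch (my own illustration):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form solution theta = (X^T X)^{-1} X^T y.

    pinv (the pseudo-inverse) is used so the computation also works
    when X^T X is non-invertible, e.g. with redundant features.
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```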
Logistic regression
Despite its name, the logistic regression algorithm is a classification algorithm, and that is how we use it. It is applicable to discrete label values, for example 1, 0, 0, 1.
Hypothesis Function
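From the course, the hypothesis is:

$$ h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}} $$

where $g$ is the sigmoid (logistic) function, so $h_\theta(x)$ is interpreted as the probability that $y = 1$ given $x$.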
Cost Function
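From the course, the cost function is:

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] $$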
Gradient descent
As with linear regression, the batch gradient descent algorithm can be used to minimize the cost function:
Repeat {

$$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$

} (simultaneously updating every $\theta_j$)
Although this update rule looks identical to the one for linear regression, the hypothesis $h_\theta(x)$ of logistic regression is totally different (it is the sigmoid of $\theta^T x$ rather than $\theta^T x$ itself), so the two algorithms are not the same.
Multi-class Classification: One-vs-All
Logistic regression can be used to achieve one-vs-all multi-class classification. The specific idea is as follows:
We mark one of the multiple classes as the positive class and mark all of the other classes as the negative class; call this model $h_\theta^{(1)}(x)$. Next, similarly, we select another class to mark as the positive class, mark all the remaining classes as negative, and write this model as $h_\theta^{(2)}(x)$, and so on. In the end we have a series of models, which are abbreviated as:

$$ h_\theta^{(i)}(x) = P(y = i \mid x; \theta), \qquad i = 1, 2, \ldots, k $$
Finally, when we need to make a prediction for a new input $x$, we run all $k$ classifiers and choose the class whose classifier outputs the highest probability, $\max_i h_\theta^{(i)}(x)$.
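A minimal NumPy sketch of one-vs-all training and prediction (my own illustration; the course implements this in Octave):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, alpha=0.1, n_iters=1000):
    """Fit one logistic regression classifier per class.

    X: (m, n) design matrix with a leading column of ones.
    y: (m,) integer labels in {0, ..., num_classes - 1}.
    Returns a (num_classes, n) matrix with one parameter row per class.
    """
    m, n = X.shape
    all_theta = np.zeros((num_classes, n))
    for c in range(num_classes):
        labels = (y == c).astype(float)            # current class positive, rest negative
        theta = np.zeros(n)
        for _ in range(n_iters):
            errors = sigmoid(X @ theta) - labels   # prediction error per example
            theta -= alpha * (X.T @ errors) / m    # gradient descent step
        all_theta[c] = theta
    return all_theta

def predict_one_vs_all(all_theta, X):
    """Choose the class whose classifier outputs the highest probability."""
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)
```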
Regularization
Regularization is a method for addressing the problem of overfitting. It keeps all the features but reduces the magnitude of the parameters.
Another approach is to discard features that do not help us predict correctly. You can manually select which features to keep, or use a model selection algorithm to help (such as PCA).
Regularized linear regression
We need to introduce a regularization term into gradient descent.
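From the course, the regularized cost is:

$$ J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right] $$

and the update rules become (note that $\theta_0$ is not regularized):

$$ \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} $$

$$ \theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right], \qquad j \geq 1 $$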
Simplifying this formula gives:
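As given in the course, the simplified update for $j \geq 1$ is:

$$ \theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$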
It can be seen that the change in regularized linear regression's gradient descent is that, on top of the original algorithm's update rule, each step additionally shrinks the value of $\theta_j$ by multiplying it by $\left(1 - \alpha \frac{\lambda}{m}\right)$.
Regularized normal equation:
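From the course:

$$ \theta = \left( X^T X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix} \right)^{-1} X^T y $$

where the matrix multiplying $\lambda$ is an $(n+1) \times (n+1)$ identity matrix with its top-left entry set to 0, so the bias term $\theta_0$ is not regularized.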
Regularized logistic regression
The regularization formulas for logistic regression are similar to those of linear regression, but the hypothesis function is $h_\theta(x) = g(\theta^T x)$, the sigmoid, rather than the linear hypothesis $\theta^T x$.
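For completeness, the regularized logistic regression cost from the course is:

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 $$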