This is the third note in Ng's machine learning series. It covers how to diagnose bias and variance problems in machine learning applications, how to apply the corresponding fixes, and how to design and evaluate machine learning systems.
Applied machine learning
In machine learning, reducing a model's prediction error and improving its accuracy is a problem we always want to solve. We typically try one of the following:
- Get more training samples.
- Try a smaller set of features.
- Try to obtain additional features.
- Try adding polynomial features.
- Try decreasing the regularization parameter $\lambda$.
- Try increasing the regularization parameter $\lambda$.
These methods should not be chosen at random; the right one depends on the actual situation (in particular, whether the model suffers from a bias or a variance problem).
Partitioning of data sets
- Training set: 60%, used to train the candidate models.
- Validation set (cross-validation set): 20%, used to compute the cross-validation error (the cost function $J(\theta)$) and to select the model with the lowest cost.
- Test set: 20%, used to compute the generalization error of the selected model and evaluate its ability to generalize.
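As a minimal sketch of this 60/20/20 split using scikit-learn's `train_test_split` (the data here is a random placeholder and the variable names are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data; substitute your real feature matrix X and labels y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)

# First carve out 60% for training, then split the remaining 40%
# evenly into a cross-validation set (20%) and a test set (20%).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```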
Diagnosing bias and variance
In machine learning, bias is the difference between a model's predictions and the true values; high bias means the model fails to capture the relationship in the data. Variance is the spread among the outputs of models trained on different samples of the data; high variance means the model is overly sensitive to the particular training set and misses the underlying regularity.
- High bias: underfitting
- High variance: overfitting
Bias and variance in polynomial fitting
For the training set, the smaller the polynomial degree $d$, the worse the fit and the larger the error; the larger $d$, the better the fit and the smaller the error. For the cross-validation set, a small $d$ likewise means a poor fit and a large error, but as $d$ increases the cross-validation error first decreases and then increases. When $d$ is high, the training error is very low but the model easily overfits, so it generalizes poorly to the cross-validation set and the cross-validation error is high; this error is a variance problem. When $d$ is low, both the training error and the cross-validation error are high; this error is a bias problem, because the model tends to underfit.
The following figure shows how the training-set and cross-validation-set errors vary with the polynomial degree $d$:
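A rough sketch of this diagnostic, sweeping the polynomial degree $d$ on synthetic placeholder data and comparing training and validation errors:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic 1-D regression data standing in for a real data set.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=0)

degrees = range(1, 11)
train_errors, val_errors = [], []
for d in degrees:
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(X_train, y_train)
    train_errors.append(mean_squared_error(y_train, model.predict(X_train)))
    val_errors.append(mean_squared_error(y_val, model.predict(X_val)))

# Training error keeps falling as d grows, while validation error
# falls and then rises; the minimum marks the best trade-off.
best_d = list(degrees)[int(np.argmin(val_errors))]
print(best_d)
```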
Bias and variance in regularization
In the study of linear regression and logistic regression, we learned that regularization can be introduced to prevent the model from overfitting. The magnitude of the regularization penalty (that is, the value of $\lambda$) affects the model error in much the same way as the polynomial degree above.
The following figure shows how the cost-function error varies with $\lambda$ on the training set and the cross-validation set:
When $\lambda$ is small, the regularization penalty is weak or nearly absent. The model still fits the training set closely and its training error is small, but its error on the cross-validation set is large (because of overfitting).
As $\lambda$ increases, the regularization penalty grows and the model fits the training set less and less closely, tending toward underfitting, so the training error rises steadily. On the cross-validation set, the error first decreases and then increases: the model gradually moves from overfitting to an optimal degree of fit, which is the turning point in the graph, after which, as the model tends toward underfitting, the error naturally rises again.
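A sketch of selecting $\lambda$ this way with ridge regression (in scikit-learn's `Ridge`, the `alpha` parameter plays the role of $\lambda$; the data is a synthetic placeholder):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder linear data with noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=0)

# Sweep a range of lambda values and track the validation error.
lambdas = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]
val_errors = []
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    val_errors.append(mean_squared_error(y_val, model.predict(X_val)))

# Validation error typically falls and then rises as lambda grows;
# choose the lambda at the turning point.
best_lambda = lambdas[int(np.argmin(val_errors))]
```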
Bias and variance in neural networks
When the structure of a neural network is relatively simple, that is, when it has few hidden layers, high bias easily occurs, but the computational cost is small.
When the structure is relatively complex, with many hidden layers, high variance easily occurs. Although the computational cost is larger, the variance can be addressed by increasing $\lambda$.
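As an illustrative sketch (the layer sizes are hypothetical), contrasting a small and a large network in scikit-learn, where the L2 penalty `alpha` plays the role of $\lambda$:

```python
from sklearn.neural_network import MLPRegressor

# A small network (few hidden units) is cheap but prone to high bias;
# a large one is prone to high variance. Increasing the L2 penalty
# `alpha` (the role of lambda here) reins in the variance.
small_net = MLPRegressor(hidden_layer_sizes=(5,), alpha=1e-4, max_iter=2000)
large_net = MLPRegressor(hidden_layer_sizes=(100, 100), alpha=1.0, max_iter=2000)
```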
The learning curve
Learning curves are used to determine whether a learning algorithm suffers from a bias problem or a variance problem. A learning curve plots the training-set error and the cross-validation-set error as a function of the number of training samples $m$.
When the sample size $m$ is small, the model fits the training set easily, so the training error is small, but the cross-validation error is large (there are too few samples for the model to discover the regularities in the data). As $m$ increases, fitting becomes harder and the training error starts to rise, while the cross-validation error falls: with more samples, the model's learning ability improves and it can find the data patterns more easily.
The following figure reflects the influence of the sample count $m$ on the training-set and cross-validation-set errors:
If the model has high bias (underfitting), the cross-validation error cannot be effectively reduced no matter how many samples are added.
If the model has high variance (overfitting) and the cross-validation error is much larger than the training error, increasing the number of samples helps reduce the cross-validation error.
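A sketch of plotting such a curve with scikit-learn's `learning_curve` helper, on placeholder data:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LinearRegression

# Placeholder regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=300)

train_sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="neg_mean_squared_error")

train_err = -train_scores.mean(axis=1)  # average training error per size m
val_err = -val_scores.mean(axis=1)      # average validation error per size m
# Both curves plateauing at a high error suggests high bias;
# a persistent gap between them suggests high variance.
```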
Conclusion
| Method | Applicable scenario |
| --- | --- |
| Get more training samples | High variance |
| Try a smaller set of features | High variance |
| Try to obtain additional features | High bias |
| Try adding polynomial features | High bias |
| Try decreasing the regularization parameter $\lambda$ | High bias |
| Try increasing the regularization parameter $\lambda$ | High variance |
Design of machine learning systems
Design method
- Start with a quick implementation of a simple algorithm and test it on the cross-validation data.
- Plot the learning curves and use them to decide whether the problem is high bias or high variance; this determines whether to add more data, add more features, or try something else.
- Perform error analysis: manually examine the samples in the cross-validation set on which the algorithm makes prediction errors.
Evaluation metrics
Error alone cannot fully evaluate the quality of a model. For a data set with unbalanced classes (for example, far more positive samples than negative ones, i.e., a very skewed data set), prediction accuracy is not a reliable measure of an algorithm's quality. Two important evaluation metrics are therefore introduced: Precision and Recall.
First define:
- Positive: the sample is predicted to be true.
  - True Positive (TP): predicted true, and actually true.
  - False Positive (FP): predicted true, but actually false.
- Negative: the sample is predicted to be false.
  - True Negative (TN): predicted false, and actually false.
  - False Negative (FN): predicted false, but actually true.
| Actual \ Predicted | Predicted positive | Predicted negative |
| --- | --- | --- |
| Actual positive | TP | FN |
| Actual negative | FP | TN |
Precision
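In terms of the counts above, precision is the fraction of samples predicted positive that are actually positive:

$$\text{Precision} = \frac{TP}{TP + FP}$$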
Recall
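Recall is the fraction of actually positive samples that the model predicts as positive:

$$\text{Recall} = \frac{TP}{TP + FN}$$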
To better balance the two metrics, we combine them into a single number, the $F_1$ score:
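$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Being the harmonic mean of precision and recall, the $F_1$ score is high only when both metrics are high. As a quick check with scikit-learn (the labels below are placeholders):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # placeholder ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # placeholder predictions

print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of the two
```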