Reference:

zhuanlan.zhihu.com/p/26122044

1/ What is overfitting

Overfitting is the phenomenon in which a trained model performs well on the training set but poorly on the test set; its generalization ability is poor. The model gets so good at the training data that it effectively memorizes everything, but when it reaches the test set it cannot generalize. Here's an example:

We interpret the third model in the figure as overfitting: it fits the training data too closely without taking generalization ability into account. Plotting the accuracy on the training set and the accuracy on the development set gives a graph like the following:
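That divergence can be reproduced numerically by sweeping model complexity and watching the training and held-out scores pull apart. A minimal sketch, assuming scikit-learn is available (the synthetic data, degrees, and seeds are illustrative, not from the source):

```python
# Sweep model complexity and compare train vs held-out scores.
# As the polynomial degree grows, the train score keeps rising while
# the held-out score falls off: the overfitting picture in numbers.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = 0.5 * X.ravel() + rng.normal(scale=0.3, size=30)  # linear signal + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree {degree:2d}: train R^2 = {model.score(X_tr, y_tr):.3f}, "
          f"test R^2 = {model.score(X_te, y_te):.3f}")
```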

2/ Causes of overfitting

There are three main reasons for overfitting: <1> noisy training data, <2> insufficient training data (it cannot cover all of the real data), and <3> an overly complex trained model.

<1> Noisy data

Why does noisy data lead to overfitting? All of machine learning is a search through a hypothesis space! We search the model's parameter space for a set of parameters that minimizes our loss function, that is, we keep approximating the true hypothesis model, which could only be obtained exactly if we knew the entire data distribution. In practice our model is the one that minimizes the loss function on limited training data, and we then generalize that model to the rest of the data. This is the essence of machine learning; formally, it is the empirical risk minimization written out below. Now, suppose our overall data looks something like this:
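A compact formulation of that search objective (the notation here is mine, not from the source):

```latex
\hat{\theta} \;=\; \arg\min_{\theta \in \Theta} \; \frac{1}{n} \sum_{i=1}^{n} L\!\left(f_\theta(x_i),\, y_i\right)
```

The true goal is the expected loss over the full data distribution; with limited or noisy samples, pushing this training average toward zero can pull the learned parameters away from that target, which is exactly what the example below shows.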

Assume that the overall data distribution satisfies a linear model y = kx + b. In reality it will not be this simple, and the data volume will not be this small (at least on the order of billions), but that does not affect the explanation; the point is that the population satisfies the model y. Now, some of the data we actually get is noisy, as shown in the figure below, where the red dots are the noisy data points.

A model trained on the training points above will certainly not be the linear model (the standard model that fits the overall data distribution). For example, the trained model might look like this:

So suppose we train on this noisy training set until the loss function reaches zero. When we take that model out to the true population distribution (which satisfies the linear model) to generalize, the result is very poor: we are using a nonlinear model to predict a population whose true distribution is linear, so the poor performance is obvious. This is the overfitting phenomenon!
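A minimal sketch of this, assuming only numpy (the slope, intercept, noise scale, and point count are illustrative): an exact-interpolation polynomial reaches zero training loss on noisy samples of y = kx + b, yet predicts the noise-free population badly.

```python
# Exact interpolation of noisy samples: training loss is driven to zero,
# but the fitted curve is nonlinear and generalizes poorly to the true
# linear population y = k*x + b.
import numpy as np

k, b = 2.0, 1.0                                  # true (illustrative) model
rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 8)
y_train = k * x_train + b + rng.normal(scale=0.5, size=8)  # noisy samples

# A degree-7 polynomial passes exactly through all 8 points.
coeffs = np.polyfit(x_train, y_train, deg=7)

x_new = np.linspace(-1, 1, 100)                  # fresh population inputs
mse = np.mean((np.polyval(coeffs, x_new) - (k * x_new + b)) ** 2)
print("max train residual:", np.max(np.abs(np.polyval(coeffs, x_train) - y_train)))
print("population MSE    :", mse)
```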

<2> Insufficient training data

When our training data is insufficient, the trained model may overfit even if the training data is noise-free. The explanation: suppose our overall data distribution is as follows:

The training data we get is limited, for example the following:

Suppose we get only two training points, A and B. From this training data, the model we learn is a linear model; by training long enough, we can make the linear model's loss function equal 0 on the training data. But when we take this model and generalize it to the true population distribution (which actually satisfies a quadratic model), the generalization ability is obviously very poor, and we again see overfitting!
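A sketch of the same setup, assuming only numpy (the quadratic population and the coordinates of A and B are illustrative stand-ins for the figure):

```python
# Two training points A and B from a quadratic population: a straight
# line fits them with zero loss yet misses the population badly.
import numpy as np

x_train = np.array([1.0, 2.0])                    # points A and B
y_train = x_train ** 2                            # population: y = x**2

slope, intercept = np.polyfit(x_train, y_train, deg=1)  # line through A, B

x_new = np.linspace(-3.0, 3.0, 7)
print("line prediction:", slope * x_new + intercept)
print("true quadratic :", x_new ** 2)             # diverges away from A, B
```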

<3> Excessive training leads to an overly complex model

Over-training a model makes the model very complex, which can also cause overfitting! Combined with the first two causes, this is easy to understand: if we train so long that we fit the training data completely, the resulting model may not be reliable. For example, on noisy training data, over-training makes the model learn the characteristics of the noise, which inevitably lowers accuracy on the real, noise-free test set!
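One way to see this numerically is to keep training a flexible model on a small noisy sample: the fit to the training data keeps improving while the fit to the noise-free population stalls or worsens. A sketch, assuming scikit-learn (the network size, epoch counts, and seeds are illustrative):

```python
# Keep training a flexible net on 20 noisy points: the training fit keeps
# improving while the fit to the noise-free population stalls or degrades.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(20, 1))
y = 2 * X.ravel() + rng.normal(scale=0.3, size=20)   # linear signal + noise
X_pop = np.linspace(-1, 1, 200).reshape(-1, 1)
y_pop = 2 * X_pop.ravel()                            # noise-free population

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=200,
                   warm_start=True, random_state=0)  # resume across fit() calls
for stage in range(10):
    net.fit(X, y)                                    # up to 200 more iterations
    print(f"iters ~{(stage + 1) * 200}: "
          f"train R^2 = {net.score(X, y):.3f}, "
          f"population R^2 = {net.score(X_pop, y_pop):.3f}")
```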

3/ How to solve overfitting

<1> Add more training data, so that the model "sees" as many situations, including exceptions, as possible. Add data that is representative of the actual situation and covers as much of it as possible; the model can then keep correcting itself and reach a better result.

<2> Use regularization. As section 6 below notes, regularization is used to prevent overfitting: it penalizes overly complex models so that they do not fit the sampling noise.

<3> Reduce the complexity of the model. When the amount of training data is small, an overly complex model is the main cause of overfitting, and reducing model complexity keeps the model from fitting the sampling noise. For example, reduce the number of layers and neurons in a neural network; in a decision tree model, limit the tree depth and prune (see the sketch after this list).

Note: the reason a random forest does not overfit easily is that each tree selects features at random and the depth of the decision trees can be controlled; moreover, a random forest follows the bagging ensemble-learning idea and uses a voting mechanism, so one overfitted decision tree does not sway the overall result.
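A sketch of <3> and the random-forest note, assuming scikit-learn (the dataset is a built-in toy set; depths and tree counts are illustrative):

```python
# Cap tree depth (pruning by depth) and bag many shallow trees into a
# random forest that votes; compare train vs test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "unconstrained tree": DecisionTreeClassifier(random_state=0),
    "depth-3 tree":       DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest":      RandomForestClassifier(n_estimators=100, max_depth=3,
                                                 random_state=0),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(f"{name}: train = {m.score(X_tr, y_tr):.3f}, "
          f"test = {m.score(X_te, y_te):.3f}")
```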



4/ What is underfitting

Underfitting refers to a situation in which the model performs poorly on both the training and test datasets.

5/ Causes of underfitting

<1> The features are insufficient, or the correlation between the features and the labels is not strong.

<2> The model is not complex enough, or the regularization coefficient is too large.

6/ How to solve underfitting

<1> Add new features. When features are insufficient, or the existing features are weakly correlated with the sample labels, the model tends to underfit. Mining new features such as "context features", "ID-class features", and "combination features" can often improve results. In the deep learning era, many models can help with this kind of feature engineering; factorization machines, gradient boosting decision trees (GBDT), Deep Crossing, and so on can all serve as ways to enrich features.

<2> Increase model complexity. A simple model has weak learning capacity; increasing the model's complexity gives it stronger fitting ability. For example, add high-order terms to a linear model, or add layers or neurons to a neural network (see the sketch after this list).

<3> Decrease the regularization coefficient. Regularization is used to prevent overfitting, but when the model underfits, the regularization coefficient needs to be reduced.
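A sketch of <2> and <3> together, assuming scikit-learn (the quadratic data and alpha values are illustrative): add an x² feature and shrink Ridge's regularization coefficient alpha.

```python
# Underfitting fixes <2> and <3>: add a high-order feature (x**2) and
# shrink the regularization coefficient (alpha in Ridge).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(100, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.1, size=100)  # quadratic target

underfit = Ridge(alpha=100.0).fit(X, y)               # linear + heavy penalty
improved = make_pipeline(PolynomialFeatures(2),       # adds the x**2 feature
                         Ridge(alpha=0.1)).fit(X, y)  # smaller coefficient

print("underfit R^2:", round(underfit.score(X, y), 3))
print("improved R^2:", round(improved.score(X, y), 3))
```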