Preface:

The paradigm of machine learning is to feed a large amount of data to a model; the model continually adjusts its parameters according to the data and eventually gains the ability to recognize the patterns or characteristics in that data. If the model cannot achieve good results even on the training data, it is said to be underfitting. If the model achieves good results during training but performs poorly on data that was not involved in training, it is said to be overfitting.

This article introduces the concepts, characteristics, causes, and solutions of underfitting and overfitting. After reading it carefully, readers will have a comprehensive understanding of both.


The concepts of underfitting and overfitting

In the process of training a model, we usually want to achieve the following two goals:

1. The training loss value should be as small as possible.

2. The gap between the training loss and the test loss should be as small as possible.

When the first goal is not achieved, the model has not been trained well and its ability to distinguish the patterns or features of the data is weak; it is considered to be underfitting.

When the first goal is achieved but the second is not, the model performs very well on the training data yet its test loss is large, so its performance on new data is poor. Such a model has fit the training data excessively and cannot discriminate or fit well on data that did not participate in training; in this case, the model is overfitting.

Here is an everyday analogy:

Suppose your family arranges a blind date for you and tells you that the woman is waiting for you at xyz restaurant.

If your family only tells you that the woman wears a skirt and has long hair over her shoulders, you may go in and find several girls wearing skirts with long hair. At this point you cannot tell which one she is: you do not know enough of her characteristics. This is underfitting.

If your family tells you that the woman wears a skirt and a hat, has long hair over her shoulders, has a Doraemon phone case, and has a mole at the corner of her eyebrow, but the woman finds the restaurant too hot and takes off her hat shortly after entering, then when you arrive you meet a girl who matches every condition except the hat. Because she is not wearing a hat, you conclude she is not your date. You have learned too many of her characteristics, and a slight change in one of them leads you to a wrong judgment. This is overfitting.

If your family tells you that the woman wears a skirt, has long hair over her shoulders, and has a mole at the corner of her eyebrow, then when you go in you find that although several people wear skirts and many have long hair, only one has a mole at the corner of her eyebrow. Even if she is wearing a hat, that does not stop you from recognizing her, so you walk straight up to her and start an awkward exchange. This is a reasonable fit.

In this analogy, the skirt and long hair are common features shared by many girls; the combination of the eyebrow mole with the skirt and long hair is her distinguishing feature; and the phone case and the hat are incidental features. Overfitting means taking incidental features as identification marks, while underfitting means not knowing enough features, which in machine learning corresponds to a model whose representational ability is insufficient to learn enough features of the data.

Characteristics of underfitting: the training loss is large, and the test loss is also large.

Characteristics of overfitting: the training loss is sufficiently small, while the test loss is large.
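To make these two characteristics concrete, here is a minimal Python sketch that labels the fitting regime from the two loss values. The thresholds fit_tol and gap_tol are illustrative assumptions; in practice they depend on the task and the loss scale.

```python
# A minimal sketch: classifying the fitting regime from train/test losses.
# fit_tol and gap_tol are illustrative assumptions, not fixed rules.

def fitting_regime(train_loss: float, test_loss: float,
                   fit_tol: float = 0.1, gap_tol: float = 0.1) -> str:
    """Label the regime implied by the two loss values."""
    if train_loss > fit_tol:
        # Goal 1 not met: the model has not even fit the training data.
        return "underfitting"
    if test_loss - train_loss > gap_tol:
        # Goal 1 met but goal 2 not: large generalization gap.
        return "overfitting"
    return "reasonable fit"

print(fitting_regime(0.50, 0.55))  # underfitting: both losses large
print(fitting_regime(0.02, 0.45))  # overfitting: small train loss, large gap
print(fitting_regime(0.05, 0.08))  # reasonable fit
```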

A model with sufficient complexity, or a neural network with a sufficient number of parameters, will pass through the stages of underfitting, moderate fitting, and then overfitting as training progresses.

A model with insufficient complexity, or a neural network with too few parameters, only ever underfits.

Causes and solutions of underfitting

Based on the characteristics of underfitting, there are two main causes:

1. The capacity or complexity of the model is insufficient. For a neural network, this means too few parameters or too simple an architecture, leaving it without good feature-extraction ability. Regularization is usually added to avoid overfitting, but if the regularization penalty is too strong, the model's feature-extraction ability also becomes insufficient (see the sketch after this list).

2. The training data is too scarce or the number of training iterations is too small, so the model does not learn enough features.
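As an illustration of the first cause, here is a hypothetical scikit-learn sketch in which an excessive regularization penalty (Ridge's alpha) shrinks the weights so strongly that even the training error stays large; the data and alpha values are assumptions chosen only to make the effect visible.

```python
# Sketch: an over-strong regularization penalty causes underfitting.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.0, 4.0]) + rng.normal(0, 0.1, 100)

for alpha in (0.01, 1e6):  # mild vs. excessive penalty (illustrative values)
    model = Ridge(alpha=alpha).fit(X, y)
    train_mse = mean_squared_error(y, model.predict(X))
    print(f"alpha={alpha:g}: train MSE = {train_mse:.3f}")
# With alpha=1e6 the weights are shrunk toward zero and the training
# error itself stays large -- the signature of underfitting.
```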

Based on this analysis of the causes, there are two corresponding solutions:

1. Switch to a more complex model. For a neural network, use a network with stronger feature-extraction ability or a larger number of parameters, or reduce the regularization penalty.

2. Increase the number of iterations, obtain more training data, or learn more features from the limited data available: for example, moderately increase the number of epochs, or use data augmentation, pre-training, transfer learning, few-shot learning, or unsupervised learning (two of these are sketched below).
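Two of these remedies, data augmentation and transfer learning, can be sketched with PyTorch/torchvision. This assumes the torchvision >= 0.13 weights API; num_classes and the specific transforms are illustrative assumptions.

```python
# Sketch: data augmentation + transfer learning to fight underfitting
# on small datasets.
import torch.nn as nn
from torchvision import models, transforms

# Data augmentation: random crops and flips synthesize more varied
# training samples from the same images.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Transfer learning: start from ImageNet weights and replace only the
# classification head for the new task.
num_classes = 10  # hypothetical number of classes in the target task
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
```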

Causes and solutions of overfitting

Based on the characteristics of overfitting, there are four main causes:

1. The model is too complex. For a neural network, too many parameters or too strong a feature-extraction ability lets the model pick up incidental features (illustrated in the sketch after this list).

2. The data distribution is too narrow. For example, if all the birds used for training are photographed in cages, the model can easily take the cage as a feature for identifying birds.

3. The data contains too much noise or interfering information. In face detection, for example, the training images may be hundreds of pixels on each side while the face occupies only tens to a couple of hundred pixels; the background then dominates the image, and that background information amounts to interference or noise.

4. There are too many training iterations; training repeatedly on the same data also leads the model to learn incidental features.
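The first cause can be illustrated with a classic scikit-learn sketch: a degree-15 polynomial fitted to 10 noisy points drives the training error toward zero while the test error explodes. All numbers here are assumptions chosen to make the effect visible.

```python
# Sketch: excess model capacity on scarce, noisy data produces overfitting.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 10))[:, None]
x_test = np.sort(rng.uniform(0, 1, 100))[:, None]

def true_fn(x):
    return np.sin(2 * np.pi * x).ravel()

y_train = true_fn(x_train) + rng.normal(0, 0.2, 10)  # noisy labels
y_test = true_fn(x_test)

for degree in (3, 15):  # moderate vs. excessive capacity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    print(f"degree={degree}: "
          f"train MSE={mean_squared_error(y_train, model.predict(x_train)):.4f}, "
          f"test MSE={mean_squared_error(y_test, model.predict(x_test)):.4f}")
# The degree-15 model nearly interpolates the 10 training points (train MSE
# near zero) but has a much larger test MSE than the degree-3 model.
```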

Corresponding to these causes, there are four solutions:

1. Switch to a less complex model, or add regularization. For neural networks, use a network with fewer parameters or apply regularization such as weight decay or dropout.

2. Train with data drawn from more varied distributions, for example via data augmentation or pre-training.

3. Preprocess the images, for example by cropping away irrelevant background.

4. Stop training in time. How can you tell when to stop? If a held-out validation set or k-fold cross-validation is used, the point at which the training loss is still decreasing while the validation loss starts to increase indicates that overfitting has begun (a minimal early-stopping sketch follows).
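Here is a minimal early-stopping sketch for point 4, assuming a PyTorch-style model with state_dict/load_state_dict; train_one_epoch and eval_loss are hypothetical stand-ins for the reader's own training and validation steps.

```python
# Sketch: stop when the validation loss has not improved for `patience`
# epochs, then restore the best checkpoint.
import copy

def train_with_early_stopping(model, train_one_epoch, eval_loss,
                              max_epochs=100, patience=5):
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = eval_loss(model)  # loss on held-out validation data
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())  # keep best weights
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # validation loss rising: overfitting
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best checkpoint
    return model
```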

