Author | Zhang Yaoqi

Zhang Yaoqi is currently an iOS engineer in Tencent's Jitong Application Department. Trained in mathematics, he is a CSDN blog expert (the YoferZhang column) with an interest in machine learning.

This article discusses the two main sources of error, bias and variance: how to estimate them, how they each affect the true error (illustrated by contrast), and how different causes of error call for different remedies.

Reference course: speech.ee.ntu.edu.tw/~tlkagk/cou…

A note before reading: the platform you are viewing this on may not render the formulas well, so it is best to read this post on my CSDN blog; the link is below. (helpless face)

CSDN blog address: blog.csdn.net/zyq52237682…

Review

The Pokemon example from Post 02:

You can see that the more complex the model, the better it performs on the training set; but on the test set a more complex model does not necessarily perform better.

What I want to discuss in this article is: where does the error come from?

There are two main sources of error, bias and variance.

Estimation

As above, let's assume the real function for Pokemon CP values, $\hat{f}$, is known only to Niantic (the company behind Pokemon Go). The $f^{*}$ we find from the training data is an estimate of $\hat{f}$.

Estimate bias and variance

Estimate the average value of variable $x$

  • Suppose $x$ has mean $\mu$ and variance $\sigma^{2}$.

  • Sample $N$ points: $\{x^{1}, x^{2}, \ldots, x^{N}\}$

  • Compute the average $m$: $m = \frac{1}{N}\sum_{n=1}^{N} x^{n} \neq \mu$

But if you compute $m$ for many different samples and take the expectation of $m$: $E[m] = E\left[\frac{1}{N}\sum_{n=1}^{N} x^{n}\right] = \frac{1}{N}\sum_{n=1}^{N} E[x^{n}] = \mu$.

This estimate is unbiased.

The spread (variance) of the distribution of $m$ around $\mu$ is $\mathrm{Var}[m] = \frac{\sigma^{2}}{N}$.

It depends on $N$: as the figure below shows, the smaller $N$ is, the more spread out the values of $m$ are.
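
To make this concrete, here is a small simulation (not from the original post; the values $\mu = 5$, $\sigma = 2$ are made up) that checks both facts: the average of $m$ over many experiments stays close to $\mu$, and its spread shrinks like $\sigma^{2}/N$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0            # assumed true mean and standard deviation
trials = 10000                  # number of repeated experiments

for N in [5, 20, 100]:
    # draw `trials` samples of size N and compute the mean m of each sample
    samples = rng.normal(mu, sigma, size=(trials, N))
    m = samples.mean(axis=1)
    print(f"N={N:3d}  E[m]≈{m.mean():.3f}  Var[m]≈{m.var():.3f}  sigma^2/N={sigma**2 / N:.3f}")
```

For every $N$ the average of $m$ comes out near 5, while its variance tracks $\sigma^{2}/N$ and shrinks as $N$ grows.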

Estimate the variance of variable $x$

First estimate the mean $m$ as above.

Then compute $s^{2} = \frac{1}{N}\sum_{n=1}^{N}(x^{n} - m)^{2}$ and use $s^{2}$ as an estimate of $\sigma^{2}$.

This estimator is biased: $E[s^{2}] = \frac{N-1}{N}\sigma^{2} \neq \sigma^{2}$, although the bias shrinks as $N$ grows.
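
The bias of $s^{2}$ can be checked with the same kind of simulation (again with made-up $\mu$, $\sigma$, and $N$):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0
trials, N = 100_000, 5

samples = rng.normal(mu, sigma, size=(trials, N))
m = samples.mean(axis=1, keepdims=True)
s2 = ((samples - m) ** 2).mean(axis=1)           # s^2 with the 1/N factor used above

print("E[s^2]          ≈", round(s2.mean(), 3))  # close to (N-1)/N * sigma^2 = 3.2
print("(N-1)/N*sigma^2 =", (N - 1) / N * sigma**2)
print("sigma^2         =", sigma**2)
```

Dividing by $N-1$ instead of $N$ (Bessel's correction) removes this bias.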

Using the bull's-eye to illustrate the effect of bias and variance

The bull's-eye is the true function $\hat{f}$. Each experiment gives one estimate $f^{*}$ (one shot), and the average of all the estimates is $\bar{f} = E[f^{*}]$.

Bias measures how far $\bar{f}$ is from $\hat{f}$, while variance measures how scattered the $f^{*}$ are around $\bar{f}$. The influence of the two kinds of error can be seen by comparing the four graphs.

Bias is an aiming error: you are supposed to aim at the center of the target, but bias means the point you actually aim at, $\bar{f}$, is off-center. Variance means the shots are aimed at $\bar{f}$ but do not land exactly on it; they scatter around $\bar{f}$.

Why are there many $f^{*}$?

Back to the case from Post 02: here we assume that in parallel universes, different Pokemon are caught, so each universe gives a different training set.

Using the same model on these different training sets gives different $f^{*}$.

It is like shooting at the bull's-eye over and over, once per training set. To see how dispersed the results are, let's fit the model in 100 universes (100 training sets) and draw the 100 resulting curves.

With different training sets, anything can happen: the fitted curves can look completely different.

Consider variance between different models

The degree-1 model has a relatively small variance, that is, its curves are concentrated and the spread is small. The degree-5 model has a relatively large variance; its curves are spread out widely.

So a simpler model has a relatively small variance (like a shooter whose shots all land in a small area), while a complex model has a large variance and the shots are widely dispersed.

This is also because simple models are less affected by different training sets.
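
We do not have the Pokemon data or Niantic's true function, but the parallel-universe experiment is easy to imitate: pick a stand-in true function, sample 100 noisy training sets from it, fit a degree-1 and a degree-5 polynomial to each, and see how much the predictions scatter. Everything below (the function, the noise level, the sample size) is assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # stand-in for the unknown \hat{f}; the real CP formula is only known to Niantic
    return 2.0 * x + 0.8 * x**2

def sample_training_set(n=10):
    x = rng.uniform(0, 10, n)
    y = true_f(x) + rng.normal(0, 5, n)       # noisy observations
    return x, y

universes = 100
x0 = 5.0                                      # point at which the predictions are compared

for degree in [1, 5]:
    preds = []
    for _ in range(universes):
        x, y = sample_training_set()
        coeffs = np.polyfit(x, y, degree)     # fit a degree-`degree` polynomial
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    print(f"degree {degree}: std of f*(5.0) across {universes} universes = {preds.std():.2f}")
```

The degree-5 predictions scatter noticeably more across universes than the degree-1 predictions, which is exactly the variance difference described above.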

Consider the bias of different models

There is no way to know the real $\hat{f}$, so assume that the black curve in the figure is the real $\hat{f}$.

Visualizing the results: the average $\bar{f}$ of the degree-1 model is not as close to $\hat{f}$ as that of the degree-5 model, even though the individual degree-5 results are highly scattered.

The degree-1 model has a relatively large bias, while the more complicated degree-5 model has a relatively small bias.

Intuitive explanation: the function space of a simple model is relatively small, so it may not contain the bull's-eye at all, in which case it is bound to miss the target. The function space of a complex model is large and may contain the bull's-eye; a single fit cannot locate the bull's-eye exactly, but with enough $f^{*}$ their average can get close to the real $\hat{f}$.
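
The bias side can be checked with the same toy setup as before: average many fitted curves to approximate $\bar{f}$ and compare it with the stand-in $\hat{f}$. Again, the true function, noise level, and sample sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # stand-in for the unknown \hat{f}
    return 2.0 * x + 0.8 * x**2

def sample_training_set(n=10):
    x = rng.uniform(0, 10, n)
    return x, true_f(x) + rng.normal(0, 5, n)

universes = 1000
grid = np.linspace(0, 10, 50)                 # points at which \bar{f} is compared with \hat{f}

for degree in [1, 5]:
    curves = []
    for _ in range(universes):
        x, y = sample_training_set()
        curves.append(np.polyval(np.polyfit(x, y, degree), grid))
    f_bar = np.mean(curves, axis=0)           # average curve, an estimate of \bar{f}
    bias = np.mean(np.abs(f_bar - true_f(grid)))
    print(f"degree {degree}: mean |f_bar - f_hat| over the grid = {bias:.2f}")
```

The average degree-1 curve stays systematically far from the true curve (large bias), while the average degree-5 curve lands much closer to it (small bias), despite the wild scatter of individual degree-5 fits.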

Bias vs. Variance

Splitting the error from Post 02 into bias and variance: for the simple model (left) the error mainly comes from large bias, which is called underfitting; for the complex model (right) the error mainly comes from large variance, which is called overfitting.

How do you tell which case you are in?

Analysis

  • If the model cannot fit the training set well, the bias is too large; this is underfitting.
  • If the model fits the training set well, that is, it gets a small error on the training set but a large error on the test set, the model probably has a large variance; this is overfitting. (A small sketch of this rule of thumb follows the list.)
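
As a rough illustration of the rule above, here is a tiny helper that encodes it in code. The thresholds and the example error values are invented purely for illustration; they are not from the course.

```python
def diagnose(train_error, test_error, fit_tolerance=0.1, gap_factor=2.0):
    """Rule-of-thumb diagnosis; the thresholds are arbitrary illustrative values."""
    if train_error > fit_tolerance:
        return "large bias -> underfitting: redesign the model"
    if test_error > gap_factor * train_error:
        return "large variance -> overfitting: more data or regularization"
    return "bias and variance look balanced"

print(diagnose(train_error=0.45, test_error=0.50))  # cannot even fit the training set
print(diagnose(train_error=0.02, test_error=0.30))  # fits training set, fails on test set
```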

Underfitting and Overfitting are handled in different ways

Large bias, underfitting

In this case the model should be redesigned, because the previous function set may not contain $\hat{f}$ at all. You can:

Add more input features, such as the Pokemon's height, weight, HP, and so on, or use higher powers of the input, that is, a more complex model. Forcing in more training data at this point will not help: the function set itself is not good, so finding more training data will not make it better.
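
As a sketch of what such a redesign could look like (all numbers here are invented for illustration; the real CP data and weights are of course different): the original model predicts the evolved CP from the pre-evolution CP alone, while the redesigned model adds a higher power of CP plus height, weight, and HP.

```python
import numpy as np

# hypothetical training data, one row per Pokemon (values are made up)
cp     = np.array([10., 40., 80., 120., 160., 200., 250., 300.])   # CP before evolution
height = np.array([0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8])
weight = np.array([6., 10., 18., 30., 45., 60., 78., 95.])
hp     = np.array([35., 45., 55., 65., 78., 90., 105., 120.])
y      = np.array([25., 95., 180., 270., 360., 455., 570., 690.])  # CP after evolution

# original simple model: y = w * cp + b
X_simple = np.column_stack([cp, np.ones_like(cp)])

# redesigned model: a higher power of cp plus the extra features
X_rich = np.column_stack([cp, cp**2, height, weight, hp, np.ones_like(cp)])

w_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)
w_rich, *_   = np.linalg.lstsq(X_rich, y, rcond=None)
print("simple model weights:", w_simple)
print("rich model weights:  ", w_rich)
```

The richer design enlarges the function space, which is what reduces bias; it does nothing by itself to reduce variance.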

Large variance, overfitting

A simple and crude method: collect more data.

However, it is not always possible to collect more data. Based on your understanding of the problem, you can generate more data yourself. For example, when recognizing handwritten digits, if there is not enough data at different rotation angles, rotate the normal data 15 degrees to the left and 15 degrees to the right, and expand the data set with similar processing. Regularization is another way to limit variance.
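
A minimal sketch of that kind of augmentation, assuming `scipy` is available and using random arrays as stand-ins for 28x28 digit images:

```python
import numpy as np
from scipy.ndimage import rotate

def augment_with_rotations(images, angles=(-15, 15)):
    """Return the original images plus one copy rotated by each angle (in degrees)."""
    augmented = list(images)
    for angle in angles:
        for img in images:
            # reshape=False keeps the 28x28 size; newly exposed corners are filled with 0
            augmented.append(rotate(img, angle, reshape=False, order=1, cval=0.0))
    return np.array(augmented)

rng = np.random.default_rng(0)
images = rng.random((100, 28, 28))            # toy stand-in for a digit dataset
augmented = augment_with_rotations(images)
print(images.shape, "->", augmented.shape)     # (100, 28, 28) -> (300, 28, 28)
```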

Choose the model

  • There is now a tradeoff between bias and variance
  • The model you choose should balance the error from bias and the error from variance so that the total error is minimized
  • But here’s what not to do:

Training different models on the training set and then comparing their errors on the test set: the error of Model 3 is relatively small, so Model 3 is judged to be the best. But that is only the test set you happen to have, not the complete test data. For example, the error may be 0.5 on the existing test set, but after collecting more test data the error is usually greater than 0.5.

Cross Validation

In the figure, the public test set is available, while the private test set is unknown. Cross validation splits the training set into two parts: one used as the training set and one as the validation set. Train the models on the training part, then compare them on the validation set. After finding the best model (say Model 3), retrain Model 3 on the full training set, and then evaluate it on the public test set. The error obtained this way is generally larger. At this point it is tempting to go back and tweak the model so that it does better on the public test set, but that is not recommended. (It feels bad, I know; in college mathematical modeling contests I went back and forth like this, a painful process.)
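
Here is a minimal sketch of that hold-out procedure, with synthetic data standing in for the real problem and polynomial degrees 1 to 5 standing in for Model 1 to Model 5 (all assumptions of mine, not the course's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical training data from some unknown process
x = rng.uniform(0, 10, 60)
y = 2.0 * x + 0.8 * x**2 + rng.normal(0, 5, x.size)

# split the training set into a training part and a validation part
idx = rng.permutation(x.size)
train_idx, val_idx = idx[:40], idx[40:]

best_degree, best_err = None, np.inf
for degree in [1, 2, 3, 4, 5]:                       # Model 1 ... Model 5
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    val_err = np.mean((np.polyval(coeffs, x[val_idx]) - y[val_idx]) ** 2)
    print(f"degree {degree}: validation MSE = {val_err:.2f}")
    if val_err < best_err:
        best_degree, best_err = degree, val_err

# retrain the chosen model on the FULL training set before touching the test set
final_coeffs = np.polyfit(x, y, best_degree)
print("chosen model: degree", best_degree)
```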

With the above method you might worry that a single split of the training set turns out unlucky or unrepresentative. In that case the following method can be used.

N-fold Cross Validation

Divide the training set into N parts, such as three.

For example, if Model 1 has the best Average Error across the three folds, then train Model 1 on the full training set. (It seems I did something like this in mathematical modeling contests too, splitting the data without really understanding why; back then there was simply no time to look into it, everything was done in a rush 0o0o0)
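
A minimal 3-fold sketch with the same synthetic setup as above (stand-in data and polynomial degrees are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 60)
y = 2.0 * x + 0.8 * x**2 + rng.normal(0, 5, x.size)

N_FOLDS = 3
folds = np.array_split(rng.permutation(x.size), N_FOLDS)

avg_errors = {}
for degree in [1, 2, 3, 4, 5]:                        # Model 1 ... Model 5
    errors = []
    for k in range(N_FOLDS):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(N_FOLDS) if j != k])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        errors.append(np.mean((np.polyval(coeffs, x[val_idx]) - y[val_idx]) ** 2))
    avg_errors[degree] = np.mean(errors)              # Average Error over the folds
    print(f"degree {degree}: average validation MSE = {avg_errors[degree]:.2f}")

best_degree = min(avg_errors, key=avg_errors.get)
final_coeffs = np.polyfit(x, y, best_degree)          # retrain on the full training set
print("best model by average error: degree", best_degree)
```

Because every point serves as validation data exactly once, the average error is less sensitive to how the training set happens to be split.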