A detailed summary of variance, standard deviation, mean squared error (MSE), and root mean squared error (RMSE)
Other summaries found online are mostly copy-pasted and messily typeset, so this article summarizes the topic with clean typesetting.
Variance
Variance measures how dispersed a random variable or a data set is, that is, how far a random variable deviates from its mathematical expectation (the mean). In statistics, the (sample) variance is the average of the squared differences between each data point and the sample mean. In many practical problems, studying the variance, i.e. the degree of deviation, is of great significance.
For a random variable or a set of statistical data, the expected value (mean) is denoted $E(X)$. The variance is the average of the squared differences between each data point and the mean:

$$\sigma^{2} = E\left[(X - E(X))^{2}\right] = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - E(X)\right)^{2}$$
$\sigma^{2}$ is the population variance, $X$ is the variable, $E(X)$ is the expected value (population mean), and $N$ is the population size.
Note: this formula describes how far a random variable (or statistic) deviates from its mean. The concept of variance can also be understood from the perspective of the least squares method.
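As a concrete illustration, the population variance can be computed directly from the definition above. A minimal Python sketch (the data set is invented for illustration):

```python
def population_variance(data):
    """Average of squared deviations from the population mean."""
    n = len(data)
    mu = sum(data) / n                           # E(X), the mean
    return sum((x - mu) ** 2 for x in data) / n  # sigma^2

# Hypothetical data set for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(scores))  # -> 4.0 (mean is 5, squared deviations sum to 32)
```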
Standard deviation
Standard deviation, also known as mean square deviation, is the arithmetic square root of the variance:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^{2}}$$
$\mu$ denotes the expectation, equivalent to $E(X)$ above.
Why do we need the standard deviation when we already have the variance?
As you can see, the standard deviation is derived from the variance by simply taking a square root. So why was it introduced at all?
Simply put, the unit of the variance is not the same as the unit of the data. Variance describes the deviation between the data and the mean well, but its value does not match our intuition.
The standard deviation has the same unit as the data, which makes it convenient to use. The underlying reason is that the variance involves squaring; taking the square root restores the dimension of the original data, so the standard deviation describes a range of fluctuation more naturally than the variance does.
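To make the unit argument concrete, here is a sketch computing both quantities for a set of heights (the numbers are made up): the variance comes out in cm², while the standard deviation is back in cm, the same unit as the data.

```python
import math

heights_cm = [160, 165, 170, 175, 180]  # hypothetical heights, unit: cm

n = len(heights_cm)
mu = sum(heights_cm) / n
variance = sum((x - mu) ** 2 for x in heights_cm) / n  # unit: cm^2
std = math.sqrt(variance)                              # unit: cm

print(variance)  # -> 50.0  (cm^2, hard to interpret as a height spread)
print(std)       # ~7.07    (cm, directly comparable to the heights)
```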
Note: for why the square is used, see the least squares method.
Here is an example: a class has 60 students, the average score is 70 points, the standard deviation is 9, and so the variance is 81. Assuming the scores are normally distributed, the variance alone does not intuitively tell us how far scores deviate from the mean. With the standard deviation, however, we know directly how the scores are distributed around the mean: a score falls within one standard deviation, i.e. in [61, 79], with a probability of about 68% (roughly 34.1% on each side of the mean).
Bonus notes: about 68% of values lie within one standard deviation (mean − σ, mean + σ), about 95% within two standard deviations (mean − 2σ, mean + 2σ), and about 99.7% within three. This reflects the degree of dispersion among individuals in a group.
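The 68/95/99.7 rule can be checked numerically with the standard library's `statistics.NormalDist`, using the class example above (mean 70, standard deviation 9):

```python
from statistics import NormalDist

grades = NormalDist(mu=70, sigma=9)  # scores assumed normally distributed

for k in (1, 2, 3):
    lo, hi = 70 - k * 9, 70 + k * 9
    p = grades.cdf(hi) - grades.cdf(lo)  # probability mass within k std devs
    print(f"within {k} std dev, [{lo}, {hi}]: {p:.1%}")
# within 1 std dev, [61, 79]: 68.3%
# within 2 std dev, [52, 88]: 95.4%
# within 3 std dev, [43, 97]: 99.7%
```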
The standard deviation is used with the normal distribution precisely because it gives such a good sense of the range of fluctuation.
Mean squared error (MSE)
The mean squared error (MSE) is the average of the squared differences between the estimates and the true value, i.e. the mean of the squared errors:

$$\operatorname{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^{2}\right]$$

$\hat{\theta}$ is the estimator and $\theta$ is the true value.
For example: we want to measure the temperature in a room. Unfortunately our thermometer is not very accurate, so we take 5 measurements and obtain the data $[x_1, x_2, x_3, x_4, x_5]$. Suppose the true temperature is $x$; the error of each measurement is $e_i = x - x_i$.
Then $\operatorname{MSE} = \frac{1}{5}\sum_{i=1}^{5}(x - x_i)^{2}$.
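Continuing the thermometer example, a minimal sketch of the MSE computation (the readings and the true temperature are invented for illustration):

```python
def mse(true_value, measurements):
    """Average of the squared errors against the true value."""
    return sum((true_value - x) ** 2 for x in measurements) / len(measurements)

true_temp = 25.0                           # hypothetical true room temperature
readings = [24.6, 25.3, 24.9, 25.4, 24.8]  # five hypothetical thermometer readings
print(mse(true_temp, readings))            # ~0.092
```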
Root mean squared error (RMSE)
The square root of the mean squared error is called the root mean squared error, which is similar in form to the standard deviation:

$$\operatorname{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x - x_i)^{2}}$$
In general, the standard deviation describes the relationship between a data series and its mean, while the root mean squared error describes the relationship between a data series and the true value. The standard deviation therefore measures the dispersion of a set of numbers, whereas the RMSE measures the deviation of the observed values from the true value. Their objects of study and purposes differ, but the calculations are similar.
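The distinction shows up numerically: with a biased thermometer, the readings can cluster tightly around their own mean (small standard deviation) while still sitting far from the true value (larger RMSE). A sketch with invented numbers:

```python
import math

readings = [24.2, 24.6, 24.4, 24.8, 24.5]  # hypothetical biased readings
true_temp = 25.0                           # hypothetical true temperature

n = len(readings)
mean = sum(readings) / n
std = math.sqrt(sum((x - mean) ** 2 for x in readings) / n)        # spread around the mean
rmse = math.sqrt(sum((x - true_temp) ** 2 for x in readings) / n)  # deviation from the truth

print(f"mean={mean}, std={std:.3f}, rmse={rmse:.3f}")
# std is small (the readings are precise) but rmse is larger (they are inaccurate)
```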
Conclusion
- Mean square deviation = standard deviation.
- Variance is the average of the squared deviations of the data from their mean.
- Mean squared error (MSE) is the average of the squared differences between the data and the true value.
- Variance is taken against the mean; mean squared error is taken against the true value.
In general, variance relates a data series to its mean, while mean squared error relates a data series to the true value, so take care to distinguish the true value from the mean when computing the differences.
Reference: zhuanlan.zhihu.com/p/83410946, …