Adapted from https://blog.csdn.net/qq_24753293/article/details/78788844 — thanks to the original author. Notation: L(Y, f(x)) denotes the loss function, where Y is the true value and f(x) is the model's predicted value.

preface

When you first get into machine learning, one of the terms you hear most is "loss function". I was confused by this word too: it seemed to have one definition today and another tomorrow. After reading many articles and blog posts, it finally clicked. First we need to understand what a loss function is: a way of measuring the predictive quality of a machine learning model. If that sounds abstract, what a loss function does is tell you how far the prediction is from the actual data. For example, in a linear regression there is an error between the predicted value and the actual value; the function we choose to describe that error is the loss function.

The difference between a model's predicted value and the true value on a single sample is called a loss. The smaller the loss, the better the model; if the predicted value equals the true value, there is no loss. The function used to compute this loss is called the loss function, and every prediction the model makes is measured by it. The loss is a dynamic quantity (its lower limit is 0), and we optimize the model through repeated iterations to make the loss smaller and smaller.

There are many ways to compute a loss: the absolute loss function, the squared loss function, the exponential loss function, and the logarithmic loss function. The loss functions typically used for classification algorithms are: the 0-1 loss function, the exponential loss function, and the logarithmic loss function. The loss functions typically used for regression algorithms are: the mean squared loss function, the absolute loss function, and the squared loss function.
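The losses named above can be written in a few lines of plain Python. This is a minimal sketch (the function names are my own, not from the original post), showing per-sample losses for regression (absolute, squared) and classification (0-1):

```python
def absolute_loss(y, fx):
    """Absolute loss: |Y - f(x)|."""
    return abs(y - fx)

def squared_loss(y, fx):
    """Squared loss: (Y - f(x))^2."""
    return (y - fx) ** 2

def zero_one_loss(y, fx):
    """0-1 loss: 1 if the class prediction is wrong, else 0."""
    return 0 if y == fx else 1

# A regression prediction of 8.5 against a true value of 10.0:
print(absolute_loss(10.0, 8.5))   # 1.5
print(squared_loss(10.0, 8.5))    # 2.25
# A classification prediction:
print(zero_one_loss("cat", "dog"))  # 1 (wrong class)
```

Note that the squared loss penalizes large errors more heavily than the absolute loss, which is one reason it is the default choice in regression.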

demo

Here's a quick example. Suppose you are a judge in a cooking competition and you decide each chef's final score, grading color, aroma, and taste out of 100 points each. Every chef has his own recipe and technique, and each achieves a different result (this is the prediction). You then apply a fixed set of rules to decide the final score (you are the loss function). If we call the true value Y and the predicted value f(x), the loss function L(Y, f(x)) relates them as described below:

The loss function estimates the degree of inconsistency between your model's predicted value f(x) and the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss function, the more robust the model. Demo: suppose we want to predict the sales volume of one of a company's products.

X: number of stores. Y: sales. We see that sales go up with the number of stores, so we want to characterize the relationship between stores and sales. Based on the points on the graph, we draw a straight line:

This line seems to explain the relationship between the number of stores X and sales Y fairly well. Assume the equation of the line is Y = a0 + a1·X (where a0 and a1 are constant coefficients; a simple linear equation). Suppose the first regression line we fit is Y = 3X + 10; the errors between the predicted and true values are shown below:

We can tabulate the absolute loss to show how well Y = 3X + 10 fits. From the data above, the total absolute loss works out to 6. In practice, though, the squared loss is usually used in place of the absolute loss; for the same data, the total squared loss works out to 10. Now suppose we fit another line, Y = 4X + 8.

Total absolute loss: 11. Total squared loss: 27. Comparing the two fitted equations by their total loss, we can conclude that the first fit predicts store sales better.
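The comparison above can be sketched in code. Note that the store/sales numbers below are hypothetical stand-ins (the original post's data table is not reproduced here), so the totals will not match the 6/10/11/27 quoted above; the point is only the procedure: sum each line's losses and prefer the line with the smaller total.

```python
def total_losses(xs, ys, f):
    """Return (total absolute loss, total squared loss) of model f on the data."""
    abs_sum = sum(abs(y - f(x)) for x, y in zip(xs, ys))
    sq_sum = sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
    return abs_sum, sq_sum

line1 = lambda x: 3 * x + 10   # first fitted line,  Y = 3X + 10
line2 = lambda x: 4 * x + 8    # second fitted line, Y = 4X + 8

# Hypothetical (store count, sales) data, for illustration only:
xs = [1, 2, 3, 4, 5]
ys = [13, 17, 18, 23, 26]

print(total_losses(xs, ys, line1))  # (4, 4)
print(total_losses(xs, ys, line2))  # (7, 11)
```

On this toy data, as in the article, the first line has the smaller total loss and is the better fit.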

conclusion

Commonly used loss functions are as follows: (1) 0-1 loss function: in a binary classification task, if the predicted value differs from the true value (a wrong prediction), the loss is 1; if the predicted value equals the true value (a correct prediction), the loss is 0, i.e. there is no loss.

(2) Squared loss function (quadratic loss function): the square of the difference between the predicted value and the true value. The larger the prediction error, the larger the loss. Easy to understand.

(3) Absolute loss function: the absolute value of the difference between the predicted value and the true value. The absolute value is awkward to work with mathematically (it is not differentiable at zero), so this loss is less commonly used.

(4) Logarithmic loss function (log-likelihood loss function): used when the predicted value is a probability. Since a probability lies in [0, 1] and its logarithm lies in (-∞, 0], the negative logarithm is taken so that the loss is non-negative; that is why there is a minus sign in the formula. The original post includes a plot of the curve, which makes this clear at a glance.
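A minimal sketch of the log loss for a binary label (the function name and numbers are illustrative, not from the original post):

```python
import math

def log_loss(y, p):
    """Log loss for a binary label y in {0, 1} and predicted probability p in (0, 1).
    The leading minus sign makes the loss non-negative, since log(p) <= 0 on (0, 1]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(log_loss(1, 0.9))  # small loss
print(log_loss(1, 0.1))  # large loss
```

The steep rise of -log(p) as p approaches 0 is what punishes confident wrong predictions so heavily.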

(5) Exponential loss function: very sensitive to outliers and noise. It is most often used in the AdaBoost algorithm.
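A minimal sketch of the exponential loss, in the ±1-label convention that AdaBoost uses (the function name is my own):

```python
import math

def exponential_loss(y, fx):
    """Exponential loss for a label y in {-1, +1} and a real-valued score f(x).
    The loss grows exponentially when the prediction is confidently wrong
    (y * f(x) very negative), which is why it is so sensitive to outliers
    and noisy labels."""
    return math.exp(-y * fx)

print(exponential_loss(+1, 2.0))   # correct and confident -> small loss
print(exponential_loss(+1, -2.0))  # wrong and confident -> large loss
```

A single mislabeled point with a large margin can dominate the total exponential loss, which matches the sensitivity noted above.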