Generally, every machine learning algorithm has an objective function, and the algorithm is solved by optimizing that objective. In classification and regression problems, a loss function (cost function) usually serves as the objective function. The loss function evaluates the difference between the model's predicted value and the true value; in general, the better the loss function, the better the model performs. Different algorithms use different loss functions. Loss functions fall into empirical risk loss functions and structural risk loss functions: the empirical risk loss function measures the difference between the predicted result and the actual result, while the structural risk loss function is the empirical risk loss function plus a regularization term. This is usually expressed as follows:
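$$
\theta^{*} = \arg\min_{\theta} \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(x_i;\theta)\bigr) + \lambda\,\Phi(\theta)
$$

where the first term is the empirical risk over the $N$ training samples and $\lambda\,\Phi(\theta)$ is the regularization term that captures the structural risk.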
There are five common loss functions (a code sketch of them follows this list):
- Hinge Loss: mainly used in support vector machines (SVM);
- Cross-Entropy Loss (Softmax Loss): used in logistic regression and Softmax classification;
- Square Loss: mainly used in ordinary least squares (OLS);
- Exponential Loss: mainly used in the AdaBoost ensemble learning algorithm;
- Other losses (e.g. 0-1 loss, absolute loss).
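As a quick reference, here is a minimal NumPy sketch of these losses for a single sample. The label conventions ($y \in \{-1, +1\}$ for hinge and exponential loss, $y \in \{0, 1\}$ for cross-entropy) are the usual ones and are assumptions of this sketch, not taken from the original text.

```python
import numpy as np

def hinge_loss(y, score):
    """Hinge loss (SVM): y in {-1, +1}, score is the raw margin f(x)."""
    return np.maximum(0.0, 1.0 - y * score)

def cross_entropy_loss(y, p):
    """Log / cross-entropy loss (logistic regression): y in {0, 1}, p = P(y=1|x)."""
    eps = 1e-12  # clip to avoid log(0)
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def square_loss(y, y_hat):
    """Squared loss (least squares regression)."""
    return (y - y_hat) ** 2

def exponential_loss(y, score):
    """Exponential loss (AdaBoost): y in {-1, +1}."""
    return np.exp(-y * score)

def zero_one_loss(y, y_hat):
    """0-1 loss: 1 if the prediction is wrong, else 0."""
    return np.where(y == y_hat, 0.0, 1.0)

def absolute_loss(y, y_hat):
    """Absolute value loss."""
    return np.abs(y - y_hat)
```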
1. 0-1 Loss function and absolute value loss function
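The 0-1 loss counts 1 for a wrong prediction and 0 for a correct one, and the absolute value loss is the absolute difference between the predicted and true values; their standard forms are:

$$
L(Y, f(X)) =
\begin{cases}
1, & Y \neq f(X) \\
0, & Y = f(X)
\end{cases}
\qquad
L(Y, f(X)) = \lvert Y - f(X) \rvert
$$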
2. Log Loss function and Softmax Loss
The loss function of logistic regression is the logarithmic (log) loss function. In its derivation, the samples are assumed to follow a Bernoulli (0-1) distribution; the likelihood function for that distribution is written down, and its logarithm is taken to find the extremum. Logistic regression does not directly maximize the log-likelihood function; rather, it takes maximization as the guiding idea and derives its risk function as the negative log-likelihood to be minimized. Viewed as a loss function, this is the log loss function, whose standard form is:
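$$
L(Y, P(Y \mid X)) = -\log P(Y \mid X)
$$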
In maximum likelihood estimation, one usually takes the logarithm first, then differentiates and sets the derivative to zero to find the extreme point, which simplifies the computation. The loss function L(Y, P(Y|X)) expresses that, for a sample X with class label Y, the probability P(Y|X) should be made as large as possible; in other words, given the observed samples, find the parameter values of the distribution that are most likely to have produced them.
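Concretely, for samples $(x_i, y_i)$ with $y_i \in \{0, 1\}$ and predicted probability $p_i = P(y_i = 1 \mid x_i)$, the Bernoulli likelihood and the resulting loss are:

$$
\ell(\theta) = \prod_{i=1}^{N} p_i^{y_i}(1 - p_i)^{1 - y_i},
\qquad
-\log \ell(\theta) = -\sum_{i=1}^{N}\bigl[y_i \log p_i + (1 - y_i)\log(1 - p_i)\bigr]
$$

Minimizing the negative log-likelihood is equivalent to maximizing the likelihood, and it is exactly the cross-entropy (log) loss used by logistic regression.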
3. Squared loss function
The least squares method is a linear regression technique that turns the regression problem into a convex optimization problem. Its basic principle is that the optimal fitted line should minimize the sum of the squared distances from all sample points to the regression line, where distance is usually measured as Euclidean distance. The squared loss function is:
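$$
L(Y, f(X)) = \sum_{i=1}^{N} \bigl(Y_i - f(X_i)\bigr)^{2}
$$

As an illustration, here is a minimal least squares sketch in NumPy. Solving the normal equations is one standard way to minimize the squared loss; the toy data and closed-form solution shown here are assumptions of this example, not part of the original text.

```python
import numpy as np

# Toy data: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Design matrix with an intercept column.
X = np.column_stack([x, np.ones_like(x)])

# Normal equations: theta = (X^T X)^{-1} X^T y minimizes the squared loss.
theta = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ theta
squared_loss = np.sum((y - y_hat) ** 2)
print(f"slope={theta[0]:.3f}, intercept={theta[1]:.3f}, loss={squared_loss:.3f}")
```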