In this article, the author divides the commonly used loss functions into two categories: classification and regression. The regression category includes a less common loss function, the mean bias error, which can be used to determine whether a model has a positive or negative bias.


Machines learn through loss functions, which are a way of evaluating how well a particular algorithm models the given data. If the predicted values deviate far from the actual results, the loss function produces a very large value. With the help of an optimization function, the model gradually learns to reduce the error in its predictions. This article introduces several loss functions and their applications in machine learning and deep learning.


There is no single loss function that fits all machine learning algorithms. Choosing a loss function for a particular problem involves many factors, such as the type of machine learning algorithm chosen, the ease of computing derivatives, and the proportion of outliers in the data set.


Based on the type of learning task, loss functions can be broadly divided into two categories: regression losses and classification losses. In a classification task, we predict an output from a finite set of category values; for example, given a large data set of handwritten digit images, classify each image as one of the digits 0 ~ 9. Regression problems deal with predicting continuous values, such as the price of a house given its floor area and the number and size of its rooms.

NOTE 
        n        - Number of training examples.
        i        - ith training example in a data set.
        y(i)     - Ground truth label for ith training example.
        y_hat(i) - Prediction for ith training example.


Regression loss

Mean squared error / squared loss / L2 loss

Mathematical formula:

        MSE = (1/n) * Σ_{i=1..n} (y(i) - y_hat(i))^2

As the name suggests, the mean squared error (MSE) measures the mean of the squared differences between the predicted values and the actual observations. It only considers the average magnitude of the error, not its direction. Because the differences are squared, however, predictions that deviate far from the true value are penalized much more heavily than those that deviate only slightly. This, coupled with the convenient mathematical properties of MSE, makes it easier to compute gradients.

import numpy as np

y_hat = np.array([0.000, 0.166, 0.333])
y_true = np.array([0.000, 0.254, 0.998])

def rmse(predictions, targets):
    differences = predictions - targets
    differences_squared = differences ** 2
    mean_of_differences_squared = differences_squared.mean()
    rmse_val = np.sqrt(mean_of_differences_squared)
    return rmse_val
print("d is: " + str(["%.8f" % elem for elem in y_hat]))
print("p is: " + str(["%.8f" % elem for elem in y_true]))
rmse_val = rmse(y_hat, y_true)
print("rms error is: " + str(rmse_val))
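
Note that the sample code above actually reports the root mean squared error (RMSE), which is simply the square root of MSE and puts the error back on the same scale as the targets. For reference, here is a minimal sketch of the plain MSE on the same arrays; the variable name mse_val is illustrative, not part of the original:

import numpy as np

y_hat = np.array([0.000, 0.166, 0.333])
y_true = np.array([0.000, 0.254, 0.998])

# Plain MSE: the mean of the squared differences, without the square root
mse_val = ((y_hat - y_true) ** 2).mean()
print("mse error is: " + str(mse_val))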


Mean absolute error / L1 loss

Mathematical formula:

        MAE = (1/n) * Σ_{i=1..n} |y(i) - y_hat(i)|


The mean absolute error (MAE) measures the average of the absolute differences between the predicted values and the observations. Like MSE, it measures the magnitude of the error without considering its direction. Unlike MSE, however, MAE requires more sophisticated tools, such as linear programming, to compute the gradients. In addition, MAE is more robust to outliers because it does not square the errors.

import numpy as np

y_hat = np.array([0.000, 0.166, 0.333])
y_true = np.array([0.000, 0.254, 0.998])

print("d is: " + str(["%.8f" % elem for elem in y_hat]))
print("p is: " + str(["%.8f" % elem for elem in y_true]))

def mae(predictions, targets):
    differences = predictions - targets
    absolute_differences = np.absolute(differences)
    mean_absolute_differences = absolute_differences.mean()
    return mean_absolute_differences
mae_val = mae(y_hat, y_true)
print ("mae error is: " + str(mae_val))


Mean bias error

Mean bias error (MBE) is less common in machine learning than the other loss functions. It is similar to MAE, except that it does not take absolute values. One thing to note about this function is that positive and negative errors can cancel each other out. Although less accurate in practice, it can determine whether a model has a positive or negative bias. Mathematical formula:

        MBE = (1/n) * Σ_{i=1..n} (y(i) - y_hat(i))
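
Following the pattern of the earlier examples, below is a minimal NumPy sketch of the mean bias error; the function name mbe and the reuse of the earlier example arrays are illustrative, not part of the original.

import numpy as np

y_hat = np.array([0.000, 0.166, 0.333])   # example predictions (same arrays as above)
y_true = np.array([0.000, 0.254, 0.998])  # example ground truth

def mbe(predictions, targets):
    # Signed differences (targets - predictions) are averaged directly,
    # so positive and negative errors can cancel each other out.
    return (targets - predictions).mean()

mbe_val = mbe(y_hat, y_true)
print("mbe error is: " + str(mbe_val))

A positive result means the predictions fall below the targets on average (the model under-predicts); a negative result would indicate the opposite bias.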


Classification loss

Hinge loss / multi-class SVM loss


In short, the score for the correct category should be higher than the score for every incorrect category by at least some safety margin (usually 1). Hinge loss is therefore often used for maximum-margin classification, most notably in support vector machines. Although it is not differentiable everywhere, it is a convex function, which makes it easy to use with the convex optimizers commonly employed in machine learning.


Mathematical formula:

        SVMLoss = Σ_{j != y(i)} max(0, s_j - s_y(i) + 1)

where s_j is the predicted score for category j and y(i) is the correct category for the ith training example.


Consider, for example, that we have three training examples and three categories to predict (dog, cat, and horse). Below are the scores our algorithm predicted for each category; the score of the correct category for each example is marked with *:

                     Dog        Cat        Horse
        Example 1   -0.39*      1.49       4.21
        Example 2   -4.61       3.28*      1.46
        Example 3    1.03      -2.37      -2.27*
The hinge loss for each of these three training examples is then calculated as follows:

## 1st training example
max(0, (1.49) - (-0.39) + 1) + max(0, (4.21) - (-0.39) + 1)
max(0, 2.88) + max(0, 5.6)
2.88 + 5.6
8.48 (high loss as very wrong prediction)

## 2nd training example
max(0, (-4.61) - (3.28) + 1) + max(0, (1.46) - (3.28) + 1)
max(0, -6.89) + max(0, -0.82)
0 + 0
0 (zero loss as correct prediction)

## 3rd training example
max(0, (1.03) - (-2.27) + 1) + max(0, (-2.37) - (-2.27) + 1)
max(0, 4.3) + max(0, 0.9)
4.3 + 0.9
5.2 (high loss as very wrong prediction)
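
The same calculation can also be scripted. Below is a minimal NumPy sketch of the multi-class hinge loss applied to the three score vectors from the table above; the function name multiclass_hinge_loss and the variable layout are illustrative, not part of the original.

import numpy as np

# Predicted scores for (dog, cat, horse), one row per training example
scores = np.array([[-0.39,  1.49,  4.21],
                   [-4.61,  3.28,  1.46],
                   [ 1.03, -2.37, -2.27]])
# Index of the correct category for each example (dog, cat, horse)
correct = np.array([0, 1, 2])

def multiclass_hinge_loss(scores, correct, margin=1.0):
    losses = []
    for s, y in zip(scores, correct):
        # Sum max(0, s_j - s_y + margin) over all incorrect categories j
        loss = sum(max(0.0, s[j] - s[y] + margin)
                   for j in range(len(s)) if j != y)
        losses.append(loss)
    return np.array(losses)

print(multiclass_hinge_loss(scores, correct))  # approximately [8.48, 0.0, 5.2]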


Cross entropy loss / negative log-likelihood

This is the most common setting for classification problems. Cross entropy loss increases as the predicted probability diverges from the actual label; for example, assigning a very low probability to the true class results in a very high loss.


Mathematical formula:

        CrossEntropyLoss = -(1/n) * Σ_{i=1..n} Σ_{j} y(i, j) * log(y_hat(i, j))

where y(i, j) is 1 if class j is the correct class for the ith example and 0 otherwise, and y_hat(i, j) is the predicted probability for that class.


import numpy as np

predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.96]])
targets = np.array([[0, 0, 0, 1],
                    [0, 0, 0, 1]])

def cross_entropy(predictions, targets, epsilon=1e-10):
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    N = predictions.shape[0]
    ce_loss = -np.sum(targets * np.log(predictions + 1e-5)) / N
    return ce_loss
cross_entropy_loss = cross_entropy(predictions, targets)
print ("Cross entropy loss is: " + str(cross_entropy_loss))

