
The loss function is the soul of deep learning model optimization. Whether in the newer Transformer models or the earlier AlexNet, as long as the quality of optimization needs to be evaluated, the design and application of a loss function is unavoidable. Naturally, the various loss functions also come up again and again in interviews.

In this article we focus mainly on the object detection task, for two reasons:

  1. Object detection involves two sub-tasks, predicting classification information and location information, so its loss function naturally consists of two parts: Classification Loss and Bounding Box Regression Loss.
  2. Object detection overlaps heavily with other deep learning tasks, such as semantic segmentation, and among them object detection is comparatively simple.

For these two reasons, we will concentrate first on the Bounding Box regression loss in object detection, and turn to the loss functions commonly used in tasks such as segmentation later, when time permits.

This article walks through the evolution of the Bounding Box Regression loss function in object detection over recent years. The evolutionary route is Smooth L1 Loss -> IoU Loss -> GIoU Loss -> DIoU Loss -> CIoU Loss, and the explanation follows this route.

The full text takes about 2 minutes to read. If your time is limited, feel free to read it in installments.

1. Smooth L1 Loss

Smooth L1 Loss was designed to limit the gradient in two ways:

  1. When the difference between the predicted box and the ground truth is very large, the gradient should not be too large;
  2. When the difference between the predicted box and the ground truth is very small, the gradient should be small enough.

Why do we want both properties?

  1. When the difference is large, an overly large gradient may lead to gradient explosion.
  2. When the difference is small, a small enough gradient lets the parameters approach the optimum without repeatedly overshooting it.

Does Smooth L1 Loss really satisfy these demanding conditions? To see why, let's compare Smooth L1 Loss with the related regression losses, L1 Loss and L2 Loss.

Consider the following loss functions, where x is the difference between the predicted box and the ground truth:

$$ L_2(x) = x^2 \tag{1} $$

$$ L_1(x) = |x| \tag{2} $$

$$ \mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \tag{3} $$

The derivatives of these loss functions with respect to x are:

$$ \frac{\mathrm{d}\,L_2(x)}{\mathrm{d}x} = 2x \tag{4} $$

$$ \frac{\mathrm{d}\,L_1(x)}{\mathrm{d}x} = \begin{cases} 1 & \text{if } x \geq 0 \\ -1 & \text{otherwise} \end{cases} \tag{5} $$

$$ \frac{\mathrm{d}\,\mathrm{smooth}_{L_1}(x)}{\mathrm{d}x} = \begin{cases} x & \text{if } |x| < 1 \\ \pm 1 & \text{otherwise} \end{cases} \tag{6} $$

From equation (4), the derivative of L2 with respect to x is proportional to x, so the gradient grows linearly as x grows. As a result, early in training, when the difference between the predicted value and the ground truth is large, the gradient of the loss with respect to the prediction is very large and training is unstable.

From equation (5), the derivative of L1 with respect to x is constant, always 1 or -1. As a result, late in training, when the difference between the predicted value and the ground truth is very small, the absolute value of the derivative of the loss with respect to the prediction is still 1. If the learning rate stays unchanged, the loss will oscillate around a stable value and it becomes hard to keep converging towards higher accuracy.

PS: Learning rate decay is an optimization trick that addresses the same problem: it makes convergence easier and avoids large jumps back and forth at the end of training.

Finally, equation (6) shows that when Smooth L1 Loss is small, i.e. x lies in [-1, 1], the gradient with respect to x also becomes small; and when x is very large, the absolute value of the gradient with respect to x is capped at 1, so it never becomes large enough to wreck the network parameters. This neatly avoids the defects of both L1 loss and L2 loss.
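To make the comparison concrete, here is a minimal NumPy sketch (an illustration of the definitions above, with a few made-up values of x) that evaluates the three losses and the Smooth L1 gradient:

```python
import numpy as np

def l1_loss(x):
    return np.abs(x)

def l2_loss(x):
    return np.square(x)

def smooth_l1_loss(x):
    # 0.5 * x^2 when |x| < 1, |x| - 0.5 otherwise
    return np.where(np.abs(x) < 1, 0.5 * np.square(x), np.abs(x) - 0.5)

def smooth_l1_grad(x):
    # gradient: x when |x| < 1, capped at +/-1 otherwise
    return np.where(np.abs(x) < 1, x, np.sign(x))

xs = np.array([-10.0, -2.0, -0.5, 0.1, 2.0, 10.0])
for x, l1, l2, sl1, g in zip(xs, l1_loss(xs), l2_loss(xs),
                             smooth_l1_loss(xs), smooth_l1_grad(xs)):
    print(f"x={x:6.1f}  L1={l1:6.2f}  L2={l2:7.2f}  "
          f"SmoothL1={sl1:6.2f}  dSmoothL1/dx={g:5.2f}")
```

For large |x| the L2 values explode while Smooth L1 grows only linearly and its gradient stays at ±1; near zero the Smooth L1 gradient shrinks with x instead of staying at ±1 like L1.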

The curves of the three losses are shown in the following figure:

The curve of the optimized Smooth L1 Loss function looks like this:

As can be seen from the figure:

  • Far from the origin, i.e. for x greater than 1 or less than -1, it is very close to L1 loss;
  • Near the origin, on [-1, 1], the transition is very smooth, unlike L1 loss, which has a sharp corner at 0. That is why it is called Smooth L1 loss: a smoothed L1 loss.

Finally, the curve of the Smooth L1 loss gradient with respect to x is as follows:
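If you want to reproduce these curves yourself, here is a small matplotlib sketch (using the same definitions as above) that plots the three loss curves and the Smooth L1 gradient curve:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 601)
smooth_l1 = np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)
smooth_l1_grad = np.where(np.abs(x) < 1, x, np.sign(x))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# left panel: the three loss curves
ax1.plot(x, np.abs(x), label="L1")
ax1.plot(x, x ** 2, label="L2")
ax1.plot(x, smooth_l1, label="Smooth L1")
ax1.set_title("Loss curves")
ax1.set_xlabel("x")
ax1.legend()

# right panel: gradient of Smooth L1 w.r.t. x
ax2.plot(x, smooth_l1_grad, label="d(Smooth L1)/dx")
ax2.set_title("Smooth L1 gradient")
ax2.set_xlabel("x")
ax2.legend()

plt.show()
```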

In the actual box regression task of object detection, the loss is

$$ L_{loc} = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(p_i - g_i) $$

where the ground-truth box coordinates (x, y, w, h) are

$$ g = (g_x, g_y, g_w, g_h) $$

and the predicted box coordinates are

$$ p = (p_x, p_y, p_w, p_h) $$

That is, the Smooth L1 loss of each of the 4 coordinates is calculated separately, and the results are summed to give the Bounding Box Regression loss.
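As a minimal PyTorch sketch of this summation (the box values here are made up for illustration; with beta=1.0, F.smooth_l1_loss matches the piecewise definition above):

```python
import torch
import torch.nn.functional as F

# hypothetical boxes, encoded as (x, y, w, h) regression targets
pred_box = torch.tensor([0.52, 0.48, 1.10, 0.95], requires_grad=True)
gt_box   = torch.tensor([0.50, 0.50, 1.00, 1.00])

# per-coordinate Smooth L1, summed over the 4 coordinates
loss = F.smooth_l1_loss(pred_box, gt_box, reduction="sum", beta=1.0)
loss.backward()

print(loss.item())    # total Bounding Box Regression loss
print(pred_box.grad)  # gradient w.r.t. each of the 4 coordinates
```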

Disadvantage: when any of the three losses above is used as the Bounding Box Loss for object detection, the losses of the four coordinates are computed independently and then summed to obtain the final Bounding Box Loss. The underlying assumption is that the four coordinates are independent of each other, whereas in reality there is a clear correlation between them.
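A small illustration of this point (my own made-up example): two predicted boxes can incur exactly the same summed Smooth L1 loss against the ground truth yet overlap with it to quite different degrees, because the per-coordinate losses ignore how the coordinates interact:

```python
import numpy as np

def smooth_l1(x):
    x = np.abs(x)
    return np.where(x < 1, 0.5 * np.square(x), x - 0.5)

def iou_xywh(a, b):
    # convert (cx, cy, w, h) to corner form, then compute IoU
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

gt     = np.array([0.0, 0.0, 4.0, 4.0])   # ground truth (cx, cy, w, h)
pred_a = np.array([0.5, 0.5, 4.0, 4.0])   # same size, shifted center
pred_b = np.array([0.0, 0.0, 4.5, 4.5])   # same center, enlarged size

for name, pred in [("A", pred_a), ("B", pred_b)]:
    coord_loss = smooth_l1(pred - gt).sum()
    print(name, "summed SmoothL1 =", float(coord_loss),
          " IoU =", round(iou_xywh(pred, gt), 3))
```

Both predictions give a summed Smooth L1 loss of 0.25, yet box A overlaps the ground truth noticeably less than box B (IoU about 0.62 versus 0.79).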

How can this correlation be taken into account? That is exactly what IoU Loss does, and it is what we will dig into next. Stay tuned for the next article.

Reference: www.zhihu.com/question/58…

Reference: zhuanlan.zhihu.com/p/104236411

Slow is fast. Our time today is limited, so let's learn this one first; we will keep updating, bit by bit, getting stronger little by little. Comments and private messages are welcome. WeChat public account of the same name: Small White CV. Thank you for your attention.