This article is part of an object detection series:

  • Object Detection (1) RCNN in detail: the pioneering work of deep-learning object detection
  • Object Detection (2) SPP Net: sharing the convolution computation
  • Object Detection (3) Fast RCNN: making the RCNN model trainable end-to-end
  • Object Detection (4) Faster RCNN: an RPN network replaces Selective Search
  • Object Detection (5) YOLOv1: opening the chapter of one-stage object detection
  • Object Detection (6) YOLOv2: introducing anchors; Better, Faster
  • Understanding the regression box loss functions of object detection: IoU, GIoU, DIoU, CIoU principles and Python code (this article)
  • Focal Loss: pushing the one-stage algorithm to its peak
  • NMS Free
  • Object Detection (12) FCOS: anchor-free object detection based on the idea of instance segmentation

Object detection involves two tasks: a classification task, which assigns a category to each detected target, and a bounding box (bbox) regression task, which localizes the target and requires a regression loss on the predicted box. This article walks through the design ideas and reference implementations of the regression box loss functions used in mainstream object detection algorithms: L2 Loss, Smooth L1 Loss, IoU Loss, GIoU Loss, DIoU Loss, and CIoU Loss.

1. Smooth L1 Loss

Faster RCNN regresses the offsets of the predicted bounding box relative to its anchor, where the offsets are defined as follows:
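Here $(x, y, w, h)$ denote the center and size of the predicted box, $(x_a, y_a, w_a, h_a)$ those of the anchor, and $(x^*, y^*, w^*, h^*)$ those of the ground truth; this is the standard parameterization from the Faster R-CNN paper:

$$
t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}
$$

$$
t_x^* = \frac{x^* - x_a}{w_a}, \quad t_y^* = \frac{y^* - y_a}{h_a}, \quad t_w^* = \log\frac{w^*}{w_a}, \quad t_h^* = \log\frac{h^*}{h_a}
$$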

The paper uses Smooth L1 Loss to regress these offsets; it is computed as follows:
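With $x$ the elementwise difference between the predicted and target offsets:

$$
\mathrm{smooth}_{L1}(x) =
\begin{cases}
0.5x^2 & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}
$$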

Smooth L1 combines L1 and L2 and takes the advantages of both: L2 Loss is used in an interval around 0, and L1 Loss is used outside that interval. The curve below makes this easier to see:

Benefits of using Smooth L1:

  • The disadvantage of L1 Loss is that it is not differentiable at 0. Late in training, when the difference between the prediction and the ground truth is already small, the absolute value of the gradient of L1 Loss is still 1, so with an unchanged learning rate the loss fluctuates around the optimum and struggles to keep converging to higher accuracy.
  • The disadvantage of L2 Loss is that when the error x is large, the loss is also very large, which easily destabilizes training.
  • The advantage of Smooth L1 is that when the difference between the predicted box and the ground truth is large, the gradient is bounded, making it more robust to outliers and avoiding gradient explosion; when the difference is small, the gradient is correspondingly small.

The Smooth L1 Loss variant with a sigma parameter:
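One common form (used, for example, in the py-faster-rcnn implementation) switches between the quadratic and linear branches at $|x| = 1/\sigma^2$:

$$
\mathrm{smooth}_{L1}(x; \sigma) =
\begin{cases}
0.5\,(\sigma x)^2 & \text{if } |x| < \frac{1}{\sigma^2} \\
|x| - \frac{0.5}{\sigma^2} & \text{otherwise}
\end{cases}
$$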

Code implementation:

import torch

def smooth_l1_loss(x, target, beta, reduce=True, normalizer=1.0):
    # beta plays the role of sigma^2 above: the switch point sits at diff = 1 / beta
    diff = torch.abs(x - target)
    loss = torch.where(diff < 1. / beta, 0.5 * beta * diff ** 2, diff - 0.5 / beta)

    if reduce:
        return torch.sum(loss) / normalizer
    else:
        return torch.sum(loss, dim=1) / normalizer

With beta = 1 this matches calling torch.nn.SmoothL1Loss(reduction='sum') directly (note that PyTorch's own beta parameter corresponds to 1/beta here).
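A quick sanity check (a minimal sketch; it assumes the smooth_l1_loss function above is in scope):

import torch
import torch.nn as nn

x = torch.randn(8, 4)
target = torch.randn(8, 4)

ours = smooth_l1_loss(x, target, beta=1.0)         # summed over all elements
ref = nn.SmoothL1Loss(reduction='sum')(x, target)  # PyTorch reference
print(torch.allclose(ours, ref))  # True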

2. L2 Loss

In the YOLOv1-YOLOv3 series, the papers compute the regression error as a sum of squared errors (L2). Taking YOLOv3 as an example, the regressed offsets of (x, y, w, h) are defined as follows:


$$
\begin{cases}
\sigma(t_x^p) = b_x - C_x, \quad \sigma(t_y^p) = b_y - C_y \\
t_w^p = \log\left(\frac{w_p}{w_a}\right), \quad t_h^p = \log\left(\frac{h_p}{h_a}\right) \\
t_x^g = g_x - \lfloor g_x \rfloor, \quad t_y^g = g_y - \lfloor g_y \rfloor \\
t_w^g = \log\left(\frac{w_g}{w_a}\right), \quad t_h^g = \log\left(\frac{h_g}{h_a}\right)
\end{cases}
$$

Disadvantages of L1, L2, and Smooth L1 as object detection regression losses:

  • The loss is computed separately for x, y, w, and h, treating them as four independent variables, while the four components of a bbox should really be regressed as a whole.
  • They are sensitive to scale: predicted boxes of visibly different quality can produce the same loss, as the numeric sketch below illustrates.
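A small numeric sketch of the second point (hypothetical boxes in (x1, y1, x2, y2) form): both predictions are shifted by the same per-coordinate amount, so their L2 losses are identical, yet relative to the box sizes the fit quality differs sharply.

import torch

def l2_loss(pred, target):
    # sum of squared coordinate errors, as in YOLO-style L2 regression
    return ((pred - target) ** 2).sum(dim=1)

def iou(pred, target):
    inter_w = (torch.minimum(pred[:, 2], target[:, 2]) - torch.maximum(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.minimum(pred[:, 3], target[:, 3]) - torch.maximum(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    return inter / (area_p + area_t - inter)

target = torch.tensor([[0., 0., 100., 100.], [0., 0., 10., 10.]])
pred = torch.tensor([[2., 2., 102., 102.], [2., 2., 12., 12.]])

print(l2_loss(pred, target))  # tensor([16., 16.]) -> identical losses
print(iou(pred, target))      # roughly [0.92, 0.47] -> very different fits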

3. IOU Loss

3.1 Principles of IOU Loss

IOU Loss is a bounding box loss proposed by Megvii in UnitBox. L1, L2, and Smooth L1 Loss compute the loss for the four coordinates of the bbox separately and add them up, without considering the correlation between the coordinates. As shown in the figure below, the black box is the ground truth and the green boxes are predictions: the third prediction clearly fits best, yet all three have the same L2 loss, which is obviously unreasonable.

IOU Loss regards the bbox formed by the 4 coordinates as a whole for regression; the design idea is as follows:
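In UnitBox the loss is the negative log of the IoU between the predicted box $B_p$ and the ground-truth box $B_g$:

$$
\mathcal{L}_{\mathrm{IoU}} = -\ln\frac{|B_p \cap B_g|}{|B_p \cup B_g|}
$$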

The algorithm flow is as follows:

For each predicted box matched to an object, compute the intersection area and the union area of the predicted box and the ground-truth box, divide the two, and take the negative log: that is the IOU Loss. The higher the overlap between the predicted box and the ground truth, the closer the loss is to 0; otherwise the loss grows. This is a reasonable loss design.

3.2 IOU Loss code implementation

The code implementation is as follows:

import torch

def iou_loss(pred, target, reduction='mean', eps=1e-6):
    """
    pred:   [[x1, y1, x2, y2], ...]
    target: [[x1, y1, x2, y2], ...]
    reduction: "mean" or "sum"
    return: loss
    """
    # pred / target box areas (the +1 assumes inclusive integer pixel coordinates)
    pred_widths = (pred[:, 2] - pred[:, 0] + 1.).clamp(0)
    pred_heights = (pred[:, 3] - pred[:, 1] + 1.).clamp(0)
    target_widths = (target[:, 2] - target[:, 0] + 1.).clamp(0)
    target_heights = (target[:, 3] - target[:, 1] + 1.).clamp(0)
    pred_areas = pred_widths * pred_heights
    target_areas = target_widths * target_heights

    # intersection of pred and target
    inter_xmins = torch.maximum(pred[:, 0], target[:, 0])
    inter_ymins = torch.maximum(pred[:, 1], target[:, 1])
    inter_xmaxs = torch.minimum(pred[:, 2], target[:, 2])
    inter_ymaxs = torch.minimum(pred[:, 3], target[:, 3])
    inter_widths = torch.clamp(inter_xmaxs - inter_xmins + 1., min=0.)
    inter_heights = torch.clamp(inter_ymaxs - inter_ymins + 1., min=0.)
    inter_areas = inter_widths * inter_heights

    # iou, clamped away from 0 so the log below stays finite
    ious = torch.clamp(inter_areas / (pred_areas + target_areas - inter_areas), min=eps)
    if reduction == 'mean':
        loss = torch.mean(-torch.log(ious))
    elif reduction == 'sum':
        loss = torch.sum(-torch.log(ious))
    else:
        raise NotImplementedError

    return loss

3.3 Advantages and disadvantages of IOU Loss

Advantages:

  • IOU Loss directly reflects how well the predicted box fits the ground-truth box.
  • IOU Loss is scale invariant, i.e. not sensitive to scale.

Disadvantages:

  • It cannot measure the loss between two completely disjoint boxes: the IoU is stuck at 0 no matter how far apart they are, so it gives no useful gradient (demonstrated in the snippet below).
  • Predicted boxes of different shapes can produce the same loss (same IOU).
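A quick illustration of the first point (hypothetical boxes; it reuses the iou_loss above): two predictions that miss the target by very different margins receive the same loss, so nothing pushes the farther box to move closer.

import torch

target = torch.tensor([[0., 0., 10., 10.]])
near = torch.tensor([[12., 0., 22., 10.]])  # disjoint, but close
far = torch.tensor([[90., 0., 100., 10.]])  # disjoint, far away

print(iou_loss(near, target))  # -log(eps), about 13.8
print(iou_loss(far, target))   # identical, despite the much larger miss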

4. GIOU Loss

4.1 Principles of GIOU Loss

GIOU was designed to fix the failure of IOU Loss when the predicted box and the ground-truth box do not intersect (the IOU is constantly 0), yielding a Generalized Intersection over Union Loss. On top of IOU, GIOU finds the smallest enclosing rectangle of the predicted box and the ground-truth box, computes that rectangle's area minus the area of the union of the two boxes (the purple hatched region in the figure below), and defines GIOU as the IOU minus the ratio of that leftover area to the enclosing rectangle's area.
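In symbols, with $C$ the smallest enclosing rectangle of the predicted box $B_p$ and the ground-truth box $B_g$:

$$
\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (B_p \cup B_g)|}{|C|}
$$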

Define GIOU Loss = 1 - GIOU. Note that GIOU ranges over [-1, 1], so GIOU Loss ranges over [0, 2]. The algorithm flow of GIOU Loss is as follows:

4.2 GIOU Loss Code Implementation

If you’re confused, just look at the code. It’s not complicated.

import torch

def giou_loss(pred, target, reduction='mean', eps=1e-6):
    """
    pred:   [[x1, y1, x2, y2], ...]
    target: [[x1, y1, x2, y2], ...]
    reduction: "mean" or "sum"
    return: loss
    """
    # pred / target box areas
    pred_widths = (pred[:, 2] - pred[:, 0] + 1.).clamp(0)
    pred_heights = (pred[:, 3] - pred[:, 1] + 1.).clamp(0)
    target_widths = (target[:, 2] - target[:, 0] + 1.).clamp(0)
    target_heights = (target[:, 3] - target[:, 1] + 1.).clamp(0)
    pred_areas = pred_widths * pred_heights
    target_areas = target_widths * target_heights

    # intersection of pred and target
    inter_xmins = torch.maximum(pred[:, 0], target[:, 0])
    inter_ymins = torch.maximum(pred[:, 1], target[:, 1])
    inter_xmaxs = torch.minimum(pred[:, 2], target[:, 2])
    inter_ymaxs = torch.minimum(pred[:, 3], target[:, 3])
    inter_widths = torch.clamp(inter_xmaxs - inter_xmins + 1., min=0.)
    inter_heights = torch.clamp(inter_ymaxs - inter_ymins + 1., min=0.)
    inter_areas = inter_widths * inter_heights

    # iou
    unions = pred_areas + target_areas - inter_areas
    ious = torch.clamp(inter_areas / unions, min=eps)

    # smallest enclosing rectangle
    outer_xmins = torch.minimum(pred[:, 0], target[:, 0])
    outer_ymins = torch.minimum(pred[:, 1], target[:, 1])
    outer_xmaxs = torch.maximum(pred[:, 2], target[:, 2])
    outer_ymaxs = torch.maximum(pred[:, 3], target[:, 3])
    outer_widths = (outer_xmaxs - outer_xmins + 1.).clamp(0.)
    outer_heights = (outer_ymaxs - outer_ymins + 1.).clamp(0.)
    outer_areas = outer_widths * outer_heights

    # giou = iou - (enclosing area - union) / enclosing area
    gious = ious - (outer_areas - unions) / outer_areas
    gious = gious.clamp(min=-1.0, max=1.0)
    if reduction == 'mean':
        loss = torch.mean(1. - gious)
    elif reduction == 'sum':
        loss = torch.sum(1. - gious)
    else:
        raise NotImplementedError
    return loss

4.3 Advantages and disadvantages of GIOU Loss

Advantages:

  • GIOU Loss solves the disjoint-box problem of IOU Loss and achieves higher accuracy on object detection tasks.

Disadvantages:

  • It cannot distinguish regression boxes in an inclusion relation, as shown in the figure below: the three predicted boxes have the same GIOU Loss, yet the third one clearly fits best (see the snippet after this list).
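A quick check of this failure mode (hypothetical boxes; it reuses the giou_loss above): when the prediction lies entirely inside the target, the enclosing rectangle is the target itself, the GIOU penalty term vanishes, and every position inside yields the same loss.

import torch

target = torch.tensor([[0., 0., 100., 100.]])
corner = torch.tensor([[0., 0., 20., 20.]])      # stuck in the corner
centered = torch.tensor([[40., 40., 60., 60.]])  # centered on the target

print(giou_loss(corner, target))    # about 0.957
print(giou_loss(centered, target))  # identical, despite the better position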

5. DIOU Loss

5.1 Principles of DIOU Loss

To solve GIOU Loss's inability to measure the loss between two fully nested boxes, DIOU Loss adds the distance between the two box centers to the loss function: it replaces the area-ratio term of GIOU Loss with the squared center distance divided by the squared diagonal of the smallest enclosing rectangle (length of the red line over length of the blue line in the figure).

DIOU and DIOU Loss are computed as follows:
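From the DIOU paper, with $b_p$ and $b_g$ the centers of the predicted and ground-truth boxes, $\rho(\cdot)$ the Euclidean distance, and $c$ the diagonal length of the smallest enclosing rectangle:

$$
\mathrm{DIoU} = \mathrm{IoU} - \frac{\rho^2(b_p, b_g)}{c^2}, \qquad \mathcal{L}_{\mathrm{DIoU}} = 1 - \mathrm{DIoU}
$$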

5.2 DIOU Loss Code Implementation

import torch

def diou_loss(pred, target, reduction='mean', eps=1e-6):
    """
    pred:   [[x1, y1, x2, y2], ...]
    target: [[x1, y1, x2, y2], ...]
    reduction: "mean" or "sum"
    return: loss
    """
    # pred / target box areas
    pred_widths = (pred[:, 2] - pred[:, 0] + 1.).clamp(0)
    pred_heights = (pred[:, 3] - pred[:, 1] + 1.).clamp(0)
    target_widths = (target[:, 2] - target[:, 0] + 1.).clamp(0)
    target_heights = (target[:, 3] - target[:, 1] + 1.).clamp(0)
    pred_areas = pred_widths * pred_heights
    target_areas = target_widths * target_heights

    # intersection of pred and target
    inter_xmins = torch.maximum(pred[:, 0], target[:, 0])
    inter_ymins = torch.maximum(pred[:, 1], target[:, 1])
    inter_xmaxs = torch.minimum(pred[:, 2], target[:, 2])
    inter_ymaxs = torch.minimum(pred[:, 3], target[:, 3])
    inter_widths = torch.clamp(inter_xmaxs - inter_xmins + 1., min=0.)
    inter_heights = torch.clamp(inter_ymaxs - inter_ymins + 1., min=0.)
    inter_areas = inter_widths * inter_heights

    # iou
    unions = pred_areas + target_areas - inter_areas + eps
    ious = torch.clamp(inter_areas / unions, min=eps)

    # squared diagonal of the smallest enclosing rectangle
    outer_xmins = torch.minimum(pred[:, 0], target[:, 0])
    outer_ymins = torch.minimum(pred[:, 1], target[:, 1])
    outer_xmaxs = torch.maximum(pred[:, 2], target[:, 2])
    outer_ymaxs = torch.maximum(pred[:, 3], target[:, 3])
    outer_diag = torch.clamp(outer_xmaxs - outer_xmins + 1., min=0.) ** 2 + \
        torch.clamp(outer_ymaxs - outer_ymins + 1., min=0.) ** 2 + eps

    # squared distance between the two box centers
    # (no +1 here: the penalty must vanish when the centers coincide)
    c_pred = ((pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2)
    c_target = ((target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2)
    distance = (c_pred[0] - c_target[0]) ** 2 + (c_pred[1] - c_target[1]) ** 2

    # diou loss
    dious = ious - distance / outer_diag
    if reduction == 'mean':
        loss = torch.mean(1. - dious)
    elif reduction == 'sum':
        loss = torch.sum(1. - dious)
    else:
        raise NotImplementedError

    return loss

5.3 Advantages and disadvantages of DIOU Loss

Advantages:

  • DIOU Loss solves GIOU Loss's inability to measure the loss of fully nested boxes, and further improves accuracy on object detection tasks.

Disadvantages:

  • It cannot separate two boxes whose centers coincide but whose shapes differ, as shown in the figure below: the two red predicted boxes have the same area but different shapes, so their DIOU Loss is identical, yet the second one clearly fits better.

6. CIOU Loss

6.1 Principles of CIOU Loss

CIOU Loss was proposed in the same paper as DIOU Loss. On top of DIOU Loss, it also considers whether the shape (aspect ratio) of the predicted box is consistent with the ground truth, which makes it a natural completion of DIOU Loss.
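From the paper, the aspect-ratio consistency term $v$ and its weight $\alpha$ are:

$$
v = \frac{4}{\pi^2}\left(\arctan\frac{w_g}{h_g} - \arctan\frac{w_p}{h_p}\right)^2, \qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
$$

$$
\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^2(b_p, b_g)}{c^2} - \alpha v, \qquad
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{CIoU}
$$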

Notice the new αv term: the larger the IOU, the smaller α's denominator, so the larger α becomes and the more weight the aspect-ratio term receives. In this way the overlap area, the center distance, and the box shape are all folded into a single loss function.

6.2 CIOU Loss Code Implementation

import math
import torch

def ciou_loss(pred, target, reduction='mean', eps=1e-6):
    """
    pred:   [[x1, y1, x2, y2], ...]
    target: [[x1, y1, x2, y2], ...]
    reduction: "mean" or "sum"
    return: loss
    """
    # pred / target box areas
    pred_widths = (pred[:, 2] - pred[:, 0] + 1.).clamp(0)
    pred_heights = (pred[:, 3] - pred[:, 1] + 1.).clamp(0)
    target_widths = (target[:, 2] - target[:, 0] + 1.).clamp(0)
    target_heights = (target[:, 3] - target[:, 1] + 1.).clamp(0)
    pred_areas = pred_widths * pred_heights
    target_areas = target_widths * target_heights

    # intersection of pred and target
    inter_xmins = torch.maximum(pred[:, 0], target[:, 0])
    inter_ymins = torch.maximum(pred[:, 1], target[:, 1])
    inter_xmaxs = torch.minimum(pred[:, 2], target[:, 2])
    inter_ymaxs = torch.minimum(pred[:, 3], target[:, 3])
    inter_widths = torch.clamp(inter_xmaxs - inter_xmins + 1., min=0.)
    inter_heights = torch.clamp(inter_ymaxs - inter_ymins + 1., min=0.)
    inter_areas = inter_widths * inter_heights

    # iou
    unions = pred_areas + target_areas - inter_areas + eps
    ious = torch.clamp(inter_areas / unions, min=eps)

    # squared diagonal of the smallest enclosing rectangle
    outer_xmins = torch.minimum(pred[:, 0], target[:, 0])
    outer_ymins = torch.minimum(pred[:, 1], target[:, 1])
    outer_xmaxs = torch.maximum(pred[:, 2], target[:, 2])
    outer_ymaxs = torch.maximum(pred[:, 3], target[:, 3])
    outer_diag = torch.clamp(outer_xmaxs - outer_xmins + 1., min=0.) ** 2 + \
        torch.clamp(outer_ymaxs - outer_ymins + 1., min=0.) ** 2 + eps

    # squared distance between the two box centers
    c_pred = ((pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2)
    c_target = ((target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2)
    distance = (c_pred[0] - c_target[0]) ** 2 + (c_pred[1] - c_target[1]) ** 2

    # aspect-ratio consistency term v and its weight alpha
    # (eps on the heights keeps the division finite)
    w_pred, h_pred = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1] + eps
    w_target, h_target = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1] + eps
    factor = 4 / (math.pi ** 2)
    v = factor * torch.pow(torch.atan(w_pred / h_pred) - torch.atan(w_target / h_target), 2)
    alpha = v / (1 - ious + v)

    # ciou loss
    cious = ious - distance / outer_diag - alpha * v
    if reduction == 'mean':
        loss = torch.mean(1. - cious)
    elif reduction == 'sum':
        loss = torch.sum(1. - cious)
    else:
        raise NotImplementedError

    return loss

7. Summary and effect of bounding box regression loss functions

A good localization loss should take the following three factors into account:

  • Overlapping area
  • Center distance
  • Aspect ratio

Comparing the loss functions on YOLOv3 (figure below), IOU Loss, GIOU Loss, DIOU Loss, and CIOU Loss deliver successive accuracy improvements in that order:
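For a quick local comparison (a minimal sketch; it assumes the four loss functions defined above are in scope and uses hypothetical boxes):

import torch

pred = torch.tensor([[15., 5., 65., 45.]])
target = torch.tensor([[0., 0., 50., 50.]])

# each loss folds in progressively more geometry:
# overlap only, then enclosure, then center distance, then shape
for fn in (iou_loss, giou_loss, diou_loss, ciou_loss):
    print(fn.__name__, fn(pred, target).item())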

References:

  1. UnitBox: An Advanced Object Detection Network (arxiv.org/pdf/1608.01…)
  2. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression (giou.stanford.edu/GIoU.pdf)
  3. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (arxiv.org/pdf/1911.08…)