Paper link: arxiv.org/abs/1902.09…
Code link: github.com/generalized…
This paper is one of the few object detection papers from CVPR 2019 that have already been accepted and made public. It proposes a new way to optimize the bounding box: GIoU (Generalized IoU). New uses of IoU keep emerging these days, from Cascade R-CNN to IoU-Net and now GIoU, and among these papers GIoU's method is the simplest. I believe many readers, after understanding the idea of this article, will think to themselves: "I always felt the loss function could be improved; why didn't I think of something this simple?" Haha, that's what I thought anyway. Let's take a look at the method proposed in this paper.
Motivation
At present, the mainstream way to optimize bounding boxes in object detection is a regression loss on the box coordinates (MSE loss, smooth L1 loss, etc.). These losses are computed from a "proxy attribute" of the box, the coordinate distance, while ignoring the most significant property of the box itself: its IoU. As shown in the figure below, predictions with identical L1 or L2 norms can give very different detection results, which is directly reflected in large differences in the IoU between the predicted and ground-truth boxes. This shows that the L1 and L2 norms do not reflect detection quality well.
Besides reflecting how well the predicted box matches the ground-truth box, IoU is also scale invariant. But if IoU is so good, why wasn't it used directly as the loss before? Because IoU has two drawbacks that make it unsuitable as a loss function:
- If the predicted box and the ground truth (GT) do not overlap, the IoU is 0 no matter how far apart they are; when optimizing such a loss function the gradient is 0, so it cannot be optimized
- Predicted boxes with the same IoU against the GT can still differ greatly in detection quality, as shown in the following figure:
Based on the excellent properties of IoU and its fatal shortcomings as a loss function, the paper proposes a new concept: GIoU.
Methods
The GIoU definition is shown below:
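Paraphrasing the paper's Algorithm 1: for two boxes A and B, let C be the smallest enclosing shape that contains both (for axis-aligned boxes, simply the smallest enclosing box); then

$$
\mathrm{IoU}(A,B)=\frac{|A\cap B|}{|A\cup B|},\qquad
\mathrm{GIoU}(A,B)=\mathrm{IoU}(A,B)-\frac{|C\setminus(A\cup B)|}{|C|}.
$$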
By definition, GIoU has the following properties:
- GIoU retains the excellent properties of a metric (more precisely, 1 - GIoU is a distance): non-negativity, identity of indiscernibles, symmetry, and the triangle inequality
- Similar to IoU, it has scale invariance
- The value of GIoU is always less than or equal to that of IoU
- For two rectangular boxes A and B, 0≤IoU(A,B)≤1, and -1≤GIoU≤1
- When A and B are poorly aligned, the area of the enclosing box C grows and the value of GIoU decreases. Even when the two boxes do not overlap at all, GIoU can still be computed and still varies with how far apart they are, which to some extent fixes the reason IoU is unsuitable as a loss function; see the small worked example after this list
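As a quick illustration with toy numbers of my own (boxes written as [x1, y1, x2, y2], not taken from the paper): two unit boxes that do not overlap both have IoU = 0, yet GIoU still distinguishes the farther pair from the nearer one:

$$
A=[0,0,1,1],\; B=[2,0,3,1]:\quad \mathrm{IoU}=0,\ |C|=3,\ \mathrm{GIoU}=0-\tfrac{3-2}{3}=-\tfrac{1}{3}
$$

$$
A=[0,0,1,1],\; B=[4,0,5,1]:\quad \mathrm{IoU}=0,\ |C|=5,\ \mathrm{GIoU}=0-\tfrac{5-2}{5}=-\tfrac{3}{5}
$$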
GIoU as a loss function is calculated as in Algorithm 2 of the paper:
As the algorithm shows, the computation of GIoU follows basically the same procedure as IoU: after obtaining the IoU, compute GIoU according to Algorithm 1 above, and the loss is taken as 1 - GIoU. It is not yet clear to me how the gradient is computed for backpropagation; I will update this post after reading the source code.
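In the meantime, here is a minimal Python sketch of the forward computation as I understand it (my own paraphrase of Algorithms 1 and 2, assuming a single pair of axis-aligned boxes in [x1, y1, x2, y2] format; a real implementation would use autograd-friendly tensor ops so the gradient falls out of backpropagation automatically):

```python
def giou_loss(pred, gt):
    """GIoU loss (1 - GIoU) for two axis-aligned boxes in [x1, y1, x2, y2] format."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Areas of the predicted and ground-truth boxes
    area_p = max(px2 - px1, 0) * max(py2 - py1, 0)
    area_g = max(gx2 - gx1, 0) * max(gy2 - gy1, 0)

    # Intersection
    inter_w = max(min(px2, gx2) - max(px1, gx1), 0)
    inter_h = max(min(py2, gy2) - max(py1, gy1), 0)
    inter = inter_w * inter_h

    union = area_p + area_g - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box C
    area_c = (max(px2, gx2) - min(px1, gx1)) * (max(py2, gy2) - min(py1, gy1))

    # GIoU = IoU - |C \ (A ∪ B)| / |C|
    if area_c > 0:
        giou = iou - (area_c - union) / area_c
    else:
        giou = iou

    return 1.0 - giou


# Non-overlapping boxes still yield a loss that depends on their distance,
# unlike a plain IoU loss (which would be stuck at 1 with zero gradient).
print(giou_loss([0, 0, 1, 1], [2, 0, 3, 1]))  # 1 - (-1/3) ≈ 1.33
print(giou_loss([0, 0, 1, 1], [4, 0, 5, 1]))  # 1 - (-3/5) = 1.60
```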
Experiments
The experiments are carried out on several mainstream object detection algorithms: YOLO, Faster R-CNN, and Mask R-CNN. The results posted here are for the PASCAL VOC dataset, shown below.
From the YOLOv3 results, the GIoU loss gives a clear improvement over the IoU loss, while for Faster R-CNN there is little difference between GIoU and IoU as loss functions. The author's explanation is that the anchors in Faster R-CNN are denser, so it is hard to produce predicted boxes that do not overlap with the GT. In my opinion, with more anchors there are actually more boxes that do not overlap with the GT; the more fundamental reason should be that most of those non-overlapping boxes are filtered out by the RPN's coarse screening, so the GIoU loss shows no significant improvement over IoU.
Conclusion
GIoU's method is simple, and the point it chooses to optimize is clever; replacing bbox regression with a generalized IoU as the loss function is interesting. What puzzles me, however, is that the detection AP values in the experiments are very low: the vanilla Faster R-CNN does not perform this badly on PASCAL VOC. Against the original loss function reported here, the accuracy with the GIoU loss stays below 40%; would it still help on top of a more accurate baseline? That still needs to be tested experimentally. In addition, the paper feels a bit rushed: there are no further experiments probing exactly why bbox regression is deficient as a loss function.
Welcome to follow my WeChat official account.