The overall idea of RepPointsV2 is similar to that of Mask R-CNN: add extra tasks to supervise the learning of the object detector. The idea itself is not particularly novel, but the method is very general. Moreover, the output of the corner verification task is used for joint inference, which brings a clear improvement in the comparison experiments.

Source: Xiaofei algorithm engineering Notes public account

RepPoints V2: Verification Meets Regression for Object Detection

  • Paper address: arxiv.org/abs/2007.08…

  • Paper code: github.com/Scalsol/Rep…

Introduction


Verification and regression are the two general ways a neural network makes predictions. Many current object localization methods combine the two to reach SOTA performance: preset anchor boxes are first verified (classified) for coarse localization, and box offsets are then regressed for refinement. Some recent localization methods, such as RepPoints, achieve good performance with regression alone. This paper therefore studies whether verification tasks can be added to a pure-regression localization algorithm to further improve performance. The concrete approach is to attach auxiliary verification branches to the original network for supervised learning, optimizing both the intermediate features and the final detection results.

A Brief Review of a Pure Regression Method: RepPoints


RepPoints is a pure-regression method (see the earlier article for details). Starting from a center position $p=(x,y)$, it directly regresses a point set $\mathcal{R}^{'}=\{p^{'}_i=(x^{'}_i,y^{'}_i)\}$ to represent the spatial extent of the target, using two successive steps:

$$\mathcal{R}=\{p_i=(x_i,y_i)\}^{n}_{i=1},\quad (x_i,y_i)=(x,y)+g_i(F_p)$$

$$\mathcal{R}^{'}=\{p^{'}_i=(x^{'}_i,y^{'}_i)\}^{n}_{i=1},\quad (x^{'}_i,y^{'}_i)=(x_i,y_i)+g^{'}_i(F_p)$$

where $\mathcal{R}$ is the intermediate point set, $F_p$ is the feature vector at position $p$, and $g_i$ and $g^{'}_i$ are 2-d regression functions. The final bbox is obtained from the point sets $\mathcal{R}$ and $\mathcal{R}^{'}$ through a transformation function $\mathcal{T}$. Although RepPoints uses pure regression without an anchor verification step, its performance is no worse than that of anchor-based methods.
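To make the transformation function $\mathcal{T}$ concrete, here is a minimal sketch of the min-max variant used by RepPoints, which converts a predicted point set into an axis-aligned bounding box (function and tensor names are illustrative, not the authors' code):

```python
import torch

def points_to_bbox_minmax(points: torch.Tensor) -> torch.Tensor:
    """Min-max transformation T: point set -> axis-aligned bbox.

    points: (N, n, 2) tensor, n representative points (x, y) per sample.
    returns: (N, 4) tensor of boxes (x1, y1, x2, y2).
    """
    x1y1 = points.min(dim=1).values  # tightest top-left corner
    x2y2 = points.max(dim=1).values  # tightest bottom-right corner
    return torch.cat([x1y1, x2y2], dim=1)

# Example: 9 representative points for one object
pts = torch.rand(1, 9, 2) * 100
print(points_to_bbox_minmax(pts))  # -> tensor of shape (1, 4)
```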

Corner Point Verification


Corner point verification was proposed by CornerNet and is one of the verification methods adopted in this paper; it is also used by many keypoint-based detection algorithms. Corner verification predicts a score for each feature map position, judging whether that position is the top-left or bottom-right corner of a target, and additionally predicts two offset values to refine the corner positions and compensate for the precision loss caused by downsampling. The implementation here is similar to the original: corner pooling is used for feature extraction, focal loss trains the corner score prediction, and smooth L1 loss trains the offsets. If a GT corner falls into the bin of a feature point, that feature point is treated as a positive sample and all other points as negatives. When computing the loss, negative points close to a positive point are assigned a Gaussian score according to their distance, which makes learning smoother. The paper also makes one improvement: GT corners are assigned directly to every FPN level, with no need to pick a level according to the target size.
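A minimal sketch of how the Gaussian-weighted corner targets described above can be built (the radius handling is simplified and the helper name is mine, not from the official repo):

```python
import numpy as np

def draw_corner_target(heatmap: np.ndarray, cx: int, cy: int, sigma: float = 2.0):
    """Write a 2D Gaussian around a GT corner into `heatmap` (H, W).

    The exact corner bin gets value 1 (positive sample); nearby bins get a
    Gaussian score in (0, 1) that softens their penalty as negatives.
    """
    h, w = heatmap.shape
    radius = int(3 * sigma)
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            heatmap[y, x] = max(heatmap[y, x], g)  # keep the strongest nearby corner
    heatmap[cy, cx] = 1.0
    return heatmap

# Example: one top-left GT corner at feature-map position (x=20, y=12)
tl_heatmap = draw_corner_target(np.zeros((64, 64), dtype=np.float32), cx=20, cy=12)
```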

Within-box Foreground Verification


Another verification task checks whether a feature point lies inside an object. Unlike corner information, which is concentrated at the extreme points of an object, this foreground information is distributed evenly over the object's interior. Training uses a category heatmap: for $C$ object categories, the heatmap has $C$ channels, each giving the probability that the feature point belongs to that category. As with corners, GT is assigned directly to every FPN level. It is important to note that the classic focal loss may pay more attention to larger objects than to smaller ones. For this reason, the paper proposes a regularized focal loss: for positive sample points, the loss is normalized by the number of feature points contained in the corresponding object, while for negative sample points, it is normalized by the total number of positive sample points.
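A rough sketch of this normalization idea, assuming a standard binary focal loss per location (the weighting follows the description above; exact details may differ from the paper's implementation):

```python
import torch

def regularized_focal_loss(pred, target, inside_obj_count, alpha=0.25, gamma=2.0):
    """Focal loss whose per-point weights remove the bias toward large objects.

    pred, target:      (N,) predicted probabilities and {0,1} labels per location.
    inside_obj_count:  (N,) for each positive location, the number of feature
                       points inside its object; any value > 0 for negatives.
    """
    pt = torch.where(target == 1, pred, 1 - pred)
    alpha_t = torch.where(target == 1,
                          torch.full_like(pred, alpha),
                          torch.full_like(pred, 1 - alpha))
    loss = -alpha_t * (1 - pt) ** gamma * torch.log(pt.clamp(min=1e-6))

    num_pos = (target == 1).sum().clamp(min=1).float()
    # Positives: normalize by the size (in feature points) of their own object,
    # so a large object does not dominate just because it covers more locations.
    # Negatives: normalize by the total number of positive points.
    weight = torch.where(target == 1, 1.0 / inside_obj_count.float(), 1.0 / num_pos)
    return (loss * weight).sum()
```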

A General Fusion Method


Since both verification tasks look at local parts of an object, while pure-regression detectors usually detect the whole object directly, the paper attaches them as auxiliary branches, as shown in Figure 1, to optimize the intermediate features and the final detection results respectively. With these auxiliary branches the detector gains the following benefits:

  • Better features through multi-task learning: the auxiliary verification tasks provide richer supervision during training and thus stronger features, which improves detection accuracy. Unlike Mask R-CNN, the verification tasks of the paper need no additional annotation.
  • Feature enhancement for better regression: the feature maps output by the verification tasks contain corner positions and the foreground region, and their size matches the FPN feature maps used for regression, so they are processed by a 1×1 conv (the Embed Conv) and added into the FPN features. Note that the back-propagated gradient of the main regression task only flows into the Embed Conv layer, so the learning of the verification tasks is not disturbed.
  • Joint inference: besides the implicit help from feature fusion, the paper also explicitly uses the output of the corner verification task for joint inference. The corner task is good at localizing corners but poor at deciding whether a corner truly belongs to an object, while the main regression task is the opposite. The paper therefore adjusts each corner point $p^t$ of the box predicted by the main regression task using the nearby verification scores.


In the adjustment formula, $t$ is the corner type (top-left or bottom-right), $q^t$ is a predicted corner position, $s(q^t)$ is its verification score, and $r$ is the neighborhood radius, which defaults to 1.
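The exact adjustment formula is given in the paper; below is a rough sketch of one plausible reading, where the regressed corner is snapped to the highest-scoring verified corner within radius $r$ (the acceptance threshold is an illustrative assumption, not a value from the paper):

```python
import torch

def refine_corner(corner_xy, corner_heatmap, r=1, min_score=0.1):
    """Adjust one regressed corner using the corner-verification heatmap.

    corner_xy:      (2,) regressed corner (x, y) in feature-map coordinates.
    corner_heatmap: (H, W) verification scores s(.) for this corner type t.
    r:              neighborhood radius (defaults to 1, as in the paper).
    min_score:      illustrative acceptance threshold (assumption).
    """
    h, w = corner_heatmap.shape
    x, y = int(corner_xy[0].round()), int(corner_xy[1].round())
    best_score, best_xy = -1.0, corner_xy
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            qx, qy = x + dx, y + dy
            if 0 <= qx < w and 0 <= qy < h and corner_heatmap[qy, qx] > best_score:
                best_score = corner_heatmap[qy, qx].item()
                best_xy = torch.tensor([qx, qy], dtype=corner_xy.dtype)
    # Keep the regressed corner unless a sufficiently confident verified corner is nearby.
    return best_xy if best_score >= min_score else corner_xy
```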

RepPoints v2: Fusing Verification into RepPoints


To make RepPoints more compatible with the auxiliary verification branches, the first two points of the point set are explicitly defined as the top-left and bottom-right corner points, and the predicted point set is converted into a prediction box based on these two points.

The verification module takes the output of the third convolution layer of the localization subnet as its input (see RepPoints for details of the subnet). The structure of the verification module is shown in Figure 2, and the complete training loss adds the corner verification loss and the within-box foreground verification loss to the original RepPoints losses.
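A minimal sketch of how such a verification branch and the feature fusion could be wired up (the module layout, channel counts, and names are illustrative; see Figure 2 and the official repo for the real structure):

```python
import torch
import torch.nn as nn

class VerificationBranch(nn.Module):
    """Auxiliary branch: corner + foreground heatmaps, fused back into FPN features."""

    def __init__(self, feat_ch=256, num_classes=80):
        super().__init__()
        # Heads producing the verification outputs (corner pooling omitted for brevity).
        self.corner_head = nn.Conv2d(feat_ch, 2 + 4, 3, padding=1)    # 2 corner scores + offsets
        self.foreground_head = nn.Conv2d(feat_ch, num_classes, 3, padding=1)
        # 1x1 "Embed Conv" mapping the verification outputs back to FPN channels.
        self.embed_conv = nn.Conv2d(2 + 4 + num_classes, feat_ch, 1)

    def forward(self, loc_feat, fpn_feat):
        corners = self.corner_head(loc_feat)          # trained by focal + smooth L1 losses
        foreground = self.foreground_head(loc_feat)   # trained by regularized focal loss
        verif = torch.cat([corners, foreground], dim=1)
        # detach(): the regression task's gradient stops at embed_conv and does not
        # flow back into the verification heads, leaving their learning undisturbed.
        fused = fpn_feat + self.embed_conv(verif.detach())
        return corners, foreground, fused

# Example usage with dummy features
branch = VerificationBranch()
x = torch.randn(1, 256, 64, 64)
corners, fg, fused = branch(x, x)
```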

Experiments


Comparison between using explicit corner points and the original method.

Comparison of the effects of two verification tasks.

Comparison of the effects of three fusion methods.

Performance comparison with SOTA.

Conclusion


The overall idea of RepPointsV2 is similar to that of Mask R-CNN: add extra tasks to supervise the learning of the object detector. The idea itself is not particularly novel, but the method is very general. Moreover, the output of the corner verification task is used for joint inference, which brings a clear improvement in the comparison experiments.





If this article was helpful to you, please give it a thumbs up or a "Wow".

For more content, please follow the WeChat official account [Algorithm Engineering Notes of Xiaofei].