To improve anchor-point detection algorithms, this paper proposes SAPD, which applies different loss weights to anchor points at different locations and trains multiple feature pyramid levels jointly with weights, removing most hand-crafted rules and relying more on weights learned by the network itself.
Source: Xiaofei algorithm engineering Notes public account
Soft Anchor-Point Object Detection
- Paper: Arxiv.org/abs/1911.12…
- Code: Github.com/xuannianz/S… (unofficial)
Introduction
Anchor-free detection methods fall into two categories: anchor-point methods and key-point methods. Compared with key-point methods, anchor-point methods have the following advantages: 1) a simpler network structure, 2) faster training and inference, 3) better use of the feature pyramid, and 4) more flexible selection of feature pyramid levels. However, their accuracy is generally lower than that of key-point methods. This paper therefore studies the factors that limit the accuracy of anchor-point methods and proposes SAPD (Soft Anchor-Point Detector), which has two main highlights:
- Soft-weighted anchor points. When training anchor-point detectors, points that satisfy a geometric relation are generally set as positive sample points, and their loss weights are all 1. As a result, points with less accurate localization sometimes receive higher classification confidence. In fact, points at different positions differ in regression difficulty. Points closer to the target boundary should get a lower loss weight, so that the network focuses on learning from high-quality anchor points.
- Soft-selected pyramid levels. Anchor-point detectors select a single feature pyramid level for each target during training and ignore the other levels, which is wasteful to some extent. Although the other levels respond less strongly than the selected one, their feature distributions should be similar to it, so multiple levels can be trained at the same time with different weights.
Detection Formulation with Anchor Points
The paper first introduces the network structure and training method of anchor-point object detectors.
Network architecture
Each level of the feature pyramid has its own detection head. A pyramid level is denoted $P_l$, where $l$ is the level number. The feature map of level $l$ is $1/s_l$ the size of the $W\times H$ input, where $s_l = 2^l$ is the stride. In general, $l$ ranges from 3 to 7. The detection head consists of a classification subnet and a regression subnet, each of which starts with five $3\times 3$ convolution layers. For every position, the head then predicts $K$ classification confidences and 4 offsets, where the offsets are the distances from the current position to the four boundaries of the target.
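The relation between level, stride, and feature map size can be checked with a few lines of Python. This is only a minimal sketch: the function name `pyramid_shapes` and the choice to round sizes up are assumptions made for illustration, not details from the paper.

```python
import math

def pyramid_shapes(width, height, levels=range(3, 8)):
    """Return {level: (stride, feature_width, feature_height)} for each pyramid level."""
    shapes = {}
    for level in levels:
        stride = 2 ** level  # s_l = 2^l
        # Assumption: sizes are rounded up, as is common when inputs are padded.
        shapes[level] = (stride, math.ceil(width / stride), math.ceil(height / stride))
    return shapes

shapes = pyramid_shapes(640, 512)
for level, (stride, fw, fh) in shapes.items():
    print(f"P{level}: stride={stride}, feature map={fw}x{fh}")
```

For a 640×512 input, the level with the smallest stride ($P_3$) keeps the most positions, and each higher level halves both sides of the feature map.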
Supervision targets
For a target $B = (c, x, y, w, h)$, the central region is defined as $B_v = (c, x, y, \epsilon w, \epsilon h)$, where $\epsilon$ is a scaling factor. When target $B$ is assigned to pyramid level $P_l$ and anchor point $p_{lij}$ lies inside $B_v$, $p_{lij}$ is treated as a positive sample point with classification target $c$. Its regression target is the normalized distance $d = (d^l, d^t, d^r, d^b)$, whose components are the distances from the current position to the left, top, right, and bottom boundaries of the target, normalized by a factor $z$.
For negative sample points, the classification target is background ($c = 0$) and the localization target is null, so no regression is learned.
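The assignment of positive points and regression targets can be sketched as below. Several details are assumptions for illustration only: the helper name `assign_targets`, placing anchor point $(i, j)$ at the center of its cell in image space, dividing distances by $z \cdot s_l$, and the default values of `epsilon` and `z`.

```python
def assign_targets(box, level, epsilon=0.2, z=4.0):
    """Return {(i, j): (d_l, d_t, d_r, d_b)} for the positive anchor points of one level.

    box is (c, x, y, w, h) with (x, y) the box center in image pixels.
    i indexes columns and j indexes rows (an assumption of this sketch).
    Clipping to the feature map's upper bounds is omitted for brevity.
    """
    c, x, y, w, h = box
    stride = 2 ** level
    left, top = x - w / 2, y - h / 2
    right, bottom = x + w / 2, y + h / 2
    # Central region B_v = (c, x, y, epsilon * w, epsilon * h)
    v_left, v_top = x - epsilon * w / 2, y - epsilon * h / 2
    v_right, v_bottom = x + epsilon * w / 2, y + epsilon * h / 2
    norm = z * stride  # assumed normalization of the distances
    targets = {}
    for i in range(max(0, int(v_left // stride)), int(v_right // stride) + 1):
        for j in range(max(0, int(v_top // stride)), int(v_bottom // stride) + 1):
            # Assumption: the anchor point sits at the center of cell (i, j).
            px, py = stride * (i + 0.5), stride * (j + 0.5)
            if v_left <= px <= v_right and v_top <= py <= v_bottom:
                targets[(i, j)] = ((px - left) / norm, (py - top) / norm,
                                   (right - px) / norm, (bottom - py) / norm)
    return targets

targets = assign_targets((1, 100, 100, 80, 80), level=3, epsilon=0.25)
print(len(targets), targets[(12, 12)])
```

Every position not returned by this function would be a negative sample point for this target, with class 0 and no regression target.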
Loss functions
For each anchor point $p_{lij}$, the network outputs a $K$-dimensional classification prediction $\hat{c}_{lij}$ and a 4-dimensional regression prediction $\hat{d}_{lij}$, which are trained with Focal Loss and IoU Loss respectively. Negative sample points contribute only the classification term.
The total loss of the network is the sum of the per-point losses $L_{lij}$ over all positive and negative sample points, divided by the number of positive sample points.
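A rough sketch of the per-point loss and its normalization follows. It assumes the sigmoid form of Focal Loss with the commonly used defaults `alpha=0.25` and `gamma=2.0`, and the $-\ln(\mathrm{IoU})$ form of IoU Loss. These choices, along with all function names, are illustrative assumptions and may differ from the paper's exact settings.

```python
import math

def focal_loss(probs, target_class, alpha=0.25, gamma=2.0):
    """Sigmoid focal loss summed over K classes; target_class=0 means background.

    probs holds K per-class probabilities, each strictly between 0 and 1.
    """
    loss = 0.0
    for k, p in enumerate(probs, start=1):  # classes are numbered 1..K
        if k == target_class:
            loss += -alpha * (1 - p) ** gamma * math.log(p)
        else:
            loss += -(1 - alpha) * p ** gamma * math.log(1 - p)
    return loss

def iou_loss(pred, target):
    """-ln(IoU) for two boxes given as (left, top, right, bottom) distances from one point."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target
    inter = (min(pl, tl) + min(pr, tr)) * (min(pt, tt) + min(pb, tb))
    union = (pl + pr) * (pt + pb) + (tl + tr) * (tt + tb) - inter
    return -math.log(inter / union)

def total_loss(points):
    """points: list of (probs, target_class, pred_offsets, target_offsets_or_None)."""
    num_pos = sum(1 for _, c, _, _ in points if c > 0)
    total = 0.0
    for probs, c, pred, target in points:
        total += focal_loss(probs, c)       # every point has a classification term
        if c > 0:
            total += iou_loss(pred, target)  # only positives have a regression term
    return total / max(num_pos, 1)

points = [
    ([0.5], 1, (1, 1, 1, 1), (2, 2, 2, 2)),  # positive point
    ([0.5], 0, (1, 1, 1, 1), None),          # negative point
]
print(total_loss(points))
```

Because both boxes are expressed as distances from the same anchor point, their intersection is obtained directly from the element-wise minima of the offsets.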
Soft Anchor-Point Detector
The core of SAPD, shown in Figure 3, consists of soft-weighted anchor points and soft-selected pyramid levels, which adjust the weight of each anchor point and train each target on multiple levels of the feature pyramid.
Soft-Weighted Anchor Points
- False attention
Under the conventional training strategy, the paper observes that some anchor points produce poorly localized boxes yet receive high classification confidence, as shown in Figure 4a. As a result, the most accurately localized prediction may not survive NMS. A likely cause is that the training strategy treats all anchor points inside the central region $B_v$ equally. In fact, the closer a point is to the target boundary, the harder it is to regress an accurate box from it. The loss of each anchor point should therefore be weighted by its position, so that the network focuses on high-quality anchor points instead of being forced to fit points that are hard to regress well.
- Our solution
To solve the problem above, the paper proposes soft weighting: a weight $w_{lij}$ is applied to the loss $L_{lij}$ of each anchor point, determined by the position of the point relative to the target boundary. Negative sample points do not take part in position regression, so their weight is set directly to 1. The complete weight is:

$$w_{lij} = \begin{cases} f(p_{lij}, B), & \exists B,\ p_{lij} \in B_v \\ 1, & \text{otherwise} \end{cases}$$

Here $f$ is a function of the distances between $p_{lij}$ and the boundaries of target $B$. The paper sets $f$ to a centerness function:

$$f(p_{lij}, B) = \left[\frac{\min(d^l_{lij}, d^r_{lij})\,\min(d^t_{lij}, d^b_{lij})}{\max(d^l_{lij}, d^r_{lij})\,\max(d^t_{lij}, d^b_{lij})}\right]^{\eta}$$

$\eta$ controls how strongly the weight is reduced, and the effect can be seen in Figure 3. After soft weighting, the weights of the anchor points form a mountain-like peak over the target.
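The centerness function is easy to try out numerically. The sketch below follows the formula described in the text; the function name `centerness_weight` and the default `eta=1.0` are assumptions for illustration.

```python
def centerness_weight(d_l, d_t, d_r, d_b, eta=1.0):
    """Soft weight of a positive anchor point from its four boundary distances."""
    ratio = (min(d_l, d_r) * min(d_t, d_b)) / (max(d_l, d_r) * max(d_t, d_b))
    return ratio ** eta

center = centerness_weight(2, 2, 2, 2)           # point at the box center
off_center = centerness_weight(1, 2, 3, 2)       # point shifted toward the left edge
sharper = centerness_weight(1, 2, 3, 2, eta=2.0)  # larger eta lowers the weight further
print(center, off_center, sharper)
```

A point at the exact center gets weight 1, and the weight shrinks as the point moves toward any boundary, which gives the peak shape described above.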
Soft-Selected Pyramid Levels
- Feature selection
Anchor-free methods generally select one level of the feature pyramid for each target during training, and the choice of level strongly affects the result. Through visualization, the paper finds that the activation regions of different levels are actually similar, as shown in Figure 5, which means that features from different levels can be used together for prediction. Based on this finding, the paper proposes two criteria for selecting the right pyramid levels:
- Select levels according to feature responses, not hand-crafted rules.
- Allow multiple levels of features to be used to train each target, with each level making a significant contribution to the predicted result.
- Our solution
To satisfy the two criteria above, the paper uses a feature selection network to predict the weight of each level for a target. The overall process is shown in Figure 6. RoIAlign extracts the features of the target's region from every level, the extracted features are merged and fed into the feature selection network, and the network outputs a weight vector. The effect can be seen in Figure 3: the weight peaks on the different pyramid levels have similar shapes but different heights, reflecting the weight of each level. Note that the feature selection network is used only in the training phase.
The feature selection network has a very simple structure, shown in Table 1. It is trained jointly with the detector. Its ground truth is a one-hot vector whose nonzero entry is the pyramid level with the minimum loss, following FSAF. For details, please refer to the previous article on FSAF. At this point, target $B$ is associated with every pyramid level through a level weight $w^B_l$. Combined with the soft weighting described earlier, the weight of an anchor point now depends on both the level weight $w^B_l$ and the position-based function $f(p_{lij}, B)$.
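The two pieces described here can be sketched in a few lines. The one-hot label follows the minimum-loss rule stated in the text. The combination of level weight and position weight is written as a product, which is an assumption of this sketch, and the names `min_loss_one_hot` and `combined_weight` are hypothetical.

```python
def min_loss_one_hot(level_losses):
    """One-hot ground truth for the selection network: 1 at the level with the lowest loss."""
    best = min(level_losses, key=level_losses.get)
    return {level: 1.0 if level == best else 0.0 for level in level_losses}

def combined_weight(level_weight, centerness, is_positive=True):
    """Weight of one anchor point.

    Assumption: positives use level_weight * centerness; negatives keep weight 1.
    """
    return level_weight * centerness if is_positive else 1.0

# Hypothetical per-level losses of one target on levels P3, P4, P5.
labels = min_loss_one_hot({3: 1.8, 4: 0.9, 5: 1.2})
print(labels)
print(combined_weight(0.5, 0.5), combined_weight(0.5, 0.5, is_positive=False))
```

With this form, an anchor point gets a high weight only when it is both near the target center and on a level that the selection network rates highly for that target.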
The loss of the complete model is the weighted anchor-point loss plus the loss of the feature selection network.
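As a rough illustration, the two terms could be combined as follows. The normalization by the summed weights of the positive points and the balancing factor `lam` are assumptions of this sketch, as is the function name `sapd_total_loss`; the source text only states that the two losses are added.

```python
def sapd_total_loss(point_losses, point_weights, positive_mask, select_loss, lam=0.1):
    """Weighted anchor-point loss plus the feature selection network loss.

    Assumptions: the weighted sum is normalized by the total weight of the
    positive points, and lam balances the selection loss.
    """
    weighted = sum(w * l for w, l in zip(point_weights, point_losses))
    norm = sum(w for w, pos in zip(point_weights, positive_mask) if pos)
    return weighted / max(norm, 1e-12) + lam * select_loss

loss = sapd_total_loss(
    point_losses=[1.0, 2.0, 0.5],
    point_weights=[0.5, 0.25, 1.0],   # negatives carry weight 1
    positive_mask=[True, True, False],
    select_loss=2.0,
)
print(loss)
```

Compared with the unweighted loss, low-weight anchor points contribute less to both the numerator and the normalizer, so poorly placed points no longer dominate training.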
Experiment
Comparison experiments for each module.
Comparison with state-of-the-art (SOTA) algorithms.
Conclusion
To improve anchor-point detection algorithms, the paper proposes SAPD, which applies different loss weights to anchor points at different locations and trains multiple feature pyramid levels jointly with weights. This removes most hand-crafted rules and relies more on weights learned by the network itself.