BorderDet's core idea, BorderAlign, is clever and effective: it integrates boundary features into the localization prediction and can be easily plugged into various object detection algorithms, bringing a clear performance improvement. The open-source implementation provides an efficient CUDA version of BorderAlign with little impact on running time, and the overall work is solid.

Source: Xiao Fei’s algorithm engineering notes public account

BorderDet: Border Feature for Dense Object Detection

  • Paper: Arxiv.org/abs/2007.11…
  • Code: Github.com/Megvii-Base…

Introduction


At present, most point-based object detection algorithms (SSD, RetinaNet, FCOS) use the single-point feature at each location of the feature map for localization and classification, but a single-point feature may not carry enough information to represent the complete instance and its boundary. Many studies have supplemented the single-point feature in various ways. Although these methods extract more features, they may introduce unnecessary computation and be affected by the background. Most importantly, none of them directly uses the boundary feature, which is very important for localization. This paper therefore proposes a new feature extraction operation, BorderAlign, which uses boundary features to directly refine the original single-point feature. Based on BorderAlign, the paper proposes BorderDet, a SOTA object detection algorithm. The main contributions of the paper are as follows:

  • The feature representation of dense object detectors is analyzed to show how important boundary features are for strengthening the single-point feature.
  • A new feature extraction operation, BorderAlign, is proposed to refine network features with boundary features, and a high-performance object detection algorithm, BorderDet, is built on it.
  • On the COCO dataset, integrating the BAM module into FCOS and FPN improves them by 2.8 AP and 3.6 AP respectively, and BorderDet with a ResNeXt-101-DCN backbone reaches 50.3 AP, which is SOTA.

Our Approach


Motivation

The paper takes FCOS as the baseline and adds a second prediction stage to compare various feature enhancement methods. It finds that using only the center points of the borders enhances the features almost as well as the region-based method. The experimental results lead to the following conclusions:

  • Point-based feature representation lacks the salient features of the complete object, so feature enhancement is needed.
  • Extracting features from the complete box is unnecessary and redundant.
  • An efficient boundary feature extraction strategy can bring better performance.

Border Align

Extracting features densely from the boundary is very inefficient: only a few points on the boundary usually relate to the target object, and most are background points. The paper therefore proposes the BorderAlign feature extraction operation, which extracts boundary features efficiently and adaptively.

Following the idea of R-FCN, BorderAlign takes border-sensitive feature maps with $(4+1)C$ channels as input: $4C$ channels correspond to the four borders, and $C$ channels correspond to the original single-point feature. Assume the feature maps are ordered as (single point, left border, top border, right border, bottom border). When computing the output feature map, $N$ sample points are taken uniformly on each border of the bbox predicted at point $(i, j)$, with $N$ set to 10 by default. The values at the sample points are computed by bilinear interpolation. Finally, the output is obtained by channel-wise max-pooling:
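
Written out, one reconstruction consistent with this description is given below. The notation is an assumption: $I$ is the border-sensitive input, $F$ is the output, $(x_0, y_0, x_1, y_1)$ is the box predicted at $(i, j)$, $w = x_1 - x_0$ and $h = y_1 - y_0$ are its width and height, and the sample positions $k/N$ along each border are one possible uniform spacing.

$$
F_c(i, j) =
\begin{cases}
I_c(i, j), & 0 \le c < C \\
\max\limits_{0 \le k \le N-1} I_c\left(x_0,\ y_0 + \frac{kh}{N}\right), & C \le c < 2C \\
\max\limits_{0 \le k \le N-1} I_c\left(x_0 + \frac{kw}{N},\ y_0\right), & 2C \le c < 3C \\
\max\limits_{0 \le k \le N-1} I_c\left(x_1,\ y_0 + \frac{kh}{N}\right), & 3C \le c < 4C \\
\max\limits_{0 \le k \le N-1} I_c\left(x_0 + \frac{kw}{N},\ y_1\right), & 4C \le c < 5C
\end{cases}
$$

The five cases follow the channel order (single point, left, top, right, bottom): the first $C$ channels are copied unchanged, and each remaining group of $C$ channels is max-pooled along its own border.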

Here $(x_0, y_0, x_1, y_1)$ is the bbox predicted at point $(i, j)$. In this way, BorderAlign adaptively extracts boundary features from the extreme points of the borders.
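
To make the operation concrete, here is a minimal pure-Python sketch of BorderAlign for a single location. It is not the CUDA kernel from the open-source code. The function names, the nested-list layout `feats[channel][row][col]`, reading `i` as the column and `j` as the row, and sampling at fractions `k/N` of each border are all assumptions made for illustration.

```python
def bilinear(fmap, x, y):
    """Bilinearly interpolate a 2D map (list of rows) at real-valued (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp so that samples on the box border stay inside the map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def border_align(feats, box, i, j, n_samples=10):
    """Pool border features for the box predicted at column i, row j.

    feats: 5*C maps ordered (single point, left, top, right, bottom).
    box:   (x0, y0, x1, y1) in feature-map coordinates.
    Returns a list of 5*C values.
    """
    c = len(feats) // 5
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    fracs = [k / n_samples for k in range(n_samples)]  # assumed spacing
    borders = [
        [(x0, y0 + t * h) for t in fracs],  # left border
        [(x0 + t * w, y0) for t in fracs],  # top border
        [(x1, y0 + t * h) for t in fracs],  # right border
        [(x0 + t * w, y1) for t in fracs],  # bottom border
    ]
    # The first C channels keep the original single-point feature.
    out = [feats[ch][j][i] for ch in range(c)]
    # Each later group of C channels is max-pooled along its own border.
    for b, points in enumerate(borders):
        for ch in range((b + 1) * c, (b + 2) * c):
            out.append(max(bilinear(feats[ch], x, y) for x, y in points))
    return out
```

Because the maximum is taken per channel, each channel can pick a different point on its border, which is what lets the operation respond to the extreme points of the object.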

The paper visualizes where the maximum value of each channel falls in the border-sensitive feature maps and finds that the distribution of maxima basically matches expectations.

Border Alignment Module

The BAM module wraps the BorderAlign operation. It first generates border-sensitive feature maps with a $1\times 1$ convolution. Then, combined with the preliminary bbox predictions, BorderAlign outputs border-enhanced feature maps. Finally, a $1\times 1$ convolution restores the channel dimension to that of the module input.
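
The channel flow through the module ($C$ to $5C$, BorderAlign, then back to $C$) can be sketched as follows. This is only an illustration of the data flow under assumed names: `conv1x1`, `bam`, the weight layout, and the `border_align` callable are hypothetical, and any normalization or activation layers in the real module are left out.

```python
def conv1x1(feat, weight):
    """Apply a 1x1 convolution. feat: [C_in][H][W]; weight: [C_out][C_in]."""
    h, w = len(feat[0]), len(feat[0][0])
    return [
        [[sum(wk * feat[k][y][x] for k, wk in enumerate(row)) for x in range(w)]
         for y in range(h)]
        for row in weight
    ]


def bam(feat, boxes, w_expand, w_reduce, border_align):
    """Sketch of the Border Alignment Module data flow.

    feat:         [C][H][W] input features.
    boxes:        boxes[y][x] is the preliminary box predicted at (x, y).
    w_expand:     [5C][C] weights producing border-sensitive maps.
    w_reduce:     [C][5C] weights restoring the input channel count.
    border_align: callable (maps, box, x, y) -> list of 5C pooled values.
    """
    sensitive = conv1x1(feat, w_expand)  # C -> 5C border-sensitive maps
    h, w = len(feat[0]), len(feat[0][0])
    pooled = [[[0.0] * w for _ in range(h)] for _ in range(len(sensitive))]
    for y in range(h):
        for x in range(w):
            values = border_align(sensitive, boxes[y][x], x, y)
            for ch, v in enumerate(values):
                pooled[ch][y][x] = v
    return conv1x1(pooled, w_reduce)  # 5C -> C, same width as the input
```

Passing `border_align` in as a callable keeps the sketch independent of how the pooling itself is implemented.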

BorderDet

BorderDet is built on the FCOS detection architecture, with the BAM module placed mainly in the prediction head of the feature pyramid. The head first produces a preliminary bbox prediction and a preliminary classification prediction. The preliminary bbox prediction is then fed into the BAM module to obtain the border classification prediction and the border bbox prediction, each produced by a $1\times 1$ convolution as before. Finally, both sets of results are output.

BorderRPN

The BAM module can also be used in two-stage object detection algorithms. The paper replaces the original region-based feature extraction in the second stage with the BAM module to make border-enhanced predictions. In addition, the preceding feature extraction convolution is replaced with a dilated convolution to enlarge the receptive field.

Model Training and Inference

  • Target Assignment

BorderDet makes its preliminary prediction in the same way as FCOS. In the second stage, a ground-truth (GT) box is assigned to each preliminary prediction whose IoU with it is greater than 0.6. The regression target is defined as:
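
A reconstruction consistent with this description is given below. The notation is an assumption: $(x_0, y_0, x_1, y_1)$ is the preliminary box, the superscript $t$ marks the assigned ground-truth box, and $w$, $h$ are the width and height of the preliminary box.

$$
\delta x_0 = \frac{x_0^t - x_0}{w \cdot \sigma}, \quad
\delta y_0 = \frac{y_0^t - y_0}{h \cdot \sigma}, \quad
\delta x_1 = \frac{x_1^t - x_1}{w \cdot \sigma}, \quad
\delta y_1 = \frac{y_1^t - y_1}{h \cdot \sigma}
$$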

$\sigma$ is the variance, used to improve the efficiency of multi-task learning, and is 0.5 by default.
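
The assignment rule and the regression target can be sketched in a few lines of Python. The helper names are hypothetical. Two points are assumptions rather than facts from the text: when several GT boxes pass the threshold, the one with the highest IoU is chosen, and the target is taken to be the corner offset divided by the preliminary box size times $\sigma$.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign(pred_boxes, gt_boxes, iou_thr=0.6):
    """Return, per preliminary box, the index of its GT box or -1 if none."""
    assigned = []
    for p in pred_boxes:
        best, best_iou = -1, iou_thr
        for g_idx, g in enumerate(gt_boxes):
            v = iou(p, g)
            if v > best_iou:  # strictly greater than the threshold
                best, best_iou = g_idx, v
        assigned.append(best)
    return assigned


def encode(pred, gt, sigma=0.5):
    """Corner offsets normalized by preliminary box size and sigma (assumed form)."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    return (
        (gt[0] - pred[0]) / (w * sigma),
        (gt[1] - pred[1]) / (h * sigma),
        (gt[2] - pred[2]) / (w * sigma),
        (gt[3] - pred[3]) / (h * sigma),
    )
```

For example, with one GT box `(0, 0, 10, 10)`, the preliminary box `(0, 0, 10, 8)` has IoU 0.8 and is assigned to it, while `(0, 0, 10, 5)` has IoU 0.5 and is left unassigned.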

  • Loss Function

The loss function of BorderDet is defined as:
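
One form consistent with the four terms named in this section is sketched below. The normalization by the number of positive samples $N_{pos}$, the indicator that restricts border regression to positive locations, and the symbols $P^B$, $C^*$, $\Delta$ and $\Delta^*$ (border classification output, assigned class, predicted and target offsets) are assumptions in this reconstruction.

$$
\mathcal{L} = \mathcal{L}^C_{cls} + \mathcal{L}^C_{reg}
+ \frac{1}{N_{pos}} \sum_{x, y} \left[ \mathcal{L}^B_{cls}(P^B, C^*)
+ \mathbb{1}_{\{C^* > 0\}} \, \mathcal{L}^B_{reg}(\Delta, \Delta^*) \right]
$$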

$\mathcal{L}^C_{cls}$ and $\mathcal{L}^C_{reg}$ are the preliminary classification loss and preliminary localization loss, implemented as focal loss and IoU loss, respectively. $\mathcal{L}^B_{cls}$ and $\mathcal{L}^B_{reg}$ are the border classification loss and border localization loss, implemented as focal loss and $\mathcal{L}_1$ loss, respectively.

  • Inference

At inference, BorderDet multiplies the two classification results directly to get the output score. For localization, the border bbox prediction is applied to the preliminary bbox through the inverse transform of Formula 2. NMS with an IoU threshold of 0.6 is then applied to all results.
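
The three inference steps can be sketched as follows. The helper names are hypothetical. The decoding assumes the regression target is the corner offset divided by the preliminary box size times $\sigma$, so the inverse multiplies back by the same factor. The NMS here is a plain greedy, class-agnostic version used for illustration, not the implementation in the released code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def final_scores(coarse_scores, border_scores):
    """Multiply preliminary and border classification scores elementwise."""
    return [c * b for c, b in zip(coarse_scores, border_scores)]


def decode(coarse, delta, sigma=0.5):
    """Apply predicted border offsets to the preliminary box (assumed inverse)."""
    w, h = coarse[2] - coarse[0], coarse[3] - coarse[1]
    return (
        coarse[0] + delta[0] * w * sigma,
        coarse[1] + delta[1] * h * sigma,
        coarse[2] + delta[2] * w * sigma,
        coarse[3] + delta[3] * h * sigma,
    )


def nms(boxes, scores, iou_thr=0.6):
    """Greedy NMS: keep boxes in score order, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        if all(iou(boxes[k], boxes[m]) <= iou_thr for m in keep):
            keep.append(k)
    return keep
```

For example, `decode((0, 0, 10, 8), (0, 0, 0, 0.5))` moves the bottom edge down by `0.5 * 8 * 0.5 = 2` and leaves the other edges unchanged.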

Experiments


Ablation experiments on the BAM module and the BorderAlign parameters.

Comparison of BorderAlign, which has a fast and efficient CUDA implementation, with other feature enhancement methods.

Integrating BAM directly into one-stage and two-stage detectors.

Comparison with mainstream detection algorithms.

Conclusion


BorderAlign, the core idea of BorderDet, is clever and effective. It integrates boundary features into the localization prediction and can be easily plugged into various object detection algorithms, bringing a clear performance improvement. The open-source implementation provides an efficient CUDA version of BorderAlign with little impact on running time, and the overall work is solid.




