Early object detectors are anchor-based: a set of initial anchors is preset and the network predicts corrections to them. They are divided into two-stage and one-stage detectors, represented by Faster R-CNN and SSD respectively. Later, researchers observed that the initial anchor settings have a large impact on accuracy and that it is hard to find an ideal set of preset anchors, so they began to study anchor-free detectors, which remove the preset-anchor step and let the network learn the position and shape of boxes by itself. These detectors perform very well in both speed and accuracy.

Anchor-free detectors fall into two types. The first is the dense prediction type, represented by DenseBox, which densely predicts the position of the box relative to each location. The second is the keypoint-based detection type, represented by CornerNet, which mainly detects keypoints of the object. This article surveys several keypoint-based detectors, namely:

  • CornerNet
  • ExtremeNet
  • CenterNet
  • CenterNet (Objects as Points)
  • CSP
  • CornerNet-Lite
  • RepPoints
  • CentripetalNet
  • SaccadeNet
  • RepPointsV2
  • CPNDet
  • FSAF

CornerNet


CornerNet formulates object detection as detecting the top-left and bottom-right corners of each object's bounding box. The network structure is shown in Figure 1. A convolutional network predicts heat maps for the top-left and bottom-right corners, and the two heat maps are then combined to output predicted boxes, removing the need for anchor boxes entirely. Experiments show that CornerNet performs comparably to the mainstream detectors of its time, and it established a new paradigm for object detection.

The CornerNet structure is shown in Figure 4. An Hourglass network serves as the backbone, and two independent prediction modules output two sets of results, one for the top-left corners and one for the bottom-right corners.
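To make "combining the two heat maps" concrete, here is a minimal sketch of corner grouping. It assumes corner peaks have already been extracted from the two heat maps and that each corner carries a 1-D embedding (the embedding vectors discussed in the CentripetalNet section below); corners of the same object should have close embeddings. The names `Corner` and `group_corners`, the 0.5 default threshold, the averaging of the two corner scores, and the omission of per-class checks are all assumptions of this sketch, not CornerNet's released code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x1, y1, x2, y2, score)
Box = Tuple[float, float, float, float, float]


@dataclass
class Corner:
    x: float      # peak location on the corner heat map
    y: float
    score: float  # heat-map value at the peak
    embed: float  # 1-D embedding used to decide which corners belong together


def group_corners(top_left: List[Corner], bottom_right: List[Corner],
                  max_embed_dist: float = 0.5) -> List[Box]:
    """Pair top-left and bottom-right corners into boxes, best score first."""
    boxes: List[Box] = []
    for tl in top_left:
        for br in bottom_right:
            # A valid box needs its bottom-right corner below and right of the top-left one.
            if br.x <= tl.x or br.y <= tl.y:
                continue
            # Corners of the same object should have similar embeddings.
            if abs(tl.embed - br.embed) > max_embed_dist:
                continue
            boxes.append((tl.x, tl.y, br.x, br.y, (tl.score + br.score) / 2))
    boxes.sort(key=lambda b: b[4], reverse=True)
    return boxes


if __name__ == "__main__":
    tls = [Corner(10, 10, 0.9, 1.0), Corner(50, 50, 0.8, 3.0)]
    brs = [Corner(40, 40, 0.7, 1.1), Corner(90, 90, 0.6, 3.2)]
    for box in group_corners(tls, brs):
        print(box)
```

The exhaustive double loop is fine for the top-k corners kept after peak extraction; a pair is rejected either by geometry or by embedding distance.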

ExtremeNet


ExtremeNet locates an object by detecting its four extreme points (top-most, left-most, bottom-most and right-most), as shown in Figure 1. The algorithm builds on the idea of CornerNet. Five heat maps predict the four extreme points and the center region of the object respectively, and extreme points from different heat maps are combined into detections. In addition, the extreme points detected by ExtremeNet can be fed to the DEXTR network to predict the object's segmentation.
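One way to combine the extreme points is sketched below: enumerate every (top, left, bottom, right) quadruple, form the box they span, and keep it only if the center heat map responds at the box's geometric center. This is a simplified sketch under stated assumptions: the function name `center_grouping`, the `tau_c` default, the geometric sanity check, and using the center value alone as the score are illustrative choices, not ExtremeNet's exact procedure.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[int, int]                    # (x, y) on the heat map
Box = Tuple[int, int, int, int, float]     # (x1, y1, x2, y2, center score)


def center_grouping(top: List[Point], left: List[Point], bottom: List[Point],
                    right: List[Point], center_heat: List[List[float]],
                    tau_c: float = 0.1) -> List[Box]:
    """Keep extreme-point quadruples whose geometric center hits the center heat map."""
    boxes: List[Box] = []
    for t, l, b, r in product(top, left, bottom, right):
        x1, y1, x2, y2 = l[0], t[1], r[0], b[1]
        # Basic sanity check (an assumption of this sketch): each extreme point
        # must lie within the extent of the box the four points define.
        if not (x1 <= t[0] <= x2 and x1 <= b[0] <= x2
                and y1 <= l[1] <= y2 and y1 <= r[1] <= y2):
            continue
        # Geometric center of the candidate box, looked up on the center heat map.
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
        score = center_heat[cy][cx]
        if score >= tau_c:
            boxes.append((x1, y1, x2, y2, score))
    return boxes
```

Because the enumeration is over all quadruples, the cost grows with the fourth power of the number of peaks per map, so in practice only a small number of peaks per heat map would be kept.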

CenterNet


CornerNet turns the commonly used anchor-based detection into keypoint-based detection, representing each object by a pair of corner points. However, CornerNet mainly attends to the boundary information of an object and has little access to its interior, which easily leads to false detections, as shown in Figure 1. To solve this problem, the paper proposes CenterNet, which adds a center keypoint to each corner pair to form a triplet, so that detection captures both the boundary and the internal information of the object. In addition, the paper proposes center pooling and cascade corner pooling to better extract features for the center keypoints and the corner points, respectively.
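The triplet check can be sketched as follows: each candidate box from corner pairing is kept only if some detected center keypoint falls inside the box's central region, and the final score averages the two corner scores and the center score. The central-region formula follows my reading of the paper's scalable central region (region width and height equal to the box's divided by `n`); the `n=3` default, the score averaging, ignoring class labels, and the names `central_region` and `verify_with_centers` are assumptions of this sketch.

```python
from typing import List, Tuple

# Candidate from corner pairing: (x1, y1, x2, y2, tl_score, br_score)
Candidate = Tuple[float, float, float, float, float, float]
CenterKp = Tuple[float, float, float]  # (x, y, score)


def central_region(x1: float, y1: float, x2: float, y2: float, n: int):
    """Central region of a box; its size is the box size divided by n."""
    return (((n + 1) * x1 + (n - 1) * x2) / (2 * n),
            ((n + 1) * y1 + (n - 1) * y2) / (2 * n),
            ((n - 1) * x1 + (n + 1) * x2) / (2 * n),
            ((n - 1) * y1 + (n + 1) * y2) / (2 * n))


def verify_with_centers(cands: List[Candidate], centers: List[CenterKp],
                        n: int = 3) -> List[Tuple[float, float, float, float, float]]:
    kept = []
    for x1, y1, x2, y2, s_tl, s_br in cands:
        rx1, ry1, rx2, ry2 = central_region(x1, y1, x2, y2, n)
        inside = [c for c in centers if rx1 <= c[0] <= rx2 and ry1 <= c[1] <= ry2]
        if not inside:
            continue  # no center keypoint: likely a mismatched corner pair
        s_c = max(c[2] for c in inside)
        # Triplet score: average of the two corners and the best center inside.
        kept.append((x1, y1, x2, y2, (s_tl + s_br + s_c) / 3))
    return kept
```

A larger `n` shrinks the region, which makes the check stricter; that trade-off is why a size-dependent `n` is attractive.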

CenterNet (Objects as Points)


This CenterNet treats each object as a keypoint: it first finds the object's center and then regresses its size. The input image is converted into a heat map whose peaks correspond to object centers, and the feature vector at each peak is used to predict the object's height and width, as shown in Figure 2. Inference needs only a single forward pass, with no post-processing such as NMS. Compared with the same-named CenterNet in the previous section, this algorithm is simpler yet still performs strongly, and it extends easily to other tasks such as 3D detection and human pose estimation.
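The NMS-free decoding step can be sketched in plain Python: a location counts as a peak only if it equals the maximum of its 3x3 neighbourhood (what a 3x3 max pooling with stride 1 achieves on a GPU), and each peak's regressed width and height turn it into a box. The names `extract_peaks` and `decode`, the threshold, the top-k value, and the assumption that sizes are predicted in heat-map units with an output stride of 4 are illustrative choices for this sketch.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def extract_peaks(heat: Grid, k: int = 100, thresh: float = 0.1) -> List[Tuple[float, int, int]]:
    """Return up to k (score, x, y) local maxima, mimicking 3x3 max pooling."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heat[y][x]
            neigh = max(heat[yy][xx]
                        for yy in range(max(0, y - 1), min(h, y + 2))
                        for xx in range(max(0, x - 1), min(w, x + 2)))
            # A location survives only if it equals the max of its 3x3 window.
            if v >= thresh and v == neigh:
                peaks.append((v, x, y))
    peaks.sort(reverse=True)
    return peaks[:k]


def decode(heat: Grid, wh: List[List[Tuple[float, float]]],
           offset: Optional[List[List[Tuple[float, float]]]] = None,
           stride: int = 4, k: int = 100, thresh: float = 0.1):
    """Turn each peak plus its (w, h) and optional sub-pixel offset into an image-space box."""
    boxes = []
    for score, x, y in extract_peaks(heat, k, thresh):
        w, h = wh[y][x]
        dx, dy = offset[y][x] if offset is not None else (0.0, 0.0)
        cx, cy = x + dx, y + dy
        boxes.append(((cx - w / 2) * stride, (cy - h / 2) * stride,
                      (cx + w / 2) * stride, (cy + h / 2) * stride, score))
    return boxes
```

The local-maximum test is what replaces NMS here: a strong neighbour suppresses weaker responses around the same center without any box-overlap computation.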

CSP


The network structure of CSP is roughly shown in Figure 1. Separate heads on top of the backbone predict the location of each object's center point and its corresponding scale. The overall idea is essentially the same as CenterNet (Zhou et al.), but it is not plagiarism: the two are concurrent works. CenterNet targets general object detection, while CSP focuses on face detection and pedestrian detection. CSP still requires NMS post-processing, which makes it somewhat less elegant than CenterNet, but it is still worth a brief look, including the training method and hyperparameters studied in the paper.
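Since CSP still relies on NMS, a sketch of its decode-then-suppress step may help. It assumes center peaks have been extracted, that the scale head predicts the log of the box height in pixels, and that the width comes from a fixed aspect ratio (0.41 here, a value often used for pedestrian boxes); these are assumptions, as are the names `decode_csp`, `iou` and `nms`. The NMS is the standard greedy variant.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def decode_csp(centers: List[Tuple[int, int, float]], log_heights: List[float],
               stride: int = 4, aspect_ratio: float = 0.41) -> List[Box]:
    """centers: (x, y, score) peaks on the feature map; log_heights: log of box height in pixels."""
    boxes = []
    for (x, y, s), lh in zip(centers, log_heights):
        h = math.exp(lh)
        w = aspect_ratio * h   # width from a fixed aspect ratio (assumption)
        cx, cy = x * stride, y * stride
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, s))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thresh: float = 0.5) -> List[Box]:
    """Greedy NMS: keep the best box, drop boxes overlapping a kept one too much, repeat."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```

Predicting a log-height keeps the regression target well scaled across small and large objects, and the fixed ratio reduces box regression to a single number per center.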

CornerNet-Lite


CornerNet, a classic keypoint-based detector, is accurate, but its inference is very slow, taking about 1.1 s per image. Simply shrinking the input image speeds up inference, but it greatly reduces accuracy, so that CornerNet compares poorly with YOLOv3. To address this, two lightweight CornerNet variants are proposed: CornerNet-Saccade and CornerNet-Squeeze.

CornerNet-Saccade runs detection only in small regions around likely object locations. It first predicts attention maps on a downsized version of the full image to obtain the rough positions and sizes of objects. It then crops regions centered on those positions from the high-resolution image and detects objects within them.
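The "downsize, locate, then crop at full resolution" step can be sketched as a coordinate mapping: a rough box found on the image resized by `scale` is mapped back to full-resolution coordinates, and a fixed-size window is centered on it and shifted to stay inside the image. The function name `crop_windows` and the 256-pixel crop size are hypothetical; CornerNet-Saccade's actual zoom and crop rules are not reproduced here.

```python
from typing import List, Tuple


def crop_windows(boxes_small: List[Tuple[float, float, float, float]], scale: float,
                 img_w: int, img_h: int, crop: int = 256) -> List[Tuple[int, int, int, int]]:
    """Map coarse boxes (found on an image resized by `scale`) to full-res crop windows."""
    windows = []
    for x1, y1, x2, y2 in boxes_small:
        # Box center in full-resolution coordinates.
        cx = (x1 + x2) / 2 / scale
        cy = (y1 + y2) / 2 / scale
        # Center a crop x crop window on it, shifted so it stays inside the image.
        left = min(max(0, round(cx - crop / 2)), max(0, img_w - crop))
        top = min(max(0, round(cy - crop / 2)), max(0, img_h - crop))
        windows.append((left, top, min(left + crop, img_w), min(top + crop, img_h)))
    return windows


if __name__ == "__main__":
    print(crop_windows([(50, 50, 70, 70), (240, 180, 250, 190)], 0.25, 1024, 768))
```

Each window is then processed by the detector on its own, so total cost depends on the number of windows rather than on the full image size.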

In CornerNet, most of the inference time is spent in the Hourglass-104 backbone. CornerNet-Squeeze therefore combines ideas from SqueezeNet and MobileNet to reduce the complexity of Hourglass-104, yielding a new lightweight hourglass network.
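A quick parameter count shows why SqueezeNet- and MobileNet-style blocks lighten the backbone. The sketch compares a standard 3x3 convolution with a depthwise separable convolution (MobileNet), a fire module (SqueezeNet: 1x1 squeeze, then parallel 1x1 and 3x3 expand branches), and a fire module whose 3x3 expand branch is depthwise separable. The 256-channel width and the 64/128/128 squeeze and expand sizes are illustrative, not CornerNet-Squeeze's actual configuration, and biases are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """k x k depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def fire_params(c_in: int, squeeze: int, expand1: int, expand3: int,
                depthwise_expand: bool = False) -> int:
    """Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand branches."""
    total = conv_params(c_in, squeeze, 1) + conv_params(squeeze, expand1, 1)
    if depthwise_expand:
        total += depthwise_separable_params(squeeze, expand3, 3)
    else:
        total += conv_params(squeeze, expand3, 3)
    return total


if __name__ == "__main__":
    c = 256
    print("3x3 conv:", conv_params(c, c, 3))
    print("depthwise separable:", depthwise_separable_params(c, c))
    print("fire:", fire_params(c, 64, 128, 128))
    print("fire + depthwise 3x3:", fire_params(c, 64, 128, 128, depthwise_expand=True))
```

With these sizes the counts are 589,824 for the plain 3x3 convolution, 67,840 for the depthwise separable one, 98,304 for the fire module, and 33,344 when the fire module's 3x3 branch is also depthwise separable, all producing 256 output channels.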

RepPoints


Although the classic bounding box is convenient for computation, it ignores the shape and pose of the object, and features extracted from a rectangular region can be heavily contaminated by background content or other objects; these low-quality features in turn hurt detection performance. To address these problems of the bounding box, the paper proposes RepPoints, a new object representation that achieves finer-grained localization and better classification.

RepPoints is a set of points that adaptively bounds an object and captures the semantic features of its local regions. Based on RepPoints, the paper designs RPDet, a detector with two recognition stages. Because deformable convolution samples multiple irregularly distributed points to compute its output, it is a natural fit for RepPoints, and the recognition results can feed back to guide where the points are sampled.
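The two-stage point refinement and the conversion from points to a box can be sketched as follows. Stage 1 places a point set around a feature-map location using predicted offsets, stage 2 moves each point by a further offset, and a min-max function (one of the point-to-box conversions described for RepPoints, to my knowledge) turns the final set into a pseudo box for supervision and output. The function names and the 3x3 grid of initial offsets are hypothetical; in the real detector the offsets come from the network.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def init_points(center: Point, offsets: List[Point]) -> List[Point]:
    """Stage 1: place the point set around a feature-map location using predicted offsets."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


def refine_points(points: List[Point], deltas: List[Point]) -> List[Point]:
    """Stage 2: move each point by a further predicted offset."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, deltas)]


def minmax_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Convert a point set to a pseudo box by taking the extremes along each axis."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    offsets = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # 9 points
    pts = init_points((10.0, 10.0), offsets)
    print(minmax_box(refine_points(pts, [(-2.0, -3.0)] + [(0.0, 0.0)] * 8)))
```

Because the box is derived from the points rather than regressed directly, the points are free to settle on semantically meaningful parts of the object, and the same point positions can serve as the sampling locations of a deformable convolution.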

CentripetalNet


CornerNet opened a new path for object detection by locating objects through corner detection. To match corners, it predicts an additional embedding vector for each corner and pairs corners whose embeddings are close. However, this method is not only hard to train but also ignores the location information of the object.

The core of CentripetalNet is a new corner matching method: it learns an additional centripetal shift for each corner, and corners whose shifted positions are close enough are matched. CentripetalNet contains four modules, as shown in Figure 2:

  • Corner Prediction Module: generates candidate corners, just as in CornerNet.
  • Centripetal Shift Module: predicts the centripetal shift of each corner and groups corners whose shifted results are close.
  • Cross-star Deformable Convolution: a deformable convolution tailored to corner scenes that effectively enhances the features at corner positions.
  • Instance Mask Head: similar to Mask R-CNN, it adds an instance segmentation branch, which improves detection performance and adds instance segmentation capability.
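The centripetal matching rule can be sketched as follows: each corner carries a predicted shift pointing toward its object's center, and a top-left/bottom-right pair is matched only if both shifted corners land inside a central region of the box they span. The region size (a fraction `mu` of the box), the averaged pair score, and the name `match_centripetal` are assumptions for illustration; CentripetalNet's exact region definition and scoring are not reproduced.

```python
from typing import List, Tuple

# Corner: (x, y, score, shift_x, shift_y); the shift points from the corner toward the object center.
Corner = Tuple[float, float, float, float, float]
Box = Tuple[float, float, float, float, float]


def match_centripetal(tls: List[Corner], brs: List[Corner], mu: float = 0.5) -> List[Box]:
    boxes: List[Box] = []
    for tx, ty, ts, tsx, tsy in tls:
        for bx, by, bs, bsx, bsy in brs:
            if bx <= tx or by <= ty:
                continue  # not a valid top-left / bottom-right pair
            cx, cy = (tx + bx) / 2, (ty + by) / 2
            # Half-size of the central region: a fraction mu of the box (assumption).
            hw, hh = mu * (bx - tx) / 2, mu * (by - ty) / 2
            shifted = [(tx + tsx, ty + tsy), (bx + bsx, by + bsy)]
            # Match only if both shifted corners land in the central region.
            if all(abs(px - cx) <= hw and abs(py - cy) <= hh for px, py in shifted):
                boxes.append((tx, ty, bx, by, (ts + bs) / 2))
    return boxes
```

Unlike an embedding distance, this test is geometric: it uses where each corner believes the object center is, which is exactly the location information that the embedding approach lacks.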

SaccadeNet


The structure of SaccadeNet is shown in Figure 2. It first makes a preliminary prediction of each object's center and corner positions, then refines the box by regression using the features at the four corners and the center. The overall idea resembles a two-stage detector, except that the region features used for second-stage box refinement are replaced with point features. The method performs well in both accuracy and speed.
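Refining with point features instead of region features boils down to sampling the feature map at five sub-pixel locations per box. The sketch below does this with bilinear interpolation on a single-channel map; the function names are hypothetical, and the small head that would regress box corrections from these five values is not shown.

```python
from typing import List, Tuple

Grid = List[List[float]]


def bilinear(feat: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate a single-channel feature map at (x, y), clamped to the map."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)            # floor, since x and y are non-negative here
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = feat[y0][x0] * (1 - ax) + feat[y0][x1] * ax
    bottom = feat[y1][x0] * (1 - ax) + feat[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def keypoint_features(feat: Grid, box: Tuple[float, float, float, float]) -> List[float]:
    """Sample features at the box center and its four corners."""
    x1, y1, x2, y2 = box
    points = [((x1 + x2) / 2, (y1 + y2) / 2), (x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    return [bilinear(feat, px, py) for px, py in points]
```

Five samples per box is far cheaper than pooling a whole region, which is where the speed advantage over a classic second stage comes from.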

RepPointsV2


The overall idea of RepPointsV2 is similar to that of Mask R-CNN: extra tasks are added to supervise the learning of the detector. Concretely, auxiliary side branches are attached to the original network and trained with their own supervision. These branches improve the intermediate features and also take part in the final detection jointly with the main branch.
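Adding supervised side branches amounts to training with a multi-task objective. A generic form is shown below; the symbols are illustrative: $K$ is the number of auxiliary branches and $\lambda_k$ their weights. The specific auxiliary tasks and weights used by RepPointsV2 are not given in this summary, so this is not its exact loss.

```latex
\mathcal{L} = \mathcal{L}_{\text{cls}} + \mathcal{L}_{\text{loc}} + \sum_{k=1}^{K} \lambda_k \, \mathcal{L}_{\text{aux}}^{(k)}
```

Since the auxiliary terms share the backbone with the detection terms, their gradients shape the intermediate features even though they are separate heads.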

CPNDet


This paper comes from the authors of CenterNet (the keypoint-triplet version). They argue that anchor-free methods usually produce many false detections and need an independent classifier to improve detection precision. They therefore propose the Corner Proposal Network (CPN), which combines an anchor-free method with the two-stage paradigm. The complete structure is shown in Figure 2. First, an anchor-free method extracts keypoints, which are enumerated in pairs to form candidate boxes. Then two classifiers filter out false detections and predict class labels for the candidate boxes, respectively.
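The two stages can be sketched with the classifiers abstracted as callables. Stage 1 enumerates every geometrically valid corner pair as a proposal; stage 2 applies a binary classifier to discard false proposals and a second classifier to assign labels. Requiring both corners to share a class id, the 0.5 threshold, and the names `enumerate_proposals` and `classify_proposals` are assumptions of this sketch; `objectness` and `classify` stand in for CPN's learned classifiers.

```python
from typing import Callable, List, Tuple

Corner = Tuple[float, float, float, int]            # (x, y, score, class_id)
Proposal = Tuple[float, float, float, float, int]   # (x1, y1, x2, y2, class_id)


def enumerate_proposals(tls: List[Corner], brs: List[Corner]) -> List[Proposal]:
    """Stage 1: every geometrically valid corner pair of the same class becomes a proposal."""
    props: List[Proposal] = []
    for tx, ty, _, tc in tls:
        for bx, by, _, bc in brs:
            if tc == bc and bx > tx and by > ty:
                props.append((tx, ty, bx, by, tc))
    return props


def classify_proposals(props: List[Proposal],
                       objectness: Callable[[Proposal], float],
                       classify: Callable[[Proposal], Tuple[int, float]],
                       keep_thresh: float = 0.5) -> List[Tuple[float, float, float, float, int, float]]:
    """Stage 2: a binary classifier drops false proposals, a second one assigns labels."""
    results = []
    for p in props:
        if objectness(p) < keep_thresh:
            continue  # filtered out as a false detection
        label, score = classify(p)
        results.append((p[0], p[1], p[2], p[3], label, score))
    return results
```

Enumerating pairs keeps recall high (no embedding has to agree), and the false proposals this creates are exactly what the stage-2 filter is there to remove.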






For more content, follow the WeChat official account [Algorithm Engineering Notes of Xiaofei].