ExtremeNet detects the four extreme points of an object and groups them geometrically to produce detections, with performance comparable to other mainstream detectors. Its detection approach is quite distinctive, but it relies on many post-processing steps, so there is considerable room for improvement. Interested readers can refer to the error analysis in the paper's experiments section.

Source: Xiaofei Algorithm Engineering Notes (WeChat official account)

Bottom-up Object Detection by Grouping Extreme and Center Points

Paper: Arxiv.org/abs/1901.08…
Code: Github.com/xingyizhou/…

Introduction

In object detection, the common approach represents an object as a rectangular box, which usually includes a lot of background that can hinder detection. To address this, the paper proposes ExtremeNet, which localizes an object by detecting its four extreme points ("poles": topmost, leftmost, bottommost, and rightmost), as shown in Figure 1. The overall algorithm builds on CornerNet: five heat maps predict the four extreme points and the center region of the object, and extreme points from different heat maps are then grouped together. In addition, the extreme points detected by ExtremeNet can be combined with the DEXTR network to predict object segmentation masks.

ExtremeNet for Object detection

ExtremeNet uses HourglassNet to detect class-aware keypoints and follows CornerNet's training procedure, loss function, and offset prediction. The offset prediction is class-agnostic, and no offset is predicted for the center point. In total, the backbone outputs $5\times C$ heat maps and $4\times 2$ offset maps, where $C$ is the number of classes. The overall structure and outputs are shown in Figure 3. Once the extreme points are extracted, they are grouped purely by geometry.
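As a rough illustration of the output layout described above, the sketch below counts the head channels and shows how a CornerNet-style offset maps a heat-map peak back to input-image coordinates. The output stride of 4 and the helper names `head_channels`, `encode_offset`, and `decode_point` are assumptions made for illustration, not taken from the official code.

```python
import math

def head_channels(num_classes):
    """Channel counts of the two output groups described in the text."""
    heat = 5 * num_classes  # 4 extreme-point maps + 1 center map per class
    offset = 4 * 2          # class-agnostic (dx, dy) per extreme-point type; no center offset
    return heat, offset

def encode_offset(x_in, y_in, stride=4):
    """Training target: the heat-map cell of an input point and its sub-cell offset.

    The stride of 4 is an assumed downsampling factor for this sketch.
    """
    sx, sy = x_in / stride, y_in / stride
    cell = (math.floor(sx), math.floor(sy))
    return cell, (sx - cell[0], sy - cell[1])

def decode_point(x, y, dx, dy, stride=4):
    """Recover input-image coordinates from a peak (x, y) and its predicted offset."""
    return (x + dx) * stride, (y + dy) * stride

# Usage
print(head_channels(80))          # (400, 8) for the 80 COCO classes
cell, off = encode_offset(41, 82)
print(cell, off)                  # (10, 20) (0.25, 0.5)
print(decode_point(*cell, *off))  # (41.0, 82.0)
```

Without the offset, rounding the peak down to its cell would lose up to `stride - 1` pixels of precision, which is why CornerNet-style detectors predict it.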

Center Grouping

Because the extreme points lie on different sides of the object, grouping them is complicated. The paper argues that grouping keypoints with embeddings, as CornerNet does, lacks global information about the object, so it proposes Center Grouping instead.

Center Grouping is summarized in Algorithm 1. First, the peaks are extracted from each of the four extreme-point heat maps; a peak value must be greater than the threshold $\tau_p$. Then every combination of peaks $(t, b, r, l)$ is enumerated. For each combination that satisfies the geometric constraints ($t_y \le l_y, r_y \le b_y$ and $l_x \le t_x, b_x \le r_x$, i.e., the four points can form a valid box), the geometric center is computed as

$c = (c_x, c_y) = \left(\frac{l_x + r_x}{2}, \frac{t_y + b_y}{2}\right)$.

If the center heat map responds strongly enough at this location, $\hat{Y}^{(c)}_{c_x, c_y} \ge \tau_c$, the combination is accepted and the box $(l_x, t_y, r_x, b_y)$ is output.
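To make Algorithm 1 concrete, here is a minimal pure-Python sketch of center grouping for a single class. Heat maps are assumed to be 2-D lists indexed `heat[y][x]`. The default thresholds, the 3×3 local-maximum test, the integer rounding of the center, and the box score (mean of the five peak values) are illustrative assumptions rather than a transcription of the official implementation.

```python
from itertools import product

def extract_peaks(heat, tau_p):
    """Return (x, y) positions that are 3x3 local maxima with score > tau_p."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heat[y][x]
            if v <= tau_p:
                continue
            window = [heat[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            if v >= max(window):  # ties are kept as peaks in this sketch
                peaks.append((x, y))
    return peaks

def center_grouping(top, left, bottom, right, center, tau_p=0.1, tau_c=0.1):
    """Group extreme-point peaks into boxes (x1, y1, x2, y2, score)."""
    T, L, B, R = (extract_peaks(m, tau_p) for m in (top, left, bottom, right))
    boxes = []
    for t, l, b, r in product(T, L, B, R):
        # Geometric check: t_y <= l_y, r_y <= b_y and l_x <= t_x, b_x <= r_x
        if not (t[1] <= min(l[1], r[1]) and max(l[1], r[1]) <= b[1]
                and l[0] <= min(t[0], b[0]) and max(t[0], b[0]) <= r[0]):
            continue
        # Geometric center, rounded down to a heat-map cell (assumption)
        cx, cy = (l[0] + r[0]) // 2, (t[1] + b[1]) // 2
        if center[cy][cx] < tau_c:
            continue  # no center response: reject this combination
        score = (top[t[1]][t[0]] + left[l[1]][l[0]] + bottom[b[1]][b[0]]
                 + right[r[1]][r[0]] + center[cy][cx]) / 5
        boxes.append((l[0], t[1], r[0], b[1], score))
    return boxes

# Usage: a single object on a 7x7 grid
def zeros(n=7):
    return [[0.0] * n for _ in range(n)]

top, left, bottom, right, center = (zeros() for _ in range(5))
top[1][3], left[3][1], bottom[5][3], right[3][5] = 0.9, 0.8, 0.9, 0.8
center[3][3] = 0.7
print(center_grouping(top, left, bottom, right, center))  # one box (1, 1, 5, 5) with score about 0.82
```

Enumerating every four-way combination costs $O(n^4)$ in the number of peaks per map, so in practice it helps to keep only the highest-scoring peaks.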

Ghost box suppression

Center Grouping can produce high-confidence false positives when three objects of the same size are equally spaced along a line. In this case, the middle object has two possible outcomes: the correct small box, or a much larger box built from the extreme points of its two neighbours, whose geometric center still falls on the middle object. The paper calls the second kind of prediction a ghost box. To suppress it, the paper adds a soft-NMS-style post-processing step: if the sum of the scores of all boxes contained in a given box exceeds three times that box's own score, its score is divided by two. The NMS operation is then performed.
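The suppression rule can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2, score)` tuples; only the score-halving step is shown, and a regular NMS pass would follow it. Treating a box as "contained" when it lies fully inside the other (borders included) is an assumption of this sketch.

```python
def contains(outer, inner):
    """True if box `inner` lies fully inside box `outer` (borders included)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def suppress_ghost_boxes(boxes):
    """Halve the score of any box whose contained boxes outscore it by more than 3x."""
    out = []
    for i, a in enumerate(boxes):
        # Sum the original scores of every other box inside box a
        inside = sum(b[4] for j, b in enumerate(boxes) if j != i and contains(a, b))
        score = a[4] / 2 if inside > 3 * a[4] else a[4]
        out.append((*a[:4], score))
    return out

# Usage: three equally spaced objects plus a ghost box spanning all of them
boxes = [(0, 0, 10, 10, 0.9), (20, 0, 30, 10, 0.9),
         (40, 0, 50, 10, 0.9), (0, 0, 50, 10, 0.8)]
print(suppress_ghost_boxes(boxes)[3])  # (0, 0, 50, 10, 0.4)
```

The ghost box's center (25, 5) lies on the middle object, which is why center grouping alone accepts it; the containment test is what exposes it.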

Edge aggregation

Extreme points are not always unique. If an object has a horizontal or vertical edge, every point on that edge can be considered an extreme point. The network then tends to produce weak responses spread along the edge instead of a single strong peak, which may cause the extreme point to be missed.

The paper adopts edge aggregation to handle this case. For each local maximum on the left and right heat maps, scores are aggregated in the vertical direction; for each local maximum on the top and bottom heat maps, scores are aggregated in the horizontal direction. All monotonically decreasing scores along the aggregation direction are summed, stopping at the nearest local minimum on each side. Let $m$ be a local maximum and let $N^{(m)}_i = \hat{Y}_{m_x+i, m_y}$ be the scores along the line through it (written here for the horizontal direction). Let $i_0 < 0$ and $0 < i_1$ be the two nearest local minima, i.e., $N^{(m)}_{i_0-1} > N^{(m)}_{i_0}$ and $N^{(m)}_{i_1} < N^{(m)}_{i_1+1}$. The peak value is then updated as

$\tilde{Y}_m = \hat{Y}_m + \lambda_{aggr} \sum_{i=i_0}^{i_1} N^{(m)}_i$,

where $\lambda_{aggr}$ is the aggregation weight and is set to 0.1. The overall effect is shown in Figure 4.
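The aggregation rule can be sketched on a single line of scores. This is a minimal pure-Python version, assuming the scores along the aggregation direction have already been read out of a `heat[y][x]` list (the hypothetical helper `score_line` does that); stopping the walk at the border of the map is also an assumption of this sketch.

```python
def score_line(heat, x, y, kind):
    """Left/right maps aggregate vertically (column x); top/bottom horizontally (row y).

    Returns the 1-D score line and the index of the peak within it.
    """
    if kind in ("left", "right"):
        return [row[x] for row in heat], y
    return list(heat[y]), x

def aggregation_bounds(line, m):
    """Offsets (i0, i1) of the nearest local minima on each side of peak index m.

    The walk continues while scores keep strictly decreasing away from the
    peak and also stops at the border of the line.
    """
    i0 = 0
    while m + i0 - 1 >= 0 and line[m + i0 - 1] < line[m + i0]:
        i0 -= 1
    i1 = 0
    while m + i1 + 1 < len(line) and line[m + i1 + 1] < line[m + i1]:
        i1 += 1
    return i0, i1

def edge_aggregate(line, m, lam_aggr=0.1):
    """Updated peak score: Y~_m = Y^_m + lam_aggr * sum_{i=i0}^{i1} N_i."""
    i0, i1 = aggregation_bounds(line, m)
    return line[m] + lam_aggr * sum(line[m + i0 : m + i1 + 1])

# Usage
line = [0.0, 0.1, 0.2, 0.3, 0.2, 0.25, 0.1]
print(aggregation_bounds(line, 3))  # (-3, 1)
print(edge_aggregate(line, 3))      # about 0.38, i.e. 0.3 + 0.1 * 0.8
```

Note that the sum runs over $i = i_0, \dots, i_1$ and therefore includes the peak itself ($i = 0$), matching the formula above.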

Extreme Instance Segmentation

Extreme points carry more information about an object than a bounding box: twice as many annotated values (8 vs. 4). Based on the four extreme points and their box, the paper proposes a simple way to approximate the object mask. First, each extreme point is extended along its corresponding box edge into a segment centered on the point, with a length of 1/4 of that edge; if the segment runs past a box corner, it is truncated there. The endpoints of the four segments are then connected to form an octagon, which serves as a coarse mask. To refine the mask further, the paper uses DEXTR (Deep Extreme Cut), a network trained to convert manually provided extreme points into a segmentation mask. Here, the manual input is simply replaced by ExtremeNet's predicted extreme points, which are fed together with the cropped box region into the pre-trained DEXTR network.
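The octagon construction can be sketched in a few lines. Points are assumed to be `(x, y)` tuples in image coordinates (y pointing down), with the box taken as $(l_x, t_y, r_x, b_y)$; the function name `octagon` and the vertex ordering are choices made for this illustration.

```python
def octagon(t, l, b, r):
    """Octagon vertices from extreme points t, l, b, r, walking the box clockwise."""
    x1, y1, x2, y2 = l[0], t[1], r[0], b[1]  # box spanned by the extreme points
    hw, hh = (x2 - x1) / 8, (y2 - y1) / 8    # half of 1/4 of the width / height

    def clip_x(v):
        return min(max(v, x1), x2)  # truncate a horizontal segment at the box corners

    def clip_y(v):
        return min(max(v, y1), y2)  # truncate a vertical segment at the box corners

    return [
        (clip_x(t[0] - hw), y1), (clip_x(t[0] + hw), y1),  # top segment
        (x2, clip_y(r[1] - hh)), (x2, clip_y(r[1] + hh)),  # right segment
        (clip_x(b[0] + hw), y2), (clip_x(b[0] - hw), y2),  # bottom segment
        (x1, clip_y(l[1] + hh)), (x1, clip_y(l[1] - hh)),  # left segment
    ]

# Usage: extreme points at the midpoints of a 10x10 box
print(octagon((5, 0), (0, 5), (5, 10), (10, 5)))
# [(3.75, 0), (6.25, 0), (10, 3.75), (10, 6.25), (6.25, 10), (3.75, 10), (0, 6.25), (0, 3.75)]
```

When an extreme point sits at a box corner, one end of its segment collapses onto that corner, which is the truncation described above.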

Experiments

The paper also performs an error analysis of ExtremeNet, replacing the output of each module with ground truth; with the ground-truth replacements, AP finally reaches 86.0.

Comparison with other state-of-the-art methods.

Instance segmentation results.

Conclusion

ExtremeNet detects the four extreme points of an object and groups them geometrically to produce detections, with performance comparable to other mainstream detectors. Its detection approach is quite distinctive, but it relies on many post-processing steps, so there is considerable room for improvement. Interested readers can refer to the error analysis in the paper's experiments section.


For more content, follow the WeChat official account [Algorithm Engineering Notes of Xiaofei].


ExtremeNet: Object Detection via Extreme Points for More Precise Object Regions | CVPR 2019
