The problem
The paper opens by noting that NMS is an important post-processing step for removing duplicate predictions. Prior work has found that ranking boxes by IoU in NMS works better than ranking them by classification score. The authors run an experiment to confirm this.
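To make the ranking idea concrete, here is a minimal sketch of greedy NMS in which the ranking score is a parameter. It assumes 2D axis-aligned boxes for brevity (the paper works with 3D boxes), and the function names, boxes, and scores are made up for illustration.

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, rank_scores, iou_thresh=0.5):
    """Greedy NMS; returns indices of kept boxes, best-ranked first."""
    order = sorted(range(len(boxes)), key=lambda i: rank_scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou_2d(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Two overlapping detections of one object plus one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
cls_scores = [0.9, 0.8, 0.7]   # classification confidence (illustrative)
pred_ious = [0.6, 0.85, 0.7]   # predicted localization quality (illustrative)

keep_by_cls = nms(boxes, cls_scores)  # [0, 2]: the higher class score wins
keep_by_iou = nms(boxes, pred_ious)   # [1, 2]: the better-localized box wins
```

The only difference between the two calls is the list used for sorting, which is why a more reliable localization score can change which duplicate survives.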
In the results table, "Ground truth IoU" means that the IoU between each predicted bounding box and the ground truth is used as the ranking score in NMS. The results show that accuracy improves substantially when IoU guides NMS. Some previous work has also used a predicted IoU, instead of the predicted class score, as the NMS ranking criterion. Most of these methods simply add an IoU branch to predict IoU directly, which has two problems:
- The IoU prediction branch is added without extracting features that are important for IoU prediction.
- The IoU predictions are misaligned.
The figure above shows the distribution of IoUs before and after refinement. During training, the IoU branch predicts IoU from the proposal's features, with the ground-truth box as reference; at test time, the predicted value is treated as the IoU between the refined predicted box and the ground-truth box. The two IoU distributions differ, so the prediction is misaligned. The authors' solutions to these two problems follow.
The solution
The problem of missing IoU-relevant features
To address this, the authors propose two modules: Attentive Corner Aggregation (ACA) and Corner Geometry Encoding (CGE). Together they extract the features needed to predict IoU. As the names suggest, both are built around the corners of the box. Here is how the two modules work.
ACA module
The ACA module comes first, starting with the motivation behind its design.
As the figure above shows, the visible part of a target differs depending on the angle it is observed from. This makes feature extraction harder in general, and in particular makes it harder to extract the features that matter for predicting IoU. The authors designed this module for that reason. According to them, it reduces, to some extent, the variation in extracted features caused by different observation angles.
The specific design is as follows:
After PointNet2 generates the proposals and the point-wise semantic features, the authors aggregate the points inside each proposal with the method above to produce a per-proposal feature. As in PointRCNN, PointNet2 is also used to extract the proposal features. The difference is that, after stacking K SA layers, the authors do not sample points with FPS at layer K+1; instead, they use the eight corner points of the proposal as the sampled points. The points within a given radius of each of the eight corners are then gathered, the features of these regions are extracted with PointNet, and an attention mechanism is applied to those features. The attention mechanism is shown in the figure below:
Attention is applied at two levels: across the different perspectives (the eight corners) and across the channels. The final feature is the sum of the features of the eight corners.
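The aggregation described above (sample at the corners, group the points within a radius, pool their features, apply attention, sum) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: a channel-wise max stands in for the PointNet feature extractor, and the attention weights come from fixed logits passed in as arguments, whereas in the paper they are learned. All function names are hypothetical.

```python
import math
from itertools import product


def ball_query(center, points, radius):
    """Indices of the points within `radius` of `center`."""
    return [i for i, p in enumerate(points) if math.dist(center, p) <= radius]


def max_pool(features, channels):
    """Channel-wise max over a group of feature vectors (zeros if the group is empty)."""
    if not features:
        return [0.0] * channels
    return [max(f[c] for f in features) for c in range(channels)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_corners(corners, points, feats, radius, corner_logits, channel_logits):
    """Pool features around each corner, gate them, and sum over the corners."""
    channels = len(channel_logits)
    out = [0.0] * channels
    for corner, c_logit in zip(corners, corner_logits):
        idx = ball_query(corner, points, radius)
        pooled = max_pool([feats[i] for i in idx], channels)
        w_corner = sigmoid(c_logit)  # assumed: one weight per corner (perspective)
        for c in range(channels):
            w_channel = sigmoid(channel_logits[c])  # assumed: one weight per channel
            out[c] += w_corner * w_channel * pooled[c]
    return out


# Toy data: the eight corners of a unit cube stand in for a proposal's corners.
corners = list(product((0.0, 1.0), repeat=3))
points = [(0.1, 0.0, 0.0), (0.9, 1.0, 1.0), (0.5, 0.5, 0.5)]
feats = [[1.0, 2.0], [3.0, 0.5], [9.0, 9.0]]  # two feature channels per point

proposal_feat = aggregate_corners(
    corners, points, feats, radius=0.3,
    corner_logits=[0.0] * 8, channel_logits=[0.0, 0.0],
)
```

In this toy input only two corners have points within the radius, and the point at the cube center is ignored because it is far from every corner. Corners with no neighbors contribute zeros to the sum.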
CGE module
The purpose of this module is to exploit the geometric features of the proposal, whereas the features extracted above can be understood as semantic features. The design is simple: the world coordinates of the proposal's eight corners are taken as the input to a neural network, whose structure is as follows:
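The input to this module can be sketched as below: compute the eight corners of a box and flatten them into one vector. The box parameterization (center, length, width, height, yaw about a vertical z axis) and the function names are assumptions made for illustration; the network that consumes the vector is not reproduced here.

```python
import math


def box_corners(cx, cy, cz, length, width, height, yaw):
    """Eight corners of a 3D box rotated by `yaw` about the vertical (z) axis."""
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                dx, dy, dz = sx * length, sy * width, sz * height
                corners.append((
                    cx + dx * cos_y - dy * sin_y,  # rotate the offset, then translate
                    cy + dx * sin_y + dy * cos_y,
                    cz + dz,
                ))
    return corners


def corner_encoding(box):
    """Flatten the eight (x, y, z) corners into one 24-value input vector."""
    return [v for corner in box_corners(*box) for v in corner]


# Assumed box layout: (cx, cy, cz, length, width, height, yaw).
vec = corner_encoding((10.0, 2.0, 0.5, 4.0, 2.0, 1.0, 0.0))
```

Unlike a (center, size, yaw) tuple, the corner coordinates change whenever the box moves, resizes, or rotates, so the 24 values expose the box geometry to the network directly.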
About the IoU prediction misalignment
The design of this part is also very simple; see the following figure for the specific operations. After the refinement stage predicts a box, the predicted bounding box is fed back into the network as a proposal, so that the IoU branch finally predicts the IoU between the refined box and the ground truth. This resolves the misalignment. Note that only the IoU branch output is updated in this second pass; the outputs of the other branches stay unchanged. Otherwise, the misalignment would reappear.
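The two-pass procedure described above can be sketched as follows. `second_stage` is a hypothetical callable standing in for the refinement network, and `toy_stage` is a made-up 1D stand-in used only to show the control flow; neither is the authors' code.

```python
def aligned_inference(second_stage, features, proposals):
    """Run the second stage twice so the IoU score describes the refined box.

    `second_stage(features, boxes)` is a hypothetical callable returning
    (refined_boxes, class_scores, predicted_ious), one entry per input box.
    """
    # Pass 1: refine the proposals; keep the boxes and class scores.
    refined, cls_scores, _ = second_stage(features, proposals)
    # Pass 2: feed the refined boxes back in as proposals and keep only the
    # IoU output; boxes and class scores from pass 1 stay unchanged.
    _, _, aligned_ious = second_stage(features, refined)
    return refined, cls_scores, aligned_ious


def toy_stage(features, boxes):
    """Stand-in model on 1D "boxes": moves each box halfway toward a target at
    10.0 and scores the *input* box by its closeness to that target."""
    refined = [b + 0.5 * (10.0 - b) for b in boxes]
    cls_scores = [0.9 for _ in boxes]
    ious = [1.0 - abs(10.0 - b) / 10.0 for b in boxes]
    return refined, cls_scores, ious


boxes, cls_scores, ious = aligned_inference(toy_stage, None, [6.0])
```

In the toy run, a single pass would pair the refined box 8.0 with a score computed for the original proposal 6.0. The second pass rescores the box that is actually output.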
Results
These are the paper's results on the KITTI test set, and they are only mediocre. Why do some models show small gaps between the validation set and the test set, while others show large ones? Is it a matter of method design?
Ablation experiments
Effectiveness of IoU alignment
The baseline here is PointRCNN with an added IoU branch. Under "Alignment", × means the predicted bbox is not fed back into the network and ✓ means it is fed back for a second pass. The authors also run an experiment on confidence alignment. The results are as follows:
As the table above shows, confidence alignment brings no improvement and actually lowers performance. The authors' only explanation is that confidence does not suit the alignment operation. This seems a bit far-fetched, since the confidence is also a property of the proposal.
Verifying the effectiveness of the IoU feature modules
A comparison of several design choices in the ACA module.