Abstract: This article briefly reviews the basics of object detection algorithms for learning and reference.

This article is shared from the Huawei Cloud community post "Target Detection Basics" by Lutianfei.

We are already familiar with the task of image classification, where an algorithm assigns a category label to an image. Today we look at another problem in building neural networks: object detection. Here the algorithm must not only determine whether, say, a car appears in the picture, but also mark its position by enclosing it with a bounding box. This is the object detection problem. This article briefly reviews the basic knowledge of object detection algorithms.

Basic knowledge of object detection

The stages of the network

  • Two-stage: a first-stage network extracts candidate regions (proposals); a second-stage network classifies the extracted candidate regions and refines their coordinates. Examples: the R-CNN series.

  • One-stage: the candidate-region extraction step is dropped, and a single network completes both the classification and regression tasks. Examples: YOLO and SSD.

Why a one-stage network is less accurate than a two-stage network

Mainly because of the imbalance between positive and negative samples during training:

  • There are far too many negative samples and too few positive ones, so the loss from negatives overwhelms the loss from positives.

  • Most negative samples are easy to classify, so the network learns little useful information from them. When such samples dominate the training data, the network is hard to train to convergence.

How a two-stage network alleviates this imbalance during training

  • In the RPN, the candidate regions most likely to contain an object are selected according to their foreground (objectness) scores, which filters out a large number of easily classified negatives.

  • During training, samples are selected according to their IoU with the ground truth, and the ratio of positives to negatives is fixed at roughly 1:3 to prevent negatives from dominating (a sampling sketch follows this list).
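As a rough illustration of the second point, here is a minimal sketch (Python/NumPy, with illustrative thresholds and batch size rather than the exact Faster R-CNN settings) of sampling proposals at a fixed 1:3 positive-to-negative ratio based on their IoU with the ground truth:

```python
import numpy as np

def sample_proposals(max_ious, batch_size=128, pos_fraction=0.25,
                     pos_iou_thr=0.5, neg_iou_thr=0.5):
    """Sample proposal indices with a fixed positive:negative ratio (1:3 here).

    max_ious: for each proposal, its maximum IoU with any ground-truth box.
    """
    pos_idx = np.where(max_ious >= pos_iou_thr)[0]   # candidate positives
    neg_idx = np.where(max_ious < neg_iou_thr)[0]    # candidate negatives

    num_pos = min(len(pos_idx), int(batch_size * pos_fraction))  # at most 25% positives
    num_neg = min(len(neg_idx), batch_size - num_pos)            # fill the rest with negatives

    pos_idx = np.random.choice(pos_idx, num_pos, replace=False)
    neg_idx = np.random.choice(neg_idx, num_neg, replace=False)
    return pos_idx, neg_idx

# Example: 1000 proposals with random IoUs against the ground truth.
pos, neg = sample_proposals(np.random.rand(1000))
print(len(pos), len(neg))  # -> 32 96, i.e. a 1:3 ratio
```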

Common datasets

Pascal VOC dataset

The two commonly used versions, VOC 2007 and VOC 2012, provide datasets containing 20 object categories.

PASCAL VOC defines five main tasks:

① Classification: for each of the 20 categories, judge whether it appears in the test image;

② Detection: locate the target objects in the test image and give their bounding-box coordinates;

③ Segmentation: for each pixel in the test image, judge which category it belongs to (if none of the 20 categories contains the pixel, it belongs to the background);

④ Human action recognition (given the position of a rectangular box around the person);

⑤ Large Scale Recognition (hosted by ImageNet).

Annotations are read from the image's corresponding .xml file; each object of each image in the annotation file corresponds to a dict with the following attributes (a parsing sketch follows the list):

  • The attribute ‘boxes’

  • The attribute ‘gt_classes’

  • The attribute ‘gt_overlaps’

  • The attribute ‘flipped’

  • The attribute ‘seg_areas’
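For illustration, here is a minimal sketch of reading one VOC-style .xml annotation file with Python's standard xml.etree.ElementTree module and collecting it into a dict keyed like the list above; the file path in the usage comment is hypothetical:

```python
import numpy as np
import xml.etree.ElementTree as ET

VOC_CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
               'bus', 'car', 'cat', 'chair', 'cow',
               'diningtable', 'dog', 'horse', 'motorbike', 'person',
               'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

def load_voc_annotation(xml_path):
    """Parse one VOC .xml file into a dict of boxes and class indices."""
    objs = ET.parse(xml_path).findall('object')

    boxes = np.zeros((len(objs), 4), dtype=np.float32)
    gt_classes = np.zeros(len(objs), dtype=np.int64)

    for i, obj in enumerate(objs):
        bbox = obj.find('bndbox')
        boxes[i, :] = [float(bbox.find(tag).text)
                       for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        gt_classes[i] = VOC_CLASSES.index(obj.find('name').text.strip())

    return {'boxes': boxes, 'gt_classes': gt_classes, 'flipped': False}

# Hypothetical usage:
# ann = load_voc_annotation('VOCdevkit/VOC2007/Annotations/000001.xml')
```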

COCO dataset

There are three versions: 2014, 2015, and 2017.

All annotation information is managed uniformly in the annotations folder. For example, the detection and segmentation annotation file for the train2014 split is instances_train2014.json.

There are three annotation types: object instances, object keypoints, and image captions.
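These JSON annotations are usually read with the official pycocotools package; below is a small sketch of loading and querying the instances file (the annotation path and the choice of image are placeholders):

```python
from pycocotools.coco import COCO

# Placeholder path: point this at the dataset's annotations folder.
coco = COCO('annotations/instances_train2014.json')

# Categories defined for the detection task.
cat_ids = coco.getCatIds()
print(len(cat_ids), 'categories')

# Take one image id and fetch all of its instance annotations.
img_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    # Each annotation carries a category id and an [x, y, width, height] bbox.
    print(ann['category_id'], ann['bbox'])
```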

Common evaluation metrics

True positives (TP): the number of samples that are actually positive and are correctly classified as positive.

False positives (FP): the number of samples that are classified as positive but are actually negative.

False negatives (FN): the number of samples that are classified as negative but are actually positive.

True negatives (TN): the number of samples that are actually negative and are correctly classified as negative.

Precision = TP / (TP + FP) = TP / (all samples predicted as positive)

Recall = TP / (TP + FN) = TP / (all samples whose true class is positive)
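A tiny sketch of these two formulas, assuming predictions and ground-truth labels are given as 0/1 arrays (1 meaning positive):

```python
import numpy as np

def precision_recall(pred, truth):
    """pred, truth: 0/1 arrays; 1 means positive."""
    tp = np.sum((pred == 1) & (truth == 1))  # predicted positive, actually positive
    fp = np.sum((pred == 1) & (truth == 0))  # predicted positive, actually negative
    fn = np.sum((pred == 0) & (truth == 1))  # predicted negative, actually positive

    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

print(precision_recall(np.array([1, 1, 0, 1]), np.array([1, 0, 0, 1])))
# -> (0.666..., 1.0): 2 TP, 1 FP, 0 FN
```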

PR curve

Ideally we want both precision (P) and recall (R) to be as high as possible, but in practice the two are often in tension.

So we need to find a balance between precision and recall. One method is to draw the PR curve and then use the area under it, the AUC (Area Under Curve), to judge the quality of the model.

The IoU metric

IoU is the ratio of the intersection to the union of the predicted box and the ground-truth box.

For each class, the area where the predicted box and the ground truth overlap is the intersection, and the total area they cover together is the union.
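A minimal sketch of computing IoU for two axis-aligned boxes, assuming the [x1, y1, x2, y2] corner format (some codebases use [x, y, w, h] instead):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)   # 0 if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # -> 25 / 175 ≈ 0.143
```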

PR in object detection

VOC mAP calculation method

From the PR curve we can obtain the corresponding AP value:

Prior to 2010, the PASCAL VOC competition defined AP this way:

  • First, sort the model's predictions in descending order of confidence.

  • Divide the recall axis from 0 to 1 into 11 points: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0.

  • At each of these 11 recall points r, take the maximum precision over all recalls >= r, and average these 11 maximum precisions; the result is the AP value.

Since 2010, the PASCAL VOC competition has replaced these 11 recall points with all of the recall points on the PR curve.

For a given recall value r, the precision is taken as the maximum precision over all recalls >= r (this makes the PR curve monotonically decreasing and avoids oscillation); this is called all-points interpolation. The AP is then the area under this interpolated PR curve.

A concrete example:
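Below is a minimal sketch of both definitions (adapted to follow the common VOC evaluation logic; the precision/recall arrays are assumed to already be computed from detections sorted by confidence):

```python
import numpy as np

def voc_ap(recall, precision, use_11_point=False):
    """AP from a PR curve. `recall` must be sorted in increasing order."""
    if use_11_point:
        # Pre-2010 definition: max precision at the 11 recall points 0, 0.1, ..., 1.0.
        ap = 0.0
        for t in np.arange(0.0, 1.1, 0.1):
            p = np.max(precision[recall >= t]) if np.any(recall >= t) else 0.0
            ap += p / 11.0
        return ap

    # Post-2010 definition: area under the monotonically decreasing PR curve.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # All-points interpolation: make precision monotonically decreasing.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangles where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Illustrative PR points from a handful of sorted detections.
rec = np.array([0.2, 0.4, 0.4, 0.8])
prec = np.array([1.0, 0.67, 0.5, 0.57])
print(voc_ap(rec, prec), voc_ap(rec, prec, use_11_point=True))
```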

COCO mAP calculation method

AP is computed at 10 IoU thresholds in [0.5:0.05:0.95] (the IoU threshold decides whether a prediction counts as a TP), and the mean of these 10 AP values is taken as the final result.
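Schematically (a sketch; `average_precision_at` is a hypothetical stand-in for a full per-threshold evaluation, e.g. TP/FP matching at that IoU followed by the voc_ap logic above):

```python
import numpy as np

def coco_style_map(average_precision_at):
    """Mean of AP evaluated at the 10 IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.arange(0.5, 1.0, 0.05)          # [0.5:0.05:0.95], 10 values
    return float(np.mean([average_precision_at(t) for t in thresholds]))

# Hypothetical usage with a dummy evaluator that returns a constant AP:
print(coco_style_map(lambda thr: 0.5))  # -> 0.5
```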

Non-maximum suppression

The NMS algorithm is generally used to remove redundant boxes after model prediction, and the threshold is usually set to nms_threshold = 0.5. The basic steps are as follows (a code sketch follows the list):

1. Select the box with the highest score, call it box_best, and keep it.

2. Compute the IoU between box_best and each of the remaining boxes.

3. If the IoU is greater than 0.5, discard that box (the two boxes likely cover the same object, and we keep the one with the higher score).

4. From the remaining boxes, again select the one with the highest score and repeat the process.
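A minimal NumPy sketch of these steps, assuming boxes in [x1, y1, x2, y2] format and the default threshold of 0.5:

```python
import numpy as np

def nms(boxes, scores, nms_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) in [x1, y1, x2, y2]; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # indices sorted by descending score
    keep = []

    while order.size > 0:
        best = order[0]                        # step 1: highest-scoring remaining box
        keep.append(int(best))

        # Step 2: IoU of box_best with the remaining boxes.
        xx1 = np.maximum(x1[best], x1[order[1:]])
        yy1 = np.maximum(y1[best], y1[order[1:]])
        xx2 = np.minimum(x2[best], x2[order[1:]])
        yy2 = np.minimum(y2[best], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[best] + areas[order[1:]] - inter)

        # Steps 3-4: drop boxes that overlap too much, then repeat on the rest.
        order = order[1:][iou <= nms_threshold]

    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]: the second box overlaps the first too much
```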
