• Small knowledge, big challenge! This article is participating in the “Essential Tips for Programmers” creation activity.

This article introduces the basic concepts related to object detection in machine learning.

Object detection

Object detection, also called object extraction, is a form of image segmentation based on the geometric and statistical characteristics of the target. It combines segmentation and recognition of the target, and its accuracy and real-time performance are important capabilities of the whole system.

Object detection is a popular direction in computer vision and digital image processing. It is widely used in robot navigation, intelligent video surveillance, industrial inspection, aerospace, and many other fields, where reducing the consumption of human labor through computer vision has important practical significance. Object detection has therefore become a research focus in both theory and application in recent years. It is an important branch of image processing and computer vision, and a core component of intelligent surveillance systems. Object detection is also a foundational algorithm for the broader field of identity recognition, playing an important role in face recognition, gait recognition, crowd counting, instance segmentation, and other tasks. With the wide application of deep learning, object detection algorithms have developed rapidly.

Ground Truth

Ground Truth (GT) refers to the real labels assigned to data by manual annotation, used for training a model and for validating and testing its performance.

Bounding Box (bbox)

Object detection must locate both the position and the category of each target in an image. In the data, this takes the form of enclosing the target region with a rectangular box and attaching a category label: this is the bounding box of object detection. The same form is used both for annotation and for algorithm output, which facilitates machine learning and the comparison of experimental results.
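As a minimal sketch of this idea, a bounding box can be stored as corner coordinates plus a label. The field names and the (x_min, y_min, x_max, y_max) convention below are illustrative assumptions; real datasets use a variety of formats (e.g. center/width/height).

```python
from dataclasses import dataclass

@dataclass
class BBox:
    """One annotated target: a rectangle in pixel coordinates plus a category label."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str

    def area(self) -> float:
        # Clamp to zero so a degenerate box never yields a negative area
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)

cat = BBox(10, 20, 110, 170, "cat")
print(cat.area())  # 100 * 150 = 15000.0
```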

ROI (region of interest)

A region of interest, similar to the bbox concept, defines the part of an image we are interested in and hand over to the machine to learn from.

IoU (Intersection over Union)

Intersection over Union (IoU): when building a dataset, we annotate the data by some means, assigning each target its region and category, i.e., annotating the ROI. After learning from the data, the detection algorithm produces a detection model which, applied to an image, outputs detection result bboxes.

IoU is used to evaluate the output of the detection model. It is computed as the ratio of the intersection area of the output bbox and the labeled bbox to their union area. The higher the ratio, the more accurate the result, and vice versa.
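The calculation above can be sketched in a few lines, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) tuples (the function name is illustrative, not from any particular library):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the intersection rectangle, clamped at 0 when boxes don't overlap
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union = sum of areas minus the double-counted intersection
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping by half: intersection 50, union 150
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
```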

Confidence

Confidence indicates how strongly the detection model trusts a target it has detected: the larger the value, the more confident the model is in the accuracy of its own detection result.

Judging detection results

In a classification task, the output of the classification model is taken as its predicted category, and the correctness of the prediction can be determined by comparing the output with the true category label. Accordingly, predictions can be divided into TP, FP, TN, and FN.

The output of an object detection task differs from classification: it states that a certain position in the image belongs to a certain category, and it may contain multiple targets. In practice it is unrealistic to expect the output to match the annotated labels exactly, so how do we determine whether a detection result is correct?

In object detection, an IoU threshold must be set in advance to determine which results count as correct; the detection boxes are then judged one by one:

  • Iterate over each category
  • Sort the predicted bboxes in descending order of confidence
  • For each predicted bbox, find the GT bbox with the largest IoU
  • If that GT bbox has not yet been assigned and the IoU is greater than the given threshold (e.g. 0.5), assign the GT bbox to the predicted bbox and mark it TP; otherwise, mark the predicted bbox FP
  • Any GT bbox that is never correctly predicted is counted as FN
  • There is no explicit TN in detection results
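The steps above can be sketched as a greedy matching routine. This is a minimal illustration assuming a single class and boxes given as (x_min, y_min, x_max, y_max); the function names are illustrative, not from any particular library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gts, iou_thresh=0.5):
    """preds: list of (bbox, confidence); gts: list of bbox (one class assumed).
    Returns a TP/FP label per prediction and the FN count (unmatched GTs)."""
    # Sort predictions in descending order of confidence
    preds = sorted(preds, key=lambda p: p[1], reverse=True)
    assigned = set()
    labels = []
    for box, _conf in preds:
        # Find the GT bbox with the largest IoU
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            v = iou(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        # TP only if that GT is still unassigned and the IoU clears the threshold
        if best_j != -1 and best_j not in assigned and best_iou > iou_thresh:
            assigned.add(best_j)
            labels.append("TP")
        else:
            labels.append("FP")
    fn = len(gts) - len(assigned)  # GT bboxes never correctly predicted
    return labels, fn
```

For example, a high-confidence box that overlaps a GT well becomes a TP, while a box far from every GT becomes an FP; any leftover GT counts toward FN.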

Performance evaluation

Once the results have been determined, the evaluation system for classification can be used to compute the performance indicators:

Machine learning – Basics – Precision, Recall, Sensitivity, Specificity, Accuracy, FNR, FPR, TPR, TNR, F1 Score, Balanced F Score
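These indicators follow directly from the TP/FP/FN counts obtained above. A minimal sketch (the helper name is illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts; guards against zero division."""
    # Precision: fraction of predictions that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth targets that were found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

p, r = precision_recall(tp=8, fp=2, fn=4)   # precision 0.8, recall ≈ 0.667
f1 = 2 * p * r / (p + r)                    # F1: harmonic mean of the two
```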

ROC and PR curves can also be drawn:

Machine learning – Basics – PR, ROC Curves and AUC