Face detection in OpenCV with a residual network

OpenCV 3.3 was the first official release to include the deep neural network (DNN) module. In the latest version, OpenCV 3.4, the DNN module gained two important capabilities. One is support for Faster R-CNN object detection, which offers better detection accuracy and better small-object detection than the SSD and YOLO models. The other is support for face detection based on an SSD+ResNet model. Although its speed does not match the real-time performance of the HAAR cascade detector, its accuracy and generalization ability far exceed those of HAAR-based face detection, which gives OpenCV developers who need face detection a more reliable choice. Here we first briefly introduce what a residual network is, and then demonstrate real-time face detection from a camera using this model in OpenCV.

1. Residual Network (ResNet)

Early CNNs such as LeNet and AlexNet had only a few convolutional layers. VGG increased network depth by using small convolution kernels and achieved significant results. However, when the number of layers was increased too far, both training error and test error were found to rise, as shown below:

At first, people thought this was caused by vanishing or exploding gradients, but further study showed that it was not an overfitting problem either; it was a network degradation phenomenon. To address it, Kaiming He's team at MSRA proposed a new network model, Residual Networks. The main idea is to train the network using residual structures. A residual structure is as follows:

The authors define F(x) = H(x) - x, so that H(x) = F(x) + x, where H(x) is the mapping the stacked layers should produce and x is the input to the block. They then built a 34-layer plain network and a 34-layer residual network for comparison, with the VGG-19 network at the far left as a reference. The whole network structure is shown as follows:

(The picture is too big to show here.)
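The relation H(x) = F(x) + x can be illustrated with a small, self-contained sketch. It is not code from OpenCV or from the original paper; the names residualForward, zeroBranch and halfBranch are hypothetical, and the residual branch F is reduced to a toy function on a vector rather than real convolution layers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<float>;

// Hypothetical helper: computes H(x) = F(x) + x for any branch F that
// returns a vector of the same length as its input.
Vec residualForward(const Vec& x, const std::function<Vec(const Vec&)>& F) {
    Vec fx = F(x);                   // residual branch F(x)
    assert(fx.size() == x.size());   // the shortcut needs matching shapes
    Vec h(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        h[i] = fx[i] + x[i];         // identity shortcut adds the input back
    }
    return h;
}

// Toy branch that outputs all zeros: the block then reduces to the identity.
Vec zeroBranch(const Vec& x) {
    return Vec(x.size(), 0.0f);
}

// Toy branch that halves every element, so H(x) = 1.5 * x.
Vec halfBranch(const Vec& x) {
    Vec out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = 0.5f * x[i];
    }
    return out;
}
```

With zeroBranch the block returns its input unchanged. This is the intuition behind the design: if extra layers are not useful, the branch only has to learn F(x) = 0 for the deeper network to behave like the shallower one.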

After building the models, the authors trained and tested them on different data sets. They observed that the residual network performed significantly better than the 34-layer plain network, and that the deeper the residual network was, the better it performed. The 34-layer plain network showed obvious degradation compared with the 18-layer plain network. The results of the comparative training are as follows:

Before residual networks, few networks had more than 100 layers, whereas residual networks can reach more than a thousand layers. Unsurprisingly, Kaiming He's team won the ImageNet image classification competition in 2015 with a 152-layer residual network model. The residual network face detection model in OpenCV is based on SSD, so it is very fast, and its results are particularly good. Without further ado, let's look at how to use it in OpenCV for face detection.

2. Face detection code implementation

The model was trained with Caffe, so the first thing to do before writing the program is to download the model file and the description file. I have already downloaded both, so you can get them directly from my GitHub address at github.com/gloomyfish1… Download the model and put it in a local folder. Then you can start programming. First we need to load the model into the network:

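    #include <opencv2/opencv.hpp>
    #include <opencv2/dnn.hpp>

    using namespace cv;
    using namespace cv::dnn;
    using namespace std;

    // The following statements go inside main()
    String modelDesc = "D:/vcprojects/images/dnn/face/deploy.prototxt";
    String modelBinary = "D:/vcprojects/images/dnn/face/res10_300x300_ssd_iter_140000.caffemodel";

    // Initialize the network from the Caffe description and weights
    dnn::Net net = readNetFromCaffe(modelDesc, modelBinary);
    if (net.empty()) {
        printf("could not load net...\n");
        return -1;
    }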

To open the local camera or a video file, use the VideoCapture object as follows:

    // 0 selects the default camera; pass a file path instead to open a video file
    VideoCapture capture(0);
    if (!capture.isOpened()) {
        printf("could not load camera...\n");
        return -1;
    }

After the camera is opened successfully, each frame can be read in a loop (for example with capture.read(frame)) and converted into the blob format that the network accepts. The code is as follows:

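    // inScaleFactor, inWidth, inHeight and meanVal are constants defined elsewhere in the program.
    // The last two arguments are swapRB = false and crop = false.
    Mat inputBlob = blobFromImage(frame, inScaleFactor, Size(inWidth, inHeight), meanVal, false, false);
    net.setInput(inputBlob, "data");  // "data" is the name of the network's input layer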
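To make the conversion less opaque, here is a rough sketch of the arithmetic that blobFromImage performs once the frame has been resized: each pixel has the per-channel mean subtracted, is multiplied by the scale factor, and is moved from interleaved (height x width x channel) order into planar (channel x height x width) order. The helper toBlob is hypothetical and uses only the standard library; resizing is left out. The article does not show the constant values, so a scale factor of 1.0 and a BGR mean of (104, 177, 123) at 300x300 are assumptions here, taken from the settings commonly used with this model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mimicking the per-pixel arithmetic of blobFromImage
// for an image that already has the network input size.
// Input:  interleaved 8-bit pixels, height x width x 3, in BGR order.
// Output: planar floats, 3 x height x width,
//         where value = scale * (pixel - mean[channel]).
std::vector<float> toBlob(const std::vector<std::uint8_t>& hwc,
                          std::size_t height, std::size_t width,
                          float scale, const std::array<float, 3>& mean) {
    const std::size_t channels = 3;
    assert(hwc.size() == height * width * channels);
    std::vector<float> chw(hwc.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                float pixel = static_cast<float>(hwc[(y * width + x) * channels + c]);
                chw[(c * height + y) * width + x] = scale * (pixel - mean[c]);
            }
        }
    }
    return chw;
}
```

Because swapRB is false in the call above, the channel order stays BGR, which is why the assumed mean is also listed in BGR order.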

Detection is then performed by calling net.forward. For each detection, the confidence score (between 0 and 1) is read from the result. When the score is greater than the threshold (assumed to be 0.5), the box position is extracted, and the rectangle is drawn and displayed. The code for this part is as follows:

    Mat detection = net.forward("detection_out");

    // Total inference time, converted to milliseconds
    vector<double> layersTimings;
    double freq = getTickFrequency() / 1000;
    double time = net.getPerfProfile(layersTimings) / freq;

    // View the output as an N x 7 matrix with one detection per row
    Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());

    ostringstream ss;
    ss << "FPS: " << 1000 / time << " ; time: " << time << " ms";
    putText(frame, ss.str(), Point(20, 20), 0, 0.5, Scalar(0, 0, 255));

    for (int i = 0; i < detectionMat.rows; i++) {
        float confidence = detectionMat.at<float>(i, 2);
        if (confidence > confidenceThreshold) {
            // Columns 3..6 hold normalized box corners; scale them to pixels
            int xLeftBottom = static_cast<int>(detectionMat.at<float>(i, 3) * frame.cols);
            int yLeftBottom = static_cast<int>(detectionMat.at<float>(i, 4) * frame.rows);
            int xRightTop = static_cast<int>(detectionMat.at<float>(i, 5) * frame.cols);
            int yRightTop = static_cast<int>(detectionMat.at<float>(i, 6) * frame.rows);

            Rect object(xLeftBottom, yLeftBottom, xRightTop - xLeftBottom, yRightTop - yLeftBottom);
            rectangle(frame, object, Scalar(0, 255, 0));

            // Draw the confidence label on a white background above the box
            ss.str("");
            ss << confidence;
            String conf(ss.str());
            String label = "Face: " + conf;
            int baseLine = 0;
            Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
            rectangle(frame, Rect(Point(xLeftBottom, yLeftBottom - labelSize.height),
                      Size(labelSize.width, labelSize.height + baseLine)),
                      Scalar(255, 255, 255), FILLED);
            putText(frame, label, Point(xLeftBottom, yLeftBottom), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0));
        }
    }
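The decoding step in the loop above can be isolated from the drawing code. The sketch below uses only the standard library, so it can be tried without a camera or a model. FaceBox and decodeDetections are hypothetical names, and the row layout [imageId, classId, confidence, left, top, right, bottom] with coordinates normalized to 0..1 is the usual SSD detection output format, which matches the column indices 2 to 6 used in the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain struct standing in for cv::Rect plus a score.
struct FaceBox {
    int x;
    int y;
    int width;
    int height;
    float confidence;
};

// Decodes a flat array of detections, 7 floats per detection:
// [imageId, classId, confidence, left, top, right, bottom].
// Keeps only detections whose confidence is above the threshold and
// scales the normalized corners to pixel coordinates of the frame.
std::vector<FaceBox> decodeDetections(const std::vector<float>& rows,
                                      int frameCols, int frameRows,
                                      float confidenceThreshold) {
    const std::size_t stride = 7;
    assert(rows.size() % stride == 0);
    std::vector<FaceBox> boxes;
    for (std::size_t i = 0; i + stride <= rows.size(); i += stride) {
        float confidence = rows[i + 2];
        if (confidence <= confidenceThreshold) {
            continue;  // skip weak detections
        }
        int left = static_cast<int>(rows[i + 3] * frameCols);
        int top = static_cast<int>(rows[i + 4] * frameRows);
        int right = static_cast<int>(rows[i + 5] * frameCols);
        int bottom = static_cast<int>(rows[i + 6] * frameRows);
        boxes.push_back({left, top, right - left, bottom - top, confidence});
    }
    return boxes;
}
```

Note that the variables named xLeftBottom and yLeftBottom in the article's code hold what this sketch calls left and top: image coordinates grow downward, so that corner is the top-left of the box on screen.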

The final results are as follows. Under normal conditions, with the face uncovered:



With the face uncovered and the head tilted:



If the face is covered:



In various other cases, such as stronger tilt, side face, and blur:



This shows how powerful the residual network model is. It is time to say goodbye to the HAAR cascade detector. The complete source code of the demo can be downloaded from GitHub:

Github.com/gloomyfish1…
