
Face detection using OpenCV and deep learning

Today’s post is divided into three parts.

In part 1, we’ll discuss the origins of more accurate OpenCV face detectors and their place in the OpenCV library.

Then I’ll demonstrate how to perform face detection in an image using OpenCV and deep learning.

Finally, I’ll discuss how to apply face detection to video streams using OpenCV and deep learning.

Where are these “better” face detectors in OpenCV, and where do they come from?

Back in August 2017, OpenCV 3.3 was officially released, bringing with it a highly improved “Deep Neural Network” (DNN) module. The module supports a variety of deep learning frameworks, including Caffe, TensorFlow, and Torch/PyTorch. Aleksandr Rybnikov, the main contributor to the DNN module, put a lot of work into making this module possible (he deserves our thanks and applause).

Unbeknownst to most OpenCV users, however, Rybnikov included a more accurate, deep learning-based face detector in the official release of OpenCV (though it can be a bit difficult to find if you don't know where to look). The Caffe-based face detector can be found in the face_detector subdirectory of the dnn samples:

opencv/samples/dnn/face_detector at 4.x · opencv/opencv (github.com)

To use OpenCV’s deep neural network module with Caffe’s model, you need two sets of files:

  • A .prototxt file that defines the model architecture (the layers themselves)

  • A .caffemodel file containing the actual layer weights:

    raw.githubusercontent.com/opencv/open…

Both files are required to perform deep learning with a model trained in Caffe. However, you'll only find the prototxt files in the GitHub repository.

The weight files are not included in the OpenCV samples directory, and it takes a bit more digging to find them…
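If you prefer to fetch the weights programmatically, a minimal sketch along these lines should work. Note that the repository branch and file name below are my assumptions about where the OpenCV team has hosted these files, so verify them against the opencv_3rdparty repository before relying on this:

import urllib.request

# NOTE: the URL below is an assumption; confirm the branch and file name
# in the opencv/opencv_3rdparty repository before use
WEIGHTS_URL = (
	"https://raw.githubusercontent.com/opencv/opencv_3rdparty/"
	"dnn_samples_face_detector_20170830/"
	"res10_300x300_ssd_iter_140000.caffemodel"
)
urllib.request.urlretrieve(WEIGHTS_URL, "res10_300x300_ssd_iter_140000.caffemodel")
print("[INFO] weights saved to res10_300x300_ssd_iter_140000.caffemodel")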

Face detection in images using OpenCV and deep learning

In the first example, we'll learn how to apply OpenCV's face detection to a single input image. In the next section, we'll learn how to modify this code and apply OpenCV's face detection to videos, video streams, and webcams. Open a new file, name it detect_faces.py, and insert the following code:

# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image")
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

We import the required packages and parse the command line arguments. We have three required arguments:

–image: path to the input image.

–prototxt: path to the Caffe prototxt file.

–model: path to the pre-trained Caffe model.

An optional –confidence argument can override the default detection threshold of 0.5.
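For instance (a hypothetical invocation with placeholder file names), running python detect_faces.py -i rooster.jpg -p deploy.prototxt -m res10.caffemodel would leave args holding:

# what vars(ap.parse_args()) yields for the hypothetical invocation above;
# the file names are placeholders, not files shipped with this post
args = {
	"image": "rooster.jpg",
	"prototxt": "deploy.prototxt",
	"model": "res10.caffemodel",
	"confidence": 0.5,  # default, since -c was not supplied
}

From there, let's load our model and create a blob from our image: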

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# load the input image and construct an input blob for the image
# by resizing to a fixed 300x300 pixels and then normalizing it
image = cv2.imread(args["image"])
(h, w) = image.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
	(300, 300), (104.0, 177.0, 123.0))

First, we load our model using the –prototxt and –model file paths, storing it as net.

Then we load the image, extract its dimensions, and create a blob. cv2.dnn.blobFromImage takes care of the preprocessing, including resizing the image for the blob and performing mean subtraction. If you're interested in learning more about the cv2.dnn.blobFromImage function, I cover it in detail in a separate post.
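To get a feel for what the function returns, here is a minimal standalone sketch (the "test.jpg" file name is a placeholder for any image on disk), mirroring the parameters used above: a scale factor of 1.0, a 300x300 spatial size, and the mean BGR values (104.0, 177.0, 123.0) subtracted channel-wise:

import cv2

# "test.jpg" is a placeholder; substitute any image on disk
image = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
	(300, 300), (104.0, 177.0, 123.0))

# NCHW layout: 1 image, 3 channels, 300x300 pixels
print(blob.shape)  # (1, 3, 300, 300)

Next, we will apply face detection: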

# pass the blob through the network and obtain the detections and
# predictions
print("[INFO] computing object detections...")
net.setInput(blob)
detections = net.forward()

To detect faces, we pass the blob through the network. Then we loop over the detections and draw a box around each detected face:

# loop over the detections
for i in range(0, detections.shape[2]):
	# extract the confidence (i.e., probability) associated with the
	# prediction
	confidence = detections[0, 0, i, 2]

	# filter out weak detections by ensuring the `confidence` is
	# greater than the minimum confidence
	if confidence > args["confidence"]:
		# compute the (x, y)-coordinates of the bounding box for the
		# object
		box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
		(startX, startY, endX, endY) = box.astype("int")

		# draw the bounding box of the face along with the associated
		# probability
		text = "{:.2f}%".format(confidence * 100)
		y = startY - 10 if startY - 10 > 10 else startY + 10
		cv2.rectangle(image, (startX, startY), (endX, endY),
			(0, 0, 255), 2)
		cv2.putText(image, text, (startX, y),
			cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

# show the output image
cv2.imshow("Output", image)
cv2.waitKey(0)

We iterate over the detections.
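For reference, detections is a 4-D array of shape (1, 1, N, 7), with one seven-element row per candidate detection; to the best of my knowledge of the SSD output format, each row reads [image_id, label, confidence, startX, startY, endX, endY], with the box corners normalized to [0, 1]. A minimal sketch with a single made-up row (the values are invented purely for illustration):

import numpy as np

# a fabricated detections array of shape (1, 1, 1, 7) for illustration:
# [image_id, label, confidence, startX, startY, endX, endY]
detections = np.array([[[[0.0, 1.0, 0.98, 0.25, 0.30, 0.60, 0.70]]]])

(h, w) = (400, 600)  # pretend frame dimensions
confidence = detections[0, 0, 0, 2]                      # 0.98
box = detections[0, 0, 0, 3:7] * np.array([w, h, w, h])
print(confidence, box.astype("int"))                     # 0.98 [150 120 360 280]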

We then extract the confidence and compare it to the confidence threshold. We perform this check to filter out weak detections. If the confidence meets the minimum threshold, we proceed to draw a rectangle along with the detection probability.

To do this, we first compute the (x, y)-coordinates of the bounding box. We then build a text string containing the detection probability. In case the text would run off the image (as happens when a face is detected at the very top of the image), we shift it down by 10 pixels. Finally, the face rectangle and confidence text are drawn on the image.

From there, we loop back and process any remaining detections. When none remain, we are ready to display our output image on the screen.

Open a terminal and execute the following command:

python detect_faces.py --image 2.jpg --prototxt deploy.proto.txt --model res10_300x300_ssd_iter_140000_fp16.caffemodel

Face detection in video and webcams using OpenCV and deep learning

Now that we’ve learned how to apply OpenCV’s face detection to individual images, let’s also apply face detection to videos, video streams, and webcams. Fortunately for us, much of the code we used in the previous section for face detection in a single image using OpenCV can be reused here!

Open a new file, name it detect_faces_video.py, and insert the following code:

# import the necessary packages
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

Compared to the script above, we need to import three additional packages: VideoStream, imutils, and time. If you don't have imutils in your virtual environment, you can install it with:

pip install imutils

Our command line arguments are mostly the same, except this time we don't have an –image path argument; we'll use a webcam video source instead. From there we'll load our model and initialize the video stream:

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream and allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

Loading the model is the same as above. We initialize a VideoStream object, specifying the camera with index zero as the source (usually this will be your laptop's built-in camera or the first camera detected on your desktop). A few quick notes:

If you want to use a Raspberry Pi camera module, you can replace the line above with vs = VideoStream(usePiCamera=True).start(). If you want to parse a video file rather than a live video stream, replace the VideoStream class with FileVideoStream, as sketched below.
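A minimal sketch of the file-based variant (the "input.mp4" path is a placeholder, not a file from this post):

from imutils.video import FileVideoStream
import time

# "input.mp4" is a placeholder path to a video file on disk
vs = FileVideoStream("input.mp4").start()
time.sleep(2.0)

# frames are then read exactly as with VideoStream
frame = vs.read()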

Then we let the camera sensor warm up for two seconds. From there we loop over the frames and compute face detections with OpenCV:

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream and resize it
	# to have a maximum width of 400 pixels
	frame = vs.read()
	frame = imutils.resize(frame, width=400)

	# grab the frame dimensions and convert it to a blob
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
		(300, 300), (104.0, 177.0, 123.0))

	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()

This block should look familiar from the still-image version in the previous section. In it, we read a frame from the video stream, create a blob, and pass the blob through the deep neural network to obtain face detections.

We can now loop over the detections, compare each one against the confidence threshold, and draw the face boxes + confidence values on the screen:

	# loop over the detections
	for i in range(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with the
		# prediction
		confidence = detections[0, 0, i, 2]

		# filter out weak detections by ensuring the `confidence` is
		# greater than the minimum confidence
		if confidence < args["confidence"]:
			continue

		# compute the (x, y)-coordinates of the bounding box for the
		# object
		box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
		(startX, startY, endX, endY) = box.astype("int")

		# draw the bounding box of the face along with the associated
		# probability
		text = "{:.2f}%".format(confidence * 100)
		y = startY - 10 if startY - 10 > 10 else startY + 10
		cv2.rectangle(frame, (startX, startY), (endX, endY),
			(0, 0, 255), 2)
		cv2.putText(frame, text, (startX, y),
			cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

To review this code block in detail, see the previous section, where we performed face detection on a still image; the code here is almost identical. Now that we have drawn our OpenCV face detections, let's display the frame on the screen and wait for a keystroke:

	# show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

We display the frame on the screen until we press the “Q” key, at which point we break out of the loop and perform cleanup.
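As an aside (my own addition, not part of the original script), imutils also ships an FPS helper if you want to gauge the pipeline's throughput; a minimal sketch of how it slots into the loop above:

from imutils.video import FPS

# start the counter before entering the while loop
fps = FPS().start()

# inside the loop, call fps.update() once per processed frame
fps.update()

# after breaking out of the loop, stop the timer and report
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))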

Conclusion

In today’s blog post, you discovered a little-known secret about the OpenCV library — OpenCV provides a more accurate face detector out of the box (compared to OpenCV’s Haar cascade).

More accurate OpenCV face detectors are based on deep learning, specifically using the Single Shot Detector (SSD) framework with ResNet as the base network. Thanks to the hard work of Aleksandr Rybnikov and other contributors to OpenCV's DNN module, we can enjoy these more accurate OpenCV face detectors in our own applications.

Hope you enjoyed today's post. The complete weights, files, and code can be downloaded here: download.csdn.net/download/hh…