Today, we will use facial landmarks and OpenCV to detect and count blinks in a video stream.
To build our blink detector, we will compute a metric called the eye aspect ratio (EAR), introduced by Soukupová and Čech in their 2016 paper Real-Time Eye Blink Detection Using Facial Landmarks.
Traditional image processing approaches to blink detection usually involve some combination of the following:
- Eye localization.
- Thresholding to find the whites of the eyes.
- Determine if the “white” area of the eye disappears over a period of time (indicating blinking).
The eye aspect ratio, by contrast, is a more elegant solution that involves a very simple calculation based on the ratio of distances between the eye's facial landmarks.
This method is fast, efficient and easy to implement.
Today we will implement blink detection in four parts:
In the first part, we’ll discuss the eye aspect ratio and how to use it to determine whether a person is blinking in a given video frame.
Then, we'll write Python, OpenCV, and dlib code to (1) perform facial landmark detection and (2) detect blinks in the video stream.
Based on this implementation, we will apply our method to detect blinking in sample webcam streams and video files.
Finally, I’ll close today’s blog post by discussing ways to improve the blink detector.
Understanding the eye aspect ratio (EAR)
For blink detection, we're only interested in two sets of facial structures: the eyes. Each eye is represented by six (x, y) coordinates, starting at the left corner of the eye (as if you were looking at the person) and working clockwise around the rest of the region:
Based on this figure, the key point to grasp is:
There is a relationship between the width and the height of these coordinates. Based on the work of Soukupová and Čech in their 2016 paper Real-Time Eye Blink Detection Using Facial Landmarks, we can derive an equation that reflects this relationship, called the eye aspect ratio (EAR):
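For reference, the EAR equation from the paper can be written as:

EAR = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}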
where p1, …, p6 are the 2D facial landmark locations shown above.
The numerator of this equation computes the distance between the vertical eye landmarks, while the denominator computes the distance between the horizontal eye landmarks. Since there is only one set of horizontal points but two sets of vertical points, the denominator is weighted accordingly.
Why is this equation so interesting?
Well, as we shall discover, the aspect ratio of the eye is roughly constant when the eye is open, but drops rapidly to zero when the eye blinks.
Using this simple equation, we can avoid heavier image processing techniques and rely only on the ratio of distances between the eye landmarks to determine whether a person is blinking.
To illustrate this point more clearly, consider this illustration from Soukupová and Čech:
In the upper left corner, we have a fully open eye; here the eye aspect ratio is large and relatively constant over time.
However, as soon as a person blinks (top right), the aspect ratio of the eyes drops sharply to near zero.
The figure below charts the eye aspect ratio of a video clip over time. As we can see, the aspect ratio of the eyes is constant, drops rapidly to near zero, and then increases again, indicating that a blink has occurred.
In the next section, we'll learn how to implement the eye aspect ratio for blink detection using facial landmarks, OpenCV, Python, and dlib.
Blink detection using facial landmarks and OpenCV
First, open a new file and name it detect_blinks.py. From there, insert the following code:
# import the necessary packages
from scipy.spatial import distance as dist
from imutils.video import FileVideoStream
from imutils.video import VideoStream
from imutils import face_utils
import numpy as np
import argparse
import imutils
import time
import dlib
import cv2
Import the necessary libraries.
If imutils is not installed on your system (or if you are using an older version), be sure to install/upgrade it using the following command:
pip install --upgrade imutils
If you don’t have dlib installed, please refer to my article:
Wanghao.blog.csdn.net/article/det…
Next, we’ll define our eye_aspect_ratio function:
def eye_aspect_ratio(eye):
    # compute the euclidean distances between the two sets of
    # vertical eye landmarks (x, y)-coordinates
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    # compute the euclidean distance between the horizontal
    # eye landmark (x, y)-coordinates
    C = dist.euclidean(eye[0], eye[3])
    # compute the eye aspect ratio
    ear = (A + B) / (2.0 * C)
    # return the eye aspect ratio
    return ear
This function takes a single required parameter, the (x, y) coordinates of the facial landmarks for a given eye.
It computes the distances between the two sets of vertical eye landmarks and the distance between the horizontal eye landmarks.
Finally, the numerator and denominator are combined to yield the final eye aspect ratio.
The eye aspect ratio is then returned to the calling function.
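As a quick sanity check, here is a minimal example of calling eye_aspect_ratio on made-up landmark coordinates (these points are invented for illustration and assume the imports and function defined above):

import numpy as np
# a hypothetical open eye: roughly 10 units wide and 8 units tall
open_eye = np.array([(0, 4), (3, 0), (7, 0), (10, 4), (7, 8), (3, 8)])
# a hypothetical nearly closed eye: same width, much smaller height
closed_eye = np.array([(0, 4), (3, 3), (7, 3), (10, 4), (7, 5), (3, 5)])
print(eye_aspect_ratio(open_eye))    # 0.8, eye clearly open
print(eye_aspect_ratio(closed_eye))  # 0.2, below the 0.3 threshold used later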
Let’s continue parsing our command-line arguments:
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--shape-predictor", required=True,
help="path to facial landmark predictor")
ap.add_argument("-v", "--video", type=str, default="",
help="path to input video file")
args = vars(ap.parse_args())
Our detect_blinks.py script requires one command-line argument and accepts a second, optional one:
- --shape-predictor: the path to dlib's pre-trained facial landmark detector. You can use the Download section at the bottom of this post to download the detector along with the source code and sample video for this tutorial.
- --video: an optional switch that controls the path to an input video file residing on disk. If you want to use a live video stream instead, simply omit this switch when executing the script.
We now need to set two important constants that you may need to tune for your own implementation, and initialize two other important variables, so be sure to read this explanation carefully:
# define two constants, one for the eye aspect ratio to indicate
# a blink and a second constant for the number of consecutive
# frames the eye must fall below the threshold
EYE_AR_THRESH = 0.3
EYE_AR_CONSEC_FRAMES = 3
# initialize the frame COUNTER and the TOTAL number of blinks
COUNTER = 0
TOTAL = 0
To determine whether blinking occurs in a video stream, we need to calculate the aspect ratio of the eyes.
If the aspect ratio drops below a certain threshold and then rises above it again, we register a blink. EYE_AR_THRESH is that threshold. We default it to 0.3 because it works best for my applications, but you may need to tune it for your own.
Then we have an important constant, EYE_AR_CONSEC_FRAMES. This value is set to 3 to indicate that three consecutive frames with an eye aspect ratio below EYE_AR_THRESH must occur in order to register a blink.
Again, depending on your pipeline's frame processing throughput, you may need to increase or decrease this number for your own implementation.
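As a rough guide (my own sketch, not part of the original tutorial), you can derive this constant from the frame rate your pipeline actually achieves and the shortest blink you want to catch:

# rough sketch: derive EYE_AR_CONSEC_FRAMES from your measured frame rate
# (both values below are assumptions; a blink typically lasts roughly 100-400 ms)
measured_fps = 30.0          # frames per second your pipeline actually achieves
min_blink_duration = 0.10    # seconds the eye stays closed during a short blink
EYE_AR_CONSEC_FRAMES = max(1, int(round(measured_fps * min_blink_duration)))
print(EYE_AR_CONSEC_FRAMES)  # -> 3 at 30 FPS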
The last two lines initialize the two counters. COUNTER is the number of consecutive frames with an eye aspect ratio below EYE_AR_THRESH, while TOTAL is the total number of blinks that have occurred while the script has been running.
Now that our imports, command-line arguments, and constants are handled, we can initialize dlib's face detector and facial landmark predictor:
Print ("[INFO] loading facial landmark Predictor... ) detector = dlib.get_frontal_face_detector() predictor = dlib.shape_predictor(args["shape_predictor"])Copy the code
Initialize the actual facial landmark predictor.
The facial landmarks produced by dlib follow an indexable list, as shown below:
Therefore, we can determine the start and end array slice indexes for extracting the (x, y) coordinates of the left and right eye:
# grab the indexes of the facial landmarks for the left and
# right eye, respectively
(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
Using these indexes, we will be able to extract the eye region effortlessly.
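If you are curious about the actual values, for the 68-point model these slices should correspond to indexes 42-47 (left eye) and 36-41 (right eye); a quick way to check them (assuming a recent imutils version) is:

from imutils import face_utils
print(face_utils.FACIAL_LANDMARKS_IDXS["left_eye"])   # (42, 48)
print(face_utils.FACIAL_LANDMARKS_IDXS["right_eye"])  # (36, 42)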
Next, we need to decide whether to use a file-based video stream or a live USB/webcam/Raspberry Pi camera video stream:
# start the video stream thread
print("[INFO] starting video stream thread...")
vs = FileVideoStream(args["video"]).start()
fileStream = True
# vs = VideoStream(src=0).start()
# vs = VideoStream(usePiCamera=True).start()
# fileStream = False
time.sleep(1.0)
# set up a video writer to save the annotated output frames
fps = 30
size = (450, 253)  # (width, height); must match the resized frames (value assumed here)
videoWriter = cv2.VideoWriter('3.mp4', -1, fps, size)
If you are using file video streaming, leave the code as is.
If you want to use a built-in webcam or USB camera, uncomment # vs = VideoStream(src=0).start().
For Raspberry Pi camera modules, uncomment # vs = VideoStream(usePiCamera=True).start().
Then define the frame rate, the output frame size, and a video writer object so the annotated frames can be saved.
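A note on the writer: the -1 codec flag opens a codec-picker dialog on some platforms and may simply fail on others. If it does not work for you, a minimal alternative (assuming the mp4v codec is available in your OpenCV build) is to specify the codec explicitly:

# explicit codec instead of the -1 codec-picker flag (mp4v availability is an assumption)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
videoWriter = cv2.VideoWriter("3.mp4", fourcc, fps, size)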
Finally, we reach the main loop of the script:
# loop over frames from the video stream
while True:
    # if this is a file video stream, check whether there are
    # any more frames left in the buffer to process
    if fileStream and not vs.more():
        break
    # grab the frame from the video stream, resize it, and
    # convert it to grayscale
    frame = vs.read()
    if frame is None:
        break
    frame = imutils.resize(frame, width=450)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detect faces in the grayscale frame
    rects = detector(gray, 0)
Loop over the frames in the video stream.
If we are accessing a video file stream and there are no more frames in the video, we break the loop.
Read the next frame from the video stream, then resize it and convert it to grayscale.
Then we detect faces in the grayscale frame using dlib's built-in face detector.
We now need to loop over each face detected in the frame and apply facial landmark detection to each one:
    # loop over the face detections
    for rect in rects:
        # determine the facial landmarks for the face region, then
        # convert the (x, y)-coordinates to a NumPy array
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        # extract the left and right eye coordinates, then use them
        # to compute the eye aspect ratio for both eyes
        leftEye = shape[lStart:lEnd]
        rightEye = shape[rStart:rEnd]
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)
        # average the eye aspect ratio together for both eyes
        ear = (leftEAR + rightEAR) / 2.0
Identify the facial landmarks for the face region and convert these (x, y) coordinates to a NumPy array.
Using the array slicing indexes obtained earlier in the script, we extract the (x, y) coordinates of the left and right eye respectively.
Then we compute the aspect ratio for each eye.
As suggested by Soukupová and Čech, we average the aspect ratios of both eyes together to obtain a better blink estimate (assuming, of course, that a person blinks with both eyes at the same time).
Our next block of code simply handles visualizing the eye landmarks on the eye regions themselves:
        # compute the convex hull for the left and right eye, then
        # visualize each of the eyes
        leftEyeHull = cv2.convexHull(leftEye)
        rightEyeHull = cv2.convexHull(rightEye)
        cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)
At this point, we have computed our (averaged) eye aspect ratio, but we have not yet determined whether a blink has occurred; that is handled in the next code block:
If EAR < EYE_AR_THRESH: COUNTER += 1 # Otherwise, the eye aspect ratio is not lower than blink # critical point else: # If the eyes are closed enough times # then increase the TOTAL number of flashes if COUNTER >= eye_ar_frames: TOTAL += 1 # Reset the eyeframe COUNTER COUNTER = 0Copy the code
Check whether the eye aspect ratio is below our blink threshold; if it is, increment the counter of consecutive frames indicating that a blink is taking place.
Otherwise, handle the case where the eye aspect ratio is no longer below the blink threshold.
In that case, check whether a sufficient number of consecutive frames had an eye aspect ratio below our predefined threshold.
If the check passes, we increment the TOTAL number of blinks.
Then we reset the consecutive frame COUNTER.
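To see the counting logic in isolation, here is a minimal standalone sketch run on a synthetic series of EAR values (the numbers are made up for illustration):

ears = [0.32, 0.31, 0.22, 0.20, 0.21, 0.33, 0.34, 0.25, 0.35]
EYE_AR_THRESH = 0.3
EYE_AR_CONSEC_FRAMES = 3
COUNTER = 0
TOTAL = 0
for ear in ears:
    if ear < EYE_AR_THRESH:
        COUNTER += 1
    else:
        if COUNTER >= EYE_AR_CONSEC_FRAMES:
            TOTAL += 1
        COUNTER = 0
print(TOTAL)  # -> 1: only the three-frame dip counts, the single-frame dip does not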
Our final code block deals with drawing the blink count and the current eye aspect ratio on the output frame, displaying the frame, and writing it to the output video:
        # draw the total number of blinks on the frame along with
        # the computed eye aspect ratio for the frame
        cv2.putText(frame, "Blinks: {}".format(TOTAL), (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        cv2.putText(frame, "EAR: {:.2f}".format(ear), (300, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    # show the frame and write it to the output video
    cv2.imshow("Frame", frame)
    videoWriter.write(frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# release the video writer and do a bit of cleanup
videoWriter.release()
cv2.destroyAllWindows()
vs.stop()
Blink detection results
To apply our blink detector to the sample video, simply execute the following command:
python detect_blinks.py --shape-predictor shape_predictor_68_face_landmarks.dat --video 11.mp4
Test results:
Test video link:
To run the detector on a live webcam stream instead of a video file, modify the video stream block as follows:

# vs = FileVideoStream(args["video"]).start()
# fileStream = True
vs = VideoStream(src=0).start()
# vs = VideoStream(usePiCamera=True).start()
fileStream = False

That is, comment out the FileVideoStream lines and uncomment the VideoStream line.
Execute command:
python detect_blinks.py --shape-predictor shape_predictor_68_face_landmarks.dat
Conclusion
In this blog post, I demonstrate how to build a blink detector using OpenCV, Python, and Dlib.
The first step in building a blink detector is to perform facial landmark detection to localize the eyes in a given frame of the video stream.
Once we have the facial landmarks for both eyes, we compute the eye aspect ratio for each eye, which gives us a single value relating the distances between the vertical eye landmarks to the distance between the horizontal eye landmarks.
Once we have the eye aspect ratio, we can determine whether a person is blinking: the aspect ratio remains roughly constant while the eye is open, drops rapidly toward zero during a blink, and then rises again as the eye opens.
To improve our blink detector, Soukupová and Čech propose constructing a 13-dimensional feature vector of eye aspect ratios (frame N, the six frames before it, and the six frames after it) and then feeding this feature vector into a linear SVM for classification.
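As a rough sketch of that idea (this is my own illustration, not code from the paper; the ear_history list, labels, and window helper are assumptions), the temporal feature vector and classifier could look like this:

import numpy as np
from sklearn.svm import SVC

def ear_window(ear_history, n, half=6):
    # 13 EAR values centered on frame n (assumes n-half and n+half are in range)
    return np.array(ear_history[n - half:n + half + 1])

# X: stacked 13-D windows, y: 1 if a blink peaks at frame n, 0 otherwise (your own labels)
# X = np.vstack([ear_window(ear_history, n) for n in candidate_frames])
# clf = SVC(kernel="linear").fit(X, y)
# is_blink = clf.predict(ear_window(ear_history, n).reshape(1, -1))[0]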
You also learned how to save videos from this blog post.
One application scenario for blink detection is drowsiness detection.
The complete code is as follows: download.csdn.net/download/hh…