0. Article Overview
Face recognition has become ubiquitous in recent years: face-based attendance, social apps, and face payment all rely on it, and it can feel like "black technology". The recent boom in machine learning in particular has pushed both the range of applications and the accuracy of face recognition to a new level.
Here is how this seemingly magical technology works.
1. The face recognition process
Face recognition breaks down into a series of related problems:
- Start by finding all the faces in an image.
- For each face, recognize that it is the same face even under different lighting or when it is turned in a different direction.
- Find unique features of each face that distinguish it from everyone else, such as eye size or face length.
- Finally, compare those features against all known faces to determine who the person is.
Step 1: Identify all the faces
Obviously, the first thing we have to do in the face recognition process is locate the faces in the image. When we take pictures with a phone or camera, portrait mode can easily detect the positions of faces and help the camera focus quickly.
Thanks to Paul Viola and Michael Jones, face detection became mainstream on cameras in the early 2000s when they developed a method fast enough to run on cheap cameras. However, we now have a more reliable solution: HOG (Histogram of Oriented Gradients), an algorithm that detects object contours.
First we grayscale the image, because color information is useless for face detection.
For each pixel, we look at the pixels directly surrounding it and draw an arrow pointing in the direction in which the image gets darker. If we repeat this for every pixel, the arrows eventually replace the pixels. These arrows are called gradients, and they show the flow of the image from light to dark.
Analyzing every single pixel is a bit wasteful: the result is so detailed that we risk getting lost in a sea of pixels, when what we really want is the flow of light and dark at a higher level.
To do this, we divide the image into 16×16-pixel squares. In each square we count how many gradients point in each major direction (how many point up, up-right, right, and so on), then replace the square with a single arrow pointing in the dominant direction.
As a result, we convert the original image into a very simple HOG representation that captures the basic structure of the face.
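As a minimal sketch of what this looks like in code, the snippet below uses scikit-image's hog function to compute a HOG representation of an image; the file name face.jpg and the cell/block sizes are illustrative assumptions, not the exact parameters a production face detector would use:

```python
# A minimal sketch: compute a HOG representation of an image with scikit-image.
# "face.jpg" and the cell/block sizes below are illustrative assumptions.
from skimage import io, color, feature

image = color.rgb2gray(io.imread("face.jpg"))   # step 1: convert to grayscale

# Build gradient-orientation histograms over small cells and also return
# an image that visualizes the dominant gradient direction per cell.
hog_features, hog_image = feature.hog(
    image,
    orientations=8,            # number of gradient directions per histogram
    pixels_per_cell=(16, 16),  # the 16x16 squares described above
    cells_per_block=(1, 1),
    visualize=True,
)

print(hog_features.shape)      # flattened HOG descriptor of the whole image
```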
To find faces in the HOG image, all we have to do is find the region that looks most similar to a known HOG face pattern. These HOG patterns are extracted from a large set of training faces.
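In practice we rarely implement this matching by hand; dlib ships a pre-trained HOG-based face detector. A minimal sketch, with photo.jpg as an assumed file name:

```python
# A minimal sketch: detect faces with dlib's pre-trained HOG-based detector.
# "photo.jpg" is an illustrative file name.
import dlib

detector = dlib.get_frontal_face_detector()   # HOG + linear classifier detector
image = dlib.load_rgb_image("photo.jpg")

# The second argument upsamples the image once, which helps find smaller faces.
faces = detector(image, 1)
for rect in faces:
    print("Face found at:", rect.left(), rect.top(), rect.right(), rect.bottom())
```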
Step 2: Different facial poses
We have identified the faces in the picture, but how do we handle faces turned in different directions?
To a computer, the same face turned in a different direction looks like something entirely different, so we have to warp each image so that the eyes and mouth always end up in roughly the same place.
To achieve this, we use a face landmark estimation algorithm. Many algorithms can do this; here we use the method invented by Vahid Kazemi and Josephine Sullivan in 2014.
The basic idea is to locate 68 specific points (called landmarks) that exist on every face:
- 17 points along the chin contour [0-16]
- 5 points on the left eyebrow [17-21]
- 5 points on the right eyebrow [22-26]
- 4 points on the bridge of the nose [27-30]
- 5 points on the tip of the nose [31-35]
- 6 points around the left eye [36-41]
- 6 points around the right eye [42-47]
- 12 points on the outer lip [48-59]
- 8 points on the inner lip [60-67]
With these 68 points, it is easy to know where the eyes and mouth are. We then rotate, scale, and shear the image so that the eyes and mouth are as close to the center as possible.
Now the faces are basically aligned, which makes the next step more accurate.
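A minimal sketch of this step with dlib, assuming the pre-trained 68-point model file shape_predictor_68_face_landmarks.dat has been downloaded separately, and using dlib's get_face_chip helper for the rotate/scale/align step:

```python
# A minimal sketch: locate the 68 landmarks with dlib and align the face.
# Assumes shape_predictor_68_face_landmarks.dat has been downloaded separately;
# "photo.jpg" is an illustrative file name.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = dlib.load_rgb_image("photo.jpg")
for rect in detector(image, 1):
    landmarks = predictor(image, rect)             # 68 (x, y) points
    print("Left eye corner:", landmarks.part(36))  # point 36 per the list above

    # Rotate and scale the face so the eyes and mouth are roughly centered.
    aligned = dlib.get_face_chip(image, landmarks, size=150)
```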
Step 3: Encode the face
We still have a core problem, which is how to distinguish between different faces.
The easiest way to do this is to compare the unknown faces we found in step 2 with the faces we already know. When an unknown face looks similar to a previously tagged face, we can identify them as the same person.
We humans can easily tell two faces apart by eye size, hair color, and so on, but how can a computer do that? We have to quantify the face, measuring its differences numerically. But which measurements should we take?
The measurements that seem obvious to us are not actually very meaningful to a computer looking at individual pixels. The most accurate approach turns out to be letting the computer figure out for itself which measurements to collect: deep learning is better than humans at deciding which parts of a face are important to measure.
So the solution is to train a deep convolutional neural network to generate 128 measurements for each face.
Each training step looks at three different face images:
- Load a training image of a known person’s face
- Load another photo of the same person
- Load a photo of a different person
The algorithm then looks at the measurements it generates for the three images and tweaks the neural network slightly so that the measurements for the first and second images (the same person) move a little closer together, while the measurements for the second and third images (different people) move a little further apart.
This process has to be repeated millions of times with adjusted samples, which is a huge undertaking, but once training is complete the network can produce reliable measurements even for faces it has never seen.
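The "tweak the network slightly" step is usually formalized as a triplet loss. Below is a minimal sketch of that idea in plain NumPy; encode() is a hypothetical stand-in for the network that turns a face image into 128 numbers, and the real training loop, optimizer, and margin value are omitted:

```python
# A minimal sketch of the triplet-loss idea in NumPy. encode() is a hypothetical
# stand-in for the neural network that maps a face image to 128 numbers.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize the network unless the same-person pair is closer together
    than the different-person pair by at least `margin`."""
    pos_dist = np.sum((anchor - positive) ** 2)   # distance between the two same-person encodings
    neg_dist = np.sum((anchor - negative) ** 2)   # distance to the different person's encoding
    return max(0.0, pos_dist - neg_dist + margin)

# anchor   = encode(photo_1_of_person_A)
# positive = encode(photo_2_of_person_A)
# negative = encode(photo_of_person_B)
# Training nudges the network's weights in whatever direction lowers this loss.
```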
Thankfully, the folks behind OpenFace have already done this work and released several pre-trained networks that can be used directly, so we get complex machine learning out of the box thanks to the open-source spirit.
So what exactly are these 128 measurements?
We don't know, and it doesn't really matter to us. What we care about is that when the network sees two different photos of the same person, it produces almost the same 128 values.
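For example, the open-source face_recognition library (which wraps a pre-trained dlib model) exposes this 128-measurement encoding in a single call. A minimal sketch, with person.jpg as an assumed file name:

```python
# A minimal sketch: get the 128 measurements for a face using the open-source
# face_recognition library. "person.jpg" is an illustrative file name.
import face_recognition

image = face_recognition.load_image_file("person.jpg")
encodings = face_recognition.face_encodings(image)   # one 128-number vector per face found

if encodings:
    print(len(encodings[0]))   # -> 128
```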
Step 4: Find the name from the encoding
The last step is actually the easiest: we need to find the person in the database whose measurements are closest to those of our test image.
How do we do that? We use an off-the-shelf mathematical formula: the Euclidean distance between the two 128-dimensional vectors.
This gives us a distance value, and the system sets a threshold for what counts as the same person: if the distance is below the threshold, we consider the two faces to belong to the same person.
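A minimal sketch of this comparison with NumPy; the 0.6 threshold is a commonly used default (for example in the face_recognition library), but the right value depends on the model and the application:

```python
# A minimal sketch: compare two 128-D encodings by Euclidean distance.
# The 0.6 threshold is a commonly used default, not a universal constant.
import numpy as np

def is_same_person(encoding_a, encoding_b, threshold=0.6):
    distance = np.linalg.norm(encoding_a - encoding_b)   # Euclidean distance in 128-D
    return distance <= threshold                         # smaller distance = same person

# known   = np.array(...)  # 128 measurements saved for a known person
# unknown = np.array(...)  # 128 measurements of the face to test
# print(is_same_person(known, unknown))
```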
Face recognition is now complete. Let's review the whole process (an end-to-end sketch follows the list):
- Use HOG to locate all the faces in the image.
- Compute the 68 facial landmarks and warp the image so the face is properly aligned.
- Feed the aligned face into the neural network to obtain 128 feature measurements, and save them.
- Compute the Euclidean distance between these measurements and the ones we saved earlier, compare it against the threshold, and decide whether the faces belong to the same person.
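As a closing illustration, here is a minimal end-to-end sketch of the whole pipeline using the face_recognition library; the file names are assumptions, and the library handles detection, alignment, and encoding internally:

```python
# A minimal end-to-end sketch of the pipeline with the face_recognition library.
# "known.jpg" and "unknown.jpg" are illustrative file names, each containing one face.
import face_recognition

# Steps 1-3: detect, align, and encode the faces in each image.
known_image = face_recognition.load_image_file("known.jpg")
unknown_image = face_recognition.load_image_file("unknown.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]
unknown_encoding = face_recognition.face_encodings(unknown_image)[0]

# Step 4: compare the encodings by Euclidean distance against a threshold.
match = face_recognition.compare_faces([known_encoding], unknown_encoding)[0]
distance = face_recognition.face_distance([known_encoding], unknown_encoding)[0]
print("Same person?", match, "distance:", distance)
```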
2. Face recognition application scenarios
Face recognition consists of two stages, face detection and face recognition, and their application scenarios differ.
The purpose of face detection is to find faces and obtain their positions. On its own it can be used in scenarios such as beauty filters, skin retouching, image matting, and face swapping.