AI face recognition is a technique for recognizing or verifying faces in digital images or video frames. Humans recognize faces quickly and effortlessly, but the task is difficult for a computer because of factors such as low resolution, occlusion, and lighting variation, all of which strongly affect how accurately a computer can recognize faces. First, let's understand the difference between face detection and face recognition.

**Face detection:** Face detection is generally understood as finding faces (their position and size) in an image, and possibly extracting them for use by a face recognition algorithm.

**Face recognition:** Face recognition algorithms find unique features that describe the face image, which has typically been extracted, cropped, resized, and converted to grayscale beforehand.

There are many algorithms for face detection and face recognition. Here, we will learn to use the Haar cascade algorithm for face detection.

Basic concepts of the Haar cascade algorithm

Haar cascade is a machine-learning approach in which a cascade function is trained on a large number of positive and negative images. Positive images contain human faces; negative images do not. In face detection, image features are the digital information extracted from pictures that can distinguish one image from another.

Each feature is applied to all the training images, with every image given equal weight at the start. For each feature, the algorithm finds the optimal threshold for classifying faces as positive or negative. Errors and misclassifications are inevitable, so we select the features with the lowest error rate, meaning the features that best separate face and non-face images.

A large number of features is computed by using all possible sizes and positions of each kernel, as sketched below.
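As a rough illustration (this is a toy sketch, not OpenCV's internal implementation), a two-rectangle Haar-like feature is just the difference between the pixel sums of two adjacent regions, and an integral image makes each sum a constant-time lookup:

```python
import numpy as np

def rect_sum(ii, x, y, w, h):
    # Sum of pixels in the rectangle (x, y, w, h), using an integral
    # image ii padded with an extra row/column of zeros.
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# Toy 24x24 grayscale window
img = np.random.randint(0, 256, (24, 24), dtype=np.uint8)
ii = np.zeros((25, 25), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

# Two-rectangle "edge" feature: top half minus bottom half
feature = rect_sum(ii, 0, 0, 24, 12) - rect_sum(ii, 0, 12, 24, 12)
print(feature)
```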

Haar cascade detection in OpenCV

OpenCV provides both a trainer and a detector. Working with a cascade image classifier has two main stages: training and detection.

OpenCV provides two applications to train a cascade classifier, opencv_haartraining and opencv_traincascade. The two applications store the classifier in different file formats.

For training, we need samples. There are two types of samples:

  • **Negative samples:** images that do not contain the target object.
  • **Positive samples:** images that contain the object to be detected.

A set of negative samples must be prepared manually, while the set of positive samples is created with the opencv_createsamples utility.

Negative samples

Negative samples can be taken from any images that do not contain the target object. The negative samples are listed in a text file: each line contains the image filename of one negative sample (relative to the directory of the description file). This file must be created manually, and the listed images may have different sizes.
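For example, such a description file (the name bg.txt and the paths below are just placeholders) might look like this:

```
img/neg_0001.jpg
img/neg_0002.jpg
img/neg_0003.jpg
```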

Positive samples

Positive samples are created by the opencv_createsamples utility. They can be created from a single image of the object or from a collection of previously marked-up images. Keep in mind that we still need a large dataset of positive samples before feeding it to this utility, because it only applies perspective transformations.
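As a sketch (the file names and sample counts below are placeholders), generating positive samples from a single object image and then training a cascade could look like this:

```
opencv_createsamples -img face.png -bg bg.txt -vec samples.vec -num 1000 -w 24 -h 24
opencv_traincascade -data cascade/ -vec samples.vec -bg bg.txt -numPos 900 -numNeg 500 -numStages 10 -w 24 -h 24
```

The resulting cascade/cascade.xml can then be loaded by the detector described below.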

Here we will discuss face detection. OpenCV already includes a variety of pre-trained classifiers for faces, eyes, smiles, and more. These XML files are stored in the opencv/data/haarcascades/ folder. Let's walk through the following steps:

  • Step 1

First, we need to load the necessary XML classifier, then load the input image (or video) in grayscale mode.

  • Step 2

After converting the image to grayscale, we can apply image processing such as resizing, cropping, blurring, and sharpening if necessary. The next step is image segmentation: multiple objects in a single image are separated so that the classifier can quickly detect objects and faces in the image.

  • Step 3

The Haar-like feature algorithm is used to find the position of faces in a frame or image. All faces share some common features; for example, the eye region is darker than its neighbors, and the nose region is brighter than the eye region.

  • Step 4

In this step, features are extracted from the image by means of edge, line, and center detection. The detector then provides the x, y, w, h coordinates that form a rectangular box in the image representing the position of the face.
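Putting the four steps together, a minimal detection script using one of OpenCV's pre-trained classifiers might look like this (the input and output file names are placeholders):

```python
import cv2

# Step 1: load the pre-trained Haar cascade and the input image
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
img = cv2.imread("input.jpg")

# Step 2: convert to grayscale before detection
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 3: scan the Haar-like feature cascade over the image at multiple scales
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Step 4: draw a rectangle at each (x, y, w, h) the detector returns
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("output.jpg", img)
```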

Use OpenCV for face recognition

Face recognition is a simple task for humans. Successful human face recognition relies on effectively recognizing inner features (eyes, nose, mouth) or outer features (head shape, face outline, hairline). The question here is: how does the human brain encode faces?

David Hubel and Torsten Wiesel showed that our brains have specialized nerve cells that respond to unique local features of a scene, such as lines, edges, angles, or movement. Our brains combine these different sources of information into useful patterns; we do not perceive vision as scattered pieces. In simple terms, "automatic face recognition is about extracting those meaningful features from an image, putting them into a useful representation, and performing some kind of classification on them".

The basic idea of early face recognition was based on the geometric features of a face, which is the most intuitive approach to the problem. The first automatic face recognition systems used marker points describing the position of the eyes, ears, and nose. A feature vector was then built from the distances between these anchor points.

Recognition is then performed by computing the Euclidean distance between the feature vectors of the probe image and a reference image. This approach is inherently robust against changes in illumination, but it has a considerable drawback: accurately registering the marker points is very difficult.

Face recognition systems can basically operate in two modes:

  • Facial image authentication or verification –

It compares the input face image with the face image of the user requesting authentication. This is a 1:1 comparison.

  • Identification or facial recognition –

It compares the input face image against every face image in the dataset to find the user that matches. This is a 1:N comparison. Both modes are sketched below.
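The difference between the two modes can be sketched in a few lines of Python (the function names, templates, and threshold below are hypothetical illustrations, not a specific library's API):

```python
import numpy as np

def distance(a, b):
    # Euclidean distance between two feature vectors (e.g., histograms)
    return np.linalg.norm(np.asarray(a, float) - np.asarray(b, float))

def verify(probe, template, threshold=50.0):
    # 1:1 - is the probe close enough to the claimed user's template?
    return distance(probe, template) < threshold

def identify(probe, gallery):
    # 1:N - gallery maps user IDs to templates; return the closest ID
    return min(gallery, key=lambda uid: distance(probe, gallery[uid]))
```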

There are many types of face recognition algorithms, for example:

  • Eigenfaces
  • Local Binary Patterns Histograms (LBPH)
  • Fisherfaces
  • Scale Invariant Feature Transform (SIFT)
  • Speeded-Up Robust Features (SURF)

Each algorithm follows a different method to extract image information and match it with the input image. Here we will discuss the local binary pattern histogram (LBPH) algorithm, which is one of the oldest and most popular algorithms.

LBPH introduction

The Local Binary Patterns Histogram algorithm is a simple method that labels the pixels of an image by thresholding the neighborhood of each pixel. In other words, LBPH summarizes the local structure of an image by comparing each pixel with its neighbors and encoding the result as a binary number. The LBP operator was first described in 1994, and it has since proven to be a powerful texture-classification feature.

This algorithm focuses on extracting local features from images. The basic idea is not to treat the whole image as a high-dimensional vector, but to focus only on the local characteristics of the object.

Each pixel in turn is taken as a center, and its neighbors are thresholded against it: a neighbor is represented by 1 if its intensity is greater than or equal to that of the center pixel, and by 0 otherwise.

Let’s look at the steps of the algorithm:

**1. Parameters:** LBPH accepts four parameters (they map directly onto OpenCV's recognizer, as shown after this list):

  • Radius: the radius around the center pixel, used to build the circular local binary pattern. It is usually set to 1.
  • Neighbors: the number of sample points used to build the circular binary pattern. It is usually set to 8.
  • Grid X: the number of horizontal cells. The more cells, the finer the grid and the higher the dimension of the resulting feature vector.
  • Grid Y: the number of vertical cells. The more cells, the finer the grid and the higher the dimension of the resulting feature vector.
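In OpenCV (the cv2.face module ships with the opencv-contrib-python package), these four parameters correspond directly to the recognizer's constructor arguments; the values below are the library defaults:

```python
import cv2

recognizer = cv2.face.LBPHFaceRecognizer_create(
    radius=1,     # radius of the circular LBP neighborhood
    neighbors=8,  # number of sample points on the circle
    grid_x=8,     # horizontal cells for the histogram grid
    grid_y=8)     # vertical cells for the histogram grid
```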

**2. Training the algorithm:** The first step is to train the algorithm. This requires a dataset containing facial images of the people we want to identify, where each image carries a unique ID (such as a number identifying the person). All images of the same person must use the same ID. The algorithm then uses this information to recognize an input image and give you the output. Next, let's look at the LBPH computation itself.
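A minimal training sketch, continuing with the recognizer created above and assuming faces is a list of equally sized grayscale face crops (NumPy arrays) and labels holds one integer ID per image:

```python
import numpy as np

# faces: list of grayscale face images; labels: one integer ID per image
recognizer.train(faces, np.array(labels))
recognizer.save("lbph_model.yml")  # persist the trained histograms
```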

**3. Applying the LBP operation:** In this step, the LBP computation is used to create an intermediate image that describes the original image in a new way by highlighting its facial characteristics. The radius and neighbors parameters drive a sliding-window pass over the image.

For a more concrete understanding, let’s break it down into small steps:

  • Assume that the input face image is grayscale.
  • We can get a portion of this image as a 3×3 pixel window.
  • We can use a 3×3 matrix that contains the intensity of each pixel (0-255).
  • Then, we need to take the center value of the matrix as the threshold.
  • This value will be used to define new values from the eight neighbors.
  • For each neighbor of the center value (threshold), we set a new binary value. Value 1 is set to a value equal to or higher than the threshold, and 0 is set to a value lower than the threshold.
  • The matrix now contains only binary values (ignoring the center value). We concatenate these values line by line into a new binary number, such as 10001101. (There are other ways to concatenate the values, for example clockwise; the result is equivalent as long as the same order is used consistently.)
  • We convert this binary number to a decimal value and set it as the center value of the matrix, which is a pixel in the original image.
  • After completing the LBP procedure for every pixel, we have a new image that better represents the characteristics of the original image; see the sketch below.
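A toy NumPy version of this neighborhood operation (the window values are made up for illustration):

```python
import numpy as np

window = np.array([[12, 15, 18],
                   [ 5,  8,  3],
                   [ 8,  1,  2]])

center = window[1, 1]                    # threshold = 8
binary = (window >= center).astype(int)  # 1 where value >= center

# Read the 8 neighbors line by line, skipping the center
order = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]
bits = "".join(str(binary[r, c]) for r, c in order)
print(bits, int(bits, 2))  # "11100100" -> 228 replaces the center pixel
```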

**4. Extracting histograms:** Using the image generated in the last step, we can divide it into multiple grids with the Grid X and Grid Y parameters:

  • Since the image is grayscale, each histogram (one per grid cell) contains only 256 positions (0-255), representing the occurrence of each pixel intensity.
  • Each cell's histogram is then concatenated into a new, larger histogram that represents the whole image; see the sketch below.
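A sketch of this grid-and-concatenate step (lbph_descriptor is a hypothetical helper written for illustration, not an OpenCV function):

```python
import numpy as np

def lbph_descriptor(lbp_image, grid_x=8, grid_y=8):
    # Split the LBP image into grid_x * grid_y cells and concatenate
    # each cell's 256-bin histogram into one feature vector.
    h, w = lbp_image.shape
    hists = []
    for gy in range(grid_y):
        for gx in range(grid_x):
            cell = lbp_image[gy * h // grid_y:(gy + 1) * h // grid_y,
                             gx * w // grid_x:(gx + 1) * w // grid_x]
            hist, _ = np.histogram(cell, bins=256, range=(0, 256))
            hists.append(hist)
    return np.concatenate(hists)  # length = grid_x * grid_y * 256
```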

**5. Performing face recognition:** Now the algorithm is trained, and each image in the training dataset is represented by its extracted histogram. For a new image, we repeat the steps and create a new histogram. To find the match for a given image, we just compare histograms and return the image with the closest one.

  • There are many ways to compare histograms (that is, to calculate the distance between two histograms), such as Euclidean distance, chi-square, absolute value, etc. We can use the Euclidean distance: D = sqrt( sum_i (hist1_i - hist2_i)^2 ).

  • The algorithm returns as output the ID of the image with the closest histogram, along with the calculated distance, which can be used as a "confidence" measure. Note that lower is better here: if the confidence is lower than the chosen threshold, the algorithm has successfully recognized the face, as in the sketch below.
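Continuing the OpenCV sketch from the training step (gray_face is assumed to be a grayscale face crop of the same size as the training images), prediction returns exactly these two pieces of information:

```python
# label: ID of the closest training histogram
# confidence: distance to it (lower means a better match)
label, confidence = recognizer.predict(gray_face)

if confidence < 80:  # the threshold value is an application choice
    print(f"Recognized person {label} (distance {confidence:.1f})")
else:
    print("Unknown face")
```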

In this article, we have discussed the working principle of face detection and face recognition. Here we summarize:

  • The Haar cascade algorithm is used for face detection.
  • There are many kinds of face recognition algorithms, among which LBPH is a simple and popular one.
  • LBPH focuses on the local features of an image.

Further reading

Our R&D staff at TSINGSEE Qingxi Video are also actively developing AI technologies such as face detection, face recognition, traffic statistics, and helmet detection, and actively integrating them into existing video platforms. A typical example is the EasyCVR video fusion cloud service, which supports AI face recognition, license plate recognition, voice intercom, PTZ control, sound and light alarms, surveillance video analysis, and data summarization. It is widely used in scenarios such as intelligent access control for residential areas and buildings, detection of suspicious loitering along perimeters, and visitor-traffic statistics in scenic spots.