Can you tell whether this photo was taken of a real person or of a phone screen?
Would you be surprised to learn that it was in fact taken of a photo displayed on a phone screen?
Yet even with a photo like this, liveness detection can easily tell whether the subject is a real person or a re-shoot. In finance, credit checking, security, and the other scenarios where face recognition is most widely used, face liveness detection is arguably the most critical link in large-scale deployment.
1. What is binocular liveness detection
Binocular means two eyes. One eye is a visible-light camera, which captures color images; the other is a near-infrared camera, which captures black-and-white images. Binocular liveness correspondingly combines two algorithms: visible-light (monocular) face liveness recognition and near-infrared liveness recognition.
- The principle of visible-light face liveness recognition is to judge whether the target is live by exploiting flaws in the captured portrait (moiré patterns, imaging distortion, etc.), which effectively defends against cheating attacks such as screen re-shoots.
- The principle of near-infrared liveness recognition is that an infrared camera emits infrared light onto the surface of the object, and an imaging sensor (CCD or CMOS) picks up the infrared light reflected back from the scene. Because different materials reflect infrared light differently, the algorithm can analyze the image to determine whether the current user is a real person.
Since it carries its own infrared light source, it is barely affected by ambient light and can capture images even in complete darkness. Against phone-screen attacks its defense rate is close to 100%, as shown below.
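As a rough illustration of how the two channels could work together, here is a minimal sketch of combining a visible-light score and a near-infrared score. The function name, thresholds, and fusion rule are assumptions for illustration, not Yunzhike's actual implementation.

```python
# Illustrative sketch only: one simple way a binocular pipeline might combine
# the two channels' liveness scores. Names and thresholds are assumptions.

def is_live(rgb_score: float, nir_score: float,
            rgb_threshold: float = 0.5, nir_threshold: float = 0.5) -> bool:
    """Return True only if both the visible-light and the near-infrared
    liveness models score the subject above their thresholds."""
    # The visible-light model looks for re-shoot artifacts (moire, distortion);
    # the NIR model exploits material reflectance differences. Requiring both
    # to pass is the simplest fusion rule.
    return rgb_score >= rgb_threshold and nir_score >= nir_threshold


# Example: a phone-screen replay typically fails the NIR check outright.
print(is_live(rgb_score=0.91, nir_score=0.03))  # False -> rejected as an attack
```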
Key indicators of facial liveness recognition include:
- Recognition speed: the time from when the algorithm receives the image to when it outputs a result.
- Pass rate: given a fixed threshold and N real-person samples, if the algorithm scores M of them above the threshold (i.e., they are correctly judged as real people), then the pass rate = M/N.
- Rejection rate: given the same threshold and N attack samples, if the algorithm scores M of them below the threshold (i.e., they are correctly judged as fakes), then the rejection rate = M/N.
The pass rate and rejection rate are measured under the same threshold. The higher the threshold, the lower the pass rate and the higher the rejection rate; the lower the threshold, the higher the pass rate but the lower the rejection rate, as the sketch below illustrates.
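The following minimal sketch computes both metrics for a given threshold, using made-up scores, just to make the definitions and the trade-off concrete.

```python
# Minimal sketch of the pass rate and rejection rate defined above.
# The scores and thresholds below are made-up numbers for illustration.

def pass_rate(real_scores, threshold):
    """Fraction of genuine (real-person) samples scored above the threshold."""
    return sum(s >= threshold for s in real_scores) / len(real_scores)

def rejection_rate(attack_scores, threshold):
    """Fraction of attack samples scored below the threshold."""
    return sum(s < threshold for s in attack_scores) / len(attack_scores)

real_scores   = [0.92, 0.88, 0.97, 0.61, 0.83]   # N genuine samples
attack_scores = [0.05, 0.12, 0.44, 0.09, 0.31]   # N attack samples

for t in (0.3, 0.5, 0.7):
    print(f"threshold={t}: pass={pass_rate(real_scores, t):.2f}, "
          f"reject={rejection_rate(attack_scores, t):.2f}")
# Raising the threshold lowers the pass rate and raises the rejection rate.
```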
2. Why is binocular liveness detection so important
- A natural hardware fit for offline scenarios
Yunzhike divides the main application scenarios of face recognition into two categories: online remote identity verification (financial account opening, face-based registration, face login, etc.) and offline unattended scenarios (face-based access control, face-based cash withdrawal, face payment, etc.).
Face liveness recognition technology falls into two main categories: approaches with low hardware dependence, such as action (interactive) liveness and silent liveness, and approaches with certain hardware requirements, such as binocular liveness and 3D structured light liveness. Although the latter costs more than the former, its anti-attack performance is better. Offline scenarios naturally involve dedicated hardware anyway, so the latter becomes the best choice for them.
- Mature technology, widely deployed
In terms of cost, a binocular camera is technically simpler than 3D structured light and therefore cheaper: on the market, a binocular camera module costs roughly ¥300, while a 3D structured light module costs ¥500-800.
In terms of the industry chain, only two companies in China can mass-produce 3D structured light hardware, whereas countless vendors can build binocular liveness hardware. Binocular liveness therefore remains the mainstream for offline scenarios, with wider adoption and a more mature ecosystem.
- Still reliable in the arms race with the fraud industry
Where there is light, there is darkness. Today's fraud industry not only runs telecom scams; it also knows how to use AI, and it has industrialized, supplying tools to players up and down the chain.
For example, many internet-finance apps currently use action-based face liveness recognition (the system prompts random actions, and only a user who performs the specified actions in real time is judged to be a real person). Against this approach, fraud rings use 3D modeling to generate any specified action from a single photo. Binocular liveness, by contrast, remains one of today's most reliable defenses.
3. A closer look at Yunzhike's binocular liveness technology
3.1 Core Ideas
Although a variety of binocular face liveness algorithms exist in the industry, most of them rely on large neural networks to reach high accuracy. The accuracy improves, but the model becomes too large and too slow. Balancing recognition accuracy against recognition speed has always been a major challenge in the industry.
By cropping multiple face regions, the differences between live and non-live data can be captured more effectively, and more discriminative information can be fed into the convolutional neural network, improving recognition accuracy. At the same time, the multiple regions share a single convolutional neural network, which greatly reduces computational cost compared with using a separate network for each region.
3.2 Implementation Roadmap
Let’s take a look at the overall implementation logic:
The flow chart makes the two key steps clear: adaptive face cropping and the convolutional neural network.
- Adaptive face cropping
Cropping multiple regions captures the differences between live and non-live data more effectively and gathers more discriminative information, which helps improve recognition accuracy.
As shown above, crops of the different face regions are fed in turn into the same network (CNN), effectively adding more liveness cues: the blue box mainly captures moiré patterns, reflections, facial distortion, and similar information; the red region captures salient attack evidence such as phone or paper borders; and the green region is a transition region that captures the relevant background information.
The overall calculation formula is as follows:
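The formula itself appears as a figure in the original and is not reproduced here. Purely as a rough illustration of the three-region cropping step described above, here is a hedged sketch; the region scales, names, and box convention are assumptions, not the actual adaptive cropping rule.

```python
# Illustrative sketch of cropping three nested regions around a detected face
# box. The scale factors and names are assumptions for illustration only.
import numpy as np

def crop_regions(frame: np.ndarray, face_box):
    """Given a detected face box (x, y, w, h), return three crops:
    the face itself (blue box), a mid-size transition box (green region),
    and a wider context box (red region)."""
    x, y, w, h = face_box
    H, W = frame.shape[:2]

    def expand(scale):
        # Grow the box around its center and clamp it to the image bounds.
        cx, cy = x + w / 2, y + h / 2
        nw, nh = w * scale, h * scale
        x0, y0 = max(int(cx - nw / 2), 0), max(int(cy - nh / 2), 0)
        x1, y1 = min(int(cx + nw / 2), W), min(int(cy + nh / 2), H)
        return frame[y0:y1, x0:x1]

    face_crop       = expand(1.0)   # moire, reflections, facial distortion
    transition_crop = expand(1.8)   # transition between face and background
    context_crop    = expand(2.7)   # phone / paper borders, background cues
    return face_crop, transition_crop, context_crop
```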
- Convolutional neural network
During adaptive face cropping we extract image data from three different regions and pass them through a shared CNN, which also keeps model complexity down; the features produced for the three crops are then fused to jointly determine the final classification result (see the sketch below).
Meanwhile, a novel loss function is designed for training the liveness network to improve its generalization.
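To make the shared-backbone-plus-fusion idea concrete, here is a minimal sketch in PyTorch. The layer sizes, input resolution, and two-class head are arbitrary assumptions; this is not Yunzhike's actual network or its custom loss.

```python
# Hedged sketch of the shared-backbone idea: the three crops pass through the
# same small CNN and their features are fused for a binary live/attack decision.
import torch
import torch.nn as nn

class SharedBackboneLiveness(nn.Module):
    def __init__(self):
        super().__init__()
        # One lightweight backbone shared by all three crops.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse the three feature vectors and classify live vs. attack.
        self.head = nn.Linear(32 * 3, 2)

    def forward(self, face, transition, context):
        feats = [self.backbone(x) for x in (face, transition, context)]
        return self.head(torch.cat(feats, dim=1))

model = SharedBackboneLiveness()
crops = [torch.randn(1, 3, 112, 112) for _ in range(3)]  # resized region crops
logits = model(*crops)   # shape (1, 2): scores for [attack, live]
```

Sharing one backbone across the crops is what keeps the parameter count and latency close to a single-crop model while still exploiting three views of the input.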
- Attack sample collection
Finally, beyond the algorithm itself, Yunzhike has also collected a massive volume of attack samples covering phones, screens, paper, masks, and other attack scenarios, to further strengthen the liveness model's resistance to attacks.
- Optical-flow-assisted monocular liveness judgment
Finally, for the monocular (visible-light) channel, optical flow is used to assist the liveness judgment. The so-called optical flow field is the projection of the motion of objects in the three-dimensional real world onto the two-dimensional image plane.
As shown in the figure, the optical flow field computed from inter-frame information differs markedly between a live face and paper or phone attacks. In the live face region, flow directions are inconsistent and the face separates from the background; under paper and phone attacks, flow directions are consistent and the face region does not separate from the background. These patterns filter out most non-live attack data captured while in motion.
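As an illustrative sketch only (not the production algorithm), dense optical flow can be computed with OpenCV's Farneback method and the direction consistency inside the face box measured with a simple circular statistic; the face box coordinates and the interpretation thresholds are assumptions.

```python
# Hedged sketch: compute dense optical flow between two consecutive grayscale
# frames and measure how consistent the flow directions are inside the face box.
# A re-shot phone/paper attack tends to move as one rigid plane (very consistent
# directions); a live face's motion is less consistent and separates from the
# background.
import cv2
import numpy as np

def flow_direction_consistency(prev_gray, cur_gray, face_box):
    x, y, w, h = face_box
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    face_flow = flow[y:y + h, x:x + w].reshape(-1, 2)
    # Mean resultant length of the flow angles: 1.0 = perfectly consistent.
    angles = np.arctan2(face_flow[:, 1], face_flow[:, 0])
    return float(np.hypot(np.cos(angles).mean(), np.sin(angles).mean()))

# Usage sketch: values close to 1.0 suggest planar (attack-like) motion,
# lower values suggest the non-rigid motion of a live face.
# consistency = flow_direction_consistency(frame0_gray, frame1_gray, (80, 60, 120, 140))
```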
3.3 Key test indicators
- Recognition speed
The overall model is only 5 MB, so it runs smoothly even on low-performance devices. On a 3288 CPU it takes only 120 ms, about 200 ms faster than competing products.
- Pass rate and rejection rate
The pass rate is 99.5% and the rejection rate is 99.9%.
4. Closing thoughts
The key to an algorithm lies in how it is applied. Algorithms are not castles in the air; they must be deeply combined with real application scenarios and requirements to deliver their greatest value.