Implementing face reconstruction and its related applications requires a solid understanding of optimization algorithms, and the engineering workload is not small, which makes it a good entry point for learning computer vision.
Face reconstruction here means reconstructing a three-dimensional model of the face from two-dimensional face images. There are roughly the following approaches: multi-view geometry, which requires capturing the face from different angles and places relatively high demands on equipment; fitting a 3D Morphable Model (3DMM) from an RGBD or RGB camera, which is limited by the nature of the model and cannot recover fine details such as wrinkles; and, in recent years, deep-learning methods (some combined with the traditional 3DMM approach, training networks to regress the face model parameters).
This article mainly covers the 3DMM method with a monocular RGB camera. It has low hardware requirements, the algorithm is simple, and real-time reconstruction on mobile devices is easy to achieve. The premise is that face landmark detection is already available: the input is the image together with the detected face landmarks, and the output is a 3D face mesh.
Overview
The 3DMM method was first proposed by Blanz and Vetter [2] in 1999, and subsequent improvements usually build on their work. The approach carries prior knowledge of the face model and can deform it, so a fairly complete face can be obtained no matter what angle the face is viewed from.
At present, the more common face models include the Basel Face Model (BFM), the Surrey Face Model (SFM), FaceWarehouse, and the Large Scale Facial Model (LSFM). Among them, LSFM is probably the most accurate model today; BFM is relatively easy to obtain, and many people experiment with it; SFM's open-source framework is also a great contribution to the community. FaceWarehouse from the Zhejiang University team, along with its series of related papers, is well worth reading.
First, we need to understand two concepts: the parameterized face model and the BlendShape model.
Parameterized face model
In Blanz and Vetter's method, 200 adult head scans were collected, each containing about 70,000 vertices. Applying PCA produces the parameterized face model: every face model shares the same topology, differing only in vertex positions and colors. Each eigenvector can be thought of as a facial attribute, such as the length or fullness of the face.
Here a face model is described by two vectors, one for shape and one for texture:

$$S = (X_1, Y_1, Z_1, \ldots, X_n, Y_n, Z_n)^T, \qquad T = (R_1, G_1, B_1, \ldots, R_n, G_n, B_n)^T$$
Therefore, any new face can be generated as a linear combination of the feature vectors:

$$S_{\text{new}} = \bar{S} + \sum_{i=1}^{m} \alpha_i s_i, \qquad T_{\text{new}} = \bar{T} + \sum_{i=1}^{m} \beta_i t_i$$

where $\bar{S}$ and $\bar{T}$ are the mean shape and texture, $s_i$ and $t_i$ are the eigenvectors, and $\alpha_i$ and $\beta_i$ are the coefficients.
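As a concrete illustration, here is a minimal numpy sketch of this linear combination. The array names and dimensions are hypothetical placeholders; in practice the mean shape and the basis would be loaded from a model file such as BFM.

```python
import numpy as np

# Hypothetical sizes: real models like BFM have ~50k-70k vertices and
# ~200 principal components; small numbers keep the sketch light.
n_vertices, n_components = 1000, 199

# Stand-ins for data that would be loaded from a model file:
mean_shape = np.zeros(3 * n_vertices)                        # mean shape
shape_basis = np.random.randn(3 * n_vertices, n_components)  # columns s_i

def synthesize_shape(alpha):
    """S_new = mean shape + linear combination of the eigenvectors."""
    return mean_shape + shape_basis @ alpha

# All-zero coefficients reproduce the mean face.
face = synthesize_shape(np.zeros(n_components)).reshape(n_vertices, 3)
```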
BlendShape expressions
BlendShape is a technique used in 3D software to deform models. By adjusting weights, a designer can morph a base model into a series of predefined target models, or into any linear combination of them.
In the digital content industry, blendshapes are most commonly used to create facial expressions, combining a set of basic expressions into a new one. Again, these models share the same topology; only the vertex positions change. The weights are constrained so that expressions do not break or distort.
When reconstructing a face, the delta blendshape is often used: the difference between each expression and the neutral one.
Combining the BlendShape with the parameterized face model gives:

$$S = \bar{S} + \sum_{i=1}^{m} \alpha_i s_i + \sum_{j=1}^{k} \delta_j e_j$$

where $e_j = B_j - B_0$ are the delta blendshapes and $\delta_j$ the expression weights. Here $S$ represents a face model with a particular identity and a particular expression.
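A sketch of the combined model, again with hypothetical placeholder data. Note how the expression basis holds delta blendshapes: each column is an expression shape minus the neutral shape.

```python
import numpy as np

n_vertices, n_id, n_exp = 1000, 80, 46   # hypothetical dimensions

mean_shape = np.zeros(3 * n_vertices)                 # mean shape
id_basis = np.random.randn(3 * n_vertices, n_id)      # identity eigenvectors s_i

# Delta blendshapes: expression shapes B_j minus the neutral shape B_0.
neutral = mean_shape
expressions = np.random.randn(3 * n_vertices, n_exp)  # stand-in B_j shapes
exp_basis = expressions - neutral[:, None]            # e_j = B_j - B_0

def synthesize(alpha, delta):
    """S = mean + identity combination + expression combination."""
    return mean_shape + id_basis @ alpha + exp_basis @ delta

face = synthesize(np.zeros(n_id), np.zeros(n_exp)).reshape(n_vertices, 3)
```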
Reconstruction
Estimating the face model from a given image is a bit like running rendering in reverse. Therefore, in addition to the face model parameters, the camera parameters also have to be considered.
Weak perspective projection is used here. It follows the same principle as orthographic projection, but multiplies by a scaling factor to approximate the effect of nearer objects appearing larger and farther ones smaller; it can be regarded as a mixture of perspective and orthographic projection [4]:

$$\mathbf{x} = s \cdot P \cdot R \cdot \mathbf{X} + \mathbf{t}, \qquad P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

where $s$ is the scale, $R$ the rotation matrix, and $\mathbf{t}$ a 2D translation.
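A minimal sketch of this projection; the function and argument names are my own.

```python
import numpy as np

def weak_perspective(points, s, R, t):
    """Project Nx3 points with x = s * P * R * X + t, where P is the
    orthographic projection that simply drops the z coordinate."""
    rotated = points @ R.T          # rotate into camera space
    return s * rotated[:, :2] + t   # scale the x/y part, then translate

# Example: identity rotation, so the points are just scaled and shifted.
pts = np.array([[0.0, 0.0, 1.0], [1.0, 2.0, 3.0]])
print(weak_perspective(pts, s=2.0, R=np.eye(3), t=np.array([10.0, 20.0])))
```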
The key of the reconstruction algorithm is to find parameters (the shape coefficients $\alpha$, the expression coefficients $\delta$, and the camera parameters $s$, $R$, $\mathbf{t}$) that make the projection of the 3D face model onto the image plane as close as possible to the original image. If texture is not considered, this simplifies to **making the projected landmarks of the face model as close as possible to the detected 2D landmark positions**.
Note that the landmarks returned by common face detectors are two-dimensional points. In particular, the detected face contour depends on the orientation of the face, so the 2D contour points do not lie on the true cheek boundary.
See the figure below: the blue dots in the top row are the detected 2D landmarks; in the bottom row, the blue dots are the model's landmark vertices (3D) and the red dots are the corresponding 2D points. Clearly the detected 2D contour points differ from the model's own 3D contour points, so during the iteration the model's 3D contour indices must be updated continually to keep the two in one-to-one correspondence.
There are several ways to find the contour points of the 3D model in the current view: a convex-hull algorithm can extract the contour; the angle between the vertex normal and the line of sight can be used to find where the view grazes the face; or, following the description in [5], the outermost points can be selected from a set of pre-marked facial contour lines. A sketch of the normal-based test follows.
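This sketch illustrates only the second strategy: vertices whose normal is nearly perpendicular to the viewing direction are contour candidates. It assumes vertex normals are precomputed; in practice the test would be restricted to a pre-marked band of cheek vertices.

```python
import numpy as np

def contour_candidates(normals, R, eps=0.1):
    """Mark vertices whose rotated normal is almost perpendicular to the
    view direction (camera looking along z in camera space): these lie
    near the silhouette of the face in the current pose."""
    view = np.array([0.0, 0.0, 1.0])
    rotated = normals @ R.T                 # normals in camera space
    return np.abs(rotated @ view) < eps     # boolean contour mask

# Example: with no rotation, a normal pointing sideways is a candidate,
# while one pointing straight at the camera is not.
normals = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(contour_candidates(normals, np.eye(3)))   # [ True False ]
```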
From the above description, we have turned the reconstruction into an optimization problem:

$$\min_{s,\,R,\,\mathbf{t},\,\alpha,\,\delta}\ \sum_{i=1}^{n}\left\|\,\mathbf{y}_{k_i}-\big(s \cdot P \cdot R \cdot S_{m_i}+\mathbf{t}\big)\right\|^2$$

where $S$ is the face model, $s$ the scale, $R$ the rotation matrix, $\mathbf{t}$ the translation vector, $\mathbf{y}$ the detected face landmarks, $n$ the number of landmarks, and $k_i$ and $m_i$ the indices of the corresponding image landmarks and 3D model vertices.
Because of the projection, this is a nonlinear least-squares problem. It can be solved with Gauss-Newton, Levenberg-Marquardt, and similar methods, which I won't go into here.
There is also a simpler approach: first estimate the camera parameters from the 2D and 3D point sets, which turns the equation above into a linear problem, then solve for the identity and expression parameters in stages. The whole iteration fixes some of the parameters while updating the rest, as in the sketch below.
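A condensed numpy sketch of this alternating scheme, under simplifying assumptions of my own: the pose is recovered from an affine camera fit, the coefficient updates are ridge-regularized linear least squares, and `mean_lm`, `id_lm`, `exp_lm` are the rows of the mean and bases corresponding to the landmark vertices. All names are placeholders, not from a specific paper.

```python
import numpy as np

def project(X, s, R, t):
    """Weak perspective: x = s * P * R * X + t (P keeps the x/y rows)."""
    return s * (X @ R.T)[:, :2] + t

def estimate_pose(X, y):
    """Fit y ~ affine(X) by least squares, then factor out s, R, t."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    M = np.linalg.lstsq(A, y, rcond=None)[0].T        # 2x4 affine camera
    s = (np.linalg.norm(M[0, :3]) + np.linalg.norm(M[1, :3])) / 2
    U, _, Vt = np.linalg.svd(M[:, :3] / s)            # nearest orthonormal rows
    R2 = U @ np.eye(2, 3) @ Vt
    R = np.vstack([R2, np.cross(R2[0], R2[1])])       # complete the rotation
    return s, R, M[:, 3]

def solve_coeffs(y, s, R, t, base, basis, reg=1e-3):
    """With the camera fixed, the residual is linear in the coefficients;
    solve a ridge-regularized linear least squares for them."""
    n, m = y.shape[0], basis.shape[1]
    C = s * R[:2]                                     # 2x3 projection block
    A = np.einsum('ij,njm->nim', C, basis.reshape(n, 3, m)).reshape(2 * n, m)
    b = (y - project(base.reshape(n, 3), s, R, t)).reshape(2 * n)
    return np.linalg.solve(A.T @ A + reg * np.eye(m), A.T @ b)

def fit(y, mean_lm, id_lm, exp_lm, iters=5):
    """Alternate: update pose, then identity, then expression weights."""
    alpha, delta = np.zeros(id_lm.shape[1]), np.zeros(exp_lm.shape[1])
    for _ in range(iters):
        X = (mean_lm + id_lm @ alpha + exp_lm @ delta).reshape(-1, 3)
        s, R, t = estimate_pose(X, y)
        alpha = solve_coeffs(y, s, R, t, mean_lm + exp_lm @ delta, id_lm)
        delta = solve_coeffs(y, s, R, t, mean_lm + id_lm @ alpha, exp_lm)
    return alpha, delta, (s, R, t)
```

Each stage is a small linear solve, which is one reason this staged scheme is easy to run in real time on mobile devices.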
The identity and expression coefficients need to be kept within a certain range (depending on the model), otherwise implausible shapes may appear. In video scenarios, inter-frame stability and the consistency of the identity across frames also have to be considered; one simple treatment is sketched below.
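A sketch of one simple way to handle both points: clamp the coefficients to a plausible range and exponentially smooth them over time. The class and parameter names are hypothetical; for identity consistency, the identity coefficients can additionally be frozen once they have converged.

```python
import numpy as np

class ParamSmoother:
    """Clamp per-frame coefficients to a plausible range, then apply
    exponential smoothing to suppress frame-to-frame jitter."""
    def __init__(self, dim, momentum=0.7, limit=3.0):
        self.state = np.zeros(dim)
        self.momentum = momentum   # higher = smoother but laggier
        self.limit = limit         # e.g. a few std devs for PCA coefficients

    def update(self, params):
        clipped = np.clip(params, -self.limit, self.limit)
        self.state = self.momentum * self.state + (1 - self.momentum) * clipped
        return self.state
```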
Applications
After reconstruction, we have the 3D face mesh, the pose of the model in the image, and the blendshape expression weights of the current face. Many interesting effects can be built on this information; here are a few examples grouped by the outputs they use.
3D mesh and spatial position
With the 3D model and its position, we can render with the face model as an occluder to create 3D sticker effects, such as wearing headgear, glasses, and so on.
Models and textures
Paint on the model's UV map and then render the textured face model. This can be used to add beards, face paint, masks, and so on.
Face model parameters and expression parameters
The computed expression weights can be transferred to a BlendShape model with the same setup, driving an avatar with a real face to achieve Animoji-like effects. The expression parameters of the original face can also be edited to make a still photo move. A sketch of the transfer idea follows.
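A minimal sketch of the weight transfer, with hypothetical names: if the avatar rig defines delta blendshapes with the same semantics as the tracking model, the captured weights can be applied to it directly.

```python
import numpy as np

def drive_avatar(weights, avatar_neutral, avatar_deltas):
    """Apply captured expression weights to an avatar rig whose delta
    blendshapes share the same semantics as the tracking model."""
    w = np.clip(weights, 0.0, 1.0)        # keep expressions in range
    return avatar_neutral + avatar_deltas @ w
```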
That's all for now; other methods and details will be added when the opportunity arises.
References
[0] Apple just unveiled 'Animoji' — emojis that talk and sync to your face
[1] Facebook buys popular face swap app for silly selfies
[2] A Morphable Model for the Synthesis of 3D Faces
[3] Face Transfer with Multilinear Models
[4] 3D projection
[5] High-Fidelity Pose and Expression Normalization for Face Recognition in the Wild
[6] iPhone X facial capture – Apple blendshapes