In camera applications, real-time stickers and real-time face slimming are fairly common features, and both are built on face keypoint detection. This article mainly introduces how to detect face keypoints when working with GPUImage.
Preface
We are going to obtain the keypoints for each frame of the video in some way, and then use OpenGL ES to draw them on the screen. The final result looks like this:
There are two steps: key point acquisition and key point drawing.
1. Key point acquisition
Apple's built-in frameworks already provide some face detection capabilities; CoreImage and AVFoundation, for example, both expose face detection interfaces. However, these interfaces are limited and do not include face keypoint detection.
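To illustrate the limitation, here is a minimal sketch using CoreImage's CIDetector: it only exposes coarse features such as eye and mouth positions, not a dense set of keypoints (image here is assumed to be a UIImage).
CIImage *ciImage = [CIImage imageWithCGImage:image.CGImage];
CIDetector *detector = [CIDetector detectorOfType:CIDetectorTypeFace
                                          context:nil
                                          options:@{CIDetectorAccuracy : CIDetectorAccuracyHigh}];
for (CIFaceFeature *face in [detector featuresInImage:ciImage]) {
    // Only a handful of coarse landmarks are available, no full keypoint set
    if (face.hasLeftEyePosition) {
        NSLog(@"Left eye: %@", NSStringFromCGPoint(face.leftEyePosition));
    }
    if (face.hasMouthPosition) {
        NSLog(@"Mouth: %@", NSStringFromCGPoint(face.mouthPosition));
    }
}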
To perform real-time face keypoint detection on video, we therefore need a third-party library. There are two main options:
- Face++
- OpenCV + Stasm
1. Face++
1. Introduction
The Face++ face keypoint SDK is a paid product, but it also offers a free trial version.
In the free trial version, the trial API Key allows 5 network authorizations per day, and each authorization lasts 24 hours. In other words, as long as the app is not deleted and there are no more than five test devices, you can keep using it.
This is very developer-friendly, and registering and integrating Face++ is fairly easy, so I suggest giving it a try.
2. How to integrate
To integrate the face keypoint SDK, refer to the official documentation: register first, then download the SDK archive, which contains detailed integration steps.
3. How to use it
Using the face keypoint SDK involves three main steps:
Step 1: Initiate network authorization
An authorization call does not necessarily initiate a network request; it first checks whether the local authorization information has expired.
@weakify(self);
[MGFaceLicenseHandle licenseForNetwokrFinish:^(bool License, NSDate *sdkDate) {
    @strongify(self);
    dispatch_async(dispatch_get_main_queue(), ^{
        if (License) {
            [[UIApplication sharedApplication].keyWindow makeToast:@"Face++ authorization successful!"];
            [self setupFacepp];
        } else {
            [[UIApplication sharedApplication].keyWindow makeToast:@"Face++ authorization failed!"];
        }
    });
}];
Step 2: Initialize the face detector
After authorization succeeds, the face detector can be initialized. Initialization loads the model data, then sets the recognition mode, video stream pixel format, video rotation angle, and so on.
NSString *modelPath = [[NSBundle mainBundle] pathForResource:KMGFACEMODELNAME
                                                      ofType:@""];
NSData *modelData = [NSData dataWithContentsOfFile:modelPath];
self.markManager = [[MGFacepp alloc] initWithModel:modelData
                                     faceppSetting:^(MGFaceppConfig *config) {
    config.detectionMode = MGFppDetectionModeTrackingRobust;
    config.pixelFormatType = PixelFormatTypeNV21;
    config.orientation = 90;
}];
Step 3: Detect video frames
After the face detector is initialized successfully, it can detect each frame of the video stream; the incoming data is of type CMSampleBufferRef. Since vertex coordinates range from -1 to 1, the detected points also need to be transformed according to the current video aspect ratio.
- (float *)detectInFaceppWithSampleBuffer:(CMSampleBufferRef)sampleBuffer
                           facePointCount:(int *)facePointCount
                                 isMirror:(BOOL)isMirror {
    if (!self.markManager) {
        return nil;
    }
    MGImageData *imageData = [[MGImageData alloc] initWithSampleBuffer:sampleBuffer];
    [self.markManager beginDetectionFrame];
    NSArray *faceArray = [self.markManager detectWithImageData:imageData];
    // Number of faces
    NSInteger faceCount = [faceArray count];
    int singleFaceLen = 2 * kFaceppPointCount;
    int len = singleFaceLen * (int)faceCount;
    float *landmarks = (float *)malloc(len * sizeof(float));
    for (MGFaceInfo *faceInfo in faceArray) {
        NSInteger faceIndex = [faceArray indexOfObject:faceInfo];
        [self.markManager GetGetLandmark:faceInfo isSmooth:YES pointsNumber:kFaceppPointCount];
        [faceInfo.points enumerateObjectsUsingBlock:^(NSValue *value, NSUInteger idx, BOOL *stop) {
            float x = (value.CGPointValue.y - self.sampleBufferLeftOffset) / self.videoSize.width;
            x = (isMirror ? x : (1 - x)) * 2 - 1;
            float y = (value.CGPointValue.x - self.sampleBufferTopOffset) / self.videoSize.height * 2 - 1;
            landmarks[singleFaceLen * faceIndex + idx * 2] = x;
            landmarks[singleFaceLen * faceIndex + idx * 2 + 1] = y;
        }];
    }
    [self.markManager endDetectionFrame];
    if (faceArray.count) {
        *facePointCount = kFaceppPointCount * (int)faceCount;
        return landmarks;
    } else {
        free(landmarks);
        return nil;
    }
}
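As a usage sketch, this method could be driven from GPUImage's GPUImageVideoCameraDelegate, which hands you each raw frame before it is rendered. The facePointsFilter property and its setter below are illustrative names for the drawing filter described in the second part of this article.
// GPUImageVideoCameraDelegate callback: called for every captured frame.
- (void)willOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    int facePointCount = 0;
    float *landmarks = [self detectInFaceppWithSampleBuffer:sampleBuffer
                                             facePointCount:&facePointCount
                                                   isMirror:YES];
    // Hand the points to the drawing filter (hypothetical setter); the filter is
    // assumed to take ownership of the malloc'ed buffer and free the previous one.
    [self.facePointsFilter setFacesPoints:landmarks count:facePointCount];
}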
2. OpenCV + Stasm
1. Introduction
OpenCV is an open source cross-platform computer vision library that implements many common algorithms in image processing. Stasm is an open source algorithm library for face feature detection that relies on OpenCV.
We know that the iPhone screen refreshes at 60 frames per second. In camera previews, the frame rate is generally limited to around 30 frames per second to reduce power consumption without causing noticeable lag.
Therefore, if we want to run detection on every frame, each frame must be recognized in less than 1/30 of a second; otherwise the rendering of the image data has to wait for the recognition result, which lowers the frame rate and causes stuttering.
Unfortunately, with OpenCV + Stasm, the recognition time per frame is more than 1/30 of a second, so it is better suited to still image recognition.
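If you want to check this on your own device, here is a minimal sketch that times the detection call per frame, using the detection method shown later in this section:
// Rough per-frame timing; anything consistently above ~33 ms pulls the preview below 30 fps.
CFTimeInterval start = CACurrentMediaTime();
int facePointCount = 0;
float *landmarks = [self detectInOpenCVWithSampleBuffer:sampleBuffer
                                         facePointCount:&facePointCount
                                               isMirror:NO];
CFTimeInterval elapsed = CACurrentMediaTime() - start;
NSLog(@"Stasm detection took %.1f ms", elapsed * 1000.0);
if (landmarks) {
    free(landmarks);
}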
That is why Face++ is the recommended option here.
2. How to integrate
OpenCV is integrated via CocoaPods:
pod 'OpenCV2-contrib'
OpenCV2-contrib contains more extension modules than OpenCV2, such as the face module, which the Stasm library depends on.
The Stasm library can be downloaded from this address. You need to add both the Stasm and Haarcascades folders to the project.
3. How to use it
Keypoint detection is performed by calling the stasm_search_single function.
Because this method takes a long time, we convert the video frame to a single channel and shrink it before passing it in. This reduces the amount of data Stasm processes per frame and effectively shortens the detection time, at the cost of some detection accuracy.
Key code:
- (float *)detectInOpenCVWithSampleBuffer:(CMSampleBufferRef)sampleBuffer
                           facePointCount:(int *)facePointCount
                                 isMirror:(BOOL)isMirror {
    cv::Mat cvImage = [self grayMatWithSampleBuffer:sampleBuffer];
    int resultWidth = 250;
    int resultHeight = resultWidth * 1.0 / cvImage.rows * cvImage.cols;
    cvImage = [self resizeMat:cvImage toWidth:resultHeight]; // Not rotated yet, so the height is passed in
    cvImage = [self correctMat:cvImage isMirror:isMirror];
    const char *imgData = (const char *)cvImage.data;
    // Whether a face was found
    int foundface;
    // stasm_NLANDMARKS is the number of face keypoints, multiplied by 2 to store x and y
    int len = 2 * stasm_NLANDMARKS;
    float *landmarks = (float *)malloc(len * sizeof(float));
    // Get the width and height
    int imgCols = cvImage.cols;
    int imgRows = cvImage.rows;
    // The Haar cascade XML files are located in [NSBundle mainBundle].bundlePath
    const char *xmlPath = [[NSBundle mainBundle].bundlePath UTF8String];
    // A return value of 0 indicates an error
    int stasmActionError = stasm_search_single(&foundface,
                                               landmarks,
                                               imgData,
                                               imgCols,
                                               imgRows,
                                               "",
                                               xmlPath);
    // Print the error message
    if (!stasmActionError) {
        printf("Error in stasm_search_single: %s\n", stasm_lasterr());
    }
    // Release cv::Mat
    cvImage.release();
    // A face was found
    if (foundface) {
        // Convert coordinates
        for (int index = 0; index < len; ++index) {
            if (index % 2 == 0) {
                float scale = (self.videoSize.height / self.videoSize.width) / (16.0 / 9.0);
                scale = MAX(1, scale); // Scale horizontally when the ratio exceeds 16:9
                landmarks[index] = (landmarks[index] / imgCols * 2 - 1) * scale;
            } else {
                float scale = (16.0 / 9.0) / (self.videoSize.height / self.videoSize.width);
                scale = MAX(1, scale); // Scale vertically when the ratio is less than 16:9
                landmarks[index] = (landmarks[index] / imgRows * 2 - 1) * scale;
            }
        }
        *facePointCount = stasm_NLANDMARKS;
        return landmarks;
    } else {
        free(landmarks);
        return nil;
    }
}
2. Key point drawing
Through the steps above we now have vertex data; the only difference between the two approaches is the number of vertices.
The vertex data is drawn in a GPUImageFilter. We want to define a custom filter and implement the face keypoint drawing logic inside it.
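A possible interface for such a filter might look like the sketch below; the class name is made up, and the property names simply mirror those used in the drawing code later in this section.
// Hypothetical filter interface (sketch only); the two uniform handles would be
// looked up with glGetUniformLocation when the shader program is set up.
@interface SCFacePointsFilter : GPUImageFilter
@property (nonatomic, assign) GLint isPointUniform;   // 0 = draw the texture, 1 = draw the points
@property (nonatomic, assign) GLint pointSizeUniform; // feeds gl_PointSize in the vertex shader
@property (nonatomic, assign) float *facesPoints;     // x/y pairs in the -1 ~ 1 range
@property (nonatomic, assign) int facesPointCount;
@end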
In GPUImageFilter, rendering is performed in -renderToTextureWithVertices:textureCoordinates:, so our custom filter needs to override this method.
In this method, we need to do two things: draw the input texture as-is, and draw the face key points.
The texture is drawn with triangle primitives and the face key points are drawn with point primitives, so two draw calls are needed. The original rendering method already contains the texture drawing logic, so we only need to add the face key point drawing after the texture drawing finishes.
The complete rewritten method:
- (void)renderToTextureWithVertices:(const GLfloat *)vertices
                 textureCoordinates:(const GLfloat *)textureCoordinates {
    if (self.preventRendering)
    {
        [firstInputFramebuffer unlock];
        return;
    }
    [GPUImageContext setActiveShaderProgram:filterProgram];
    outputFramebuffer = [[GPUImageContext sharedFramebufferCache] fetchFramebufferForSize:[self sizeOfFBO] textureOptions:self.outputTextureOptions onlyTexture:NO];
    [outputFramebuffer activateFramebuffer];
    if (usingNextFrameForImageCapture)
    {
        [outputFramebuffer lock];
    }
    [self setUniformsForProgramAtIndex:0];
    glClearColor(backgroundColorRed, backgroundColorGreen, backgroundColorBlue, backgroundColorAlpha);
    glClear(GL_COLOR_BUFFER_BIT);
    glActiveTexture(GL_TEXTURE2);
    glBindTexture(GL_TEXTURE_2D, [firstInputFramebuffer texture]);
    glUniform1i(filterInputTextureUniform, 2);
    glUniform1i(self.isPointUniform, 0); // Indicates drawing the texture
    glVertexAttribPointer(filterPositionAttribute, 2, GL_FLOAT, 0, 0, vertices);
    glVertexAttribPointer(filterTextureCoordinateAttribute, 2, GL_FLOAT, 0, 0, textureCoordinates);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    // Draw the points
    if (self.facesPoints) {
        glUniform1i(self.isPointUniform, 1); // Indicates drawing points
        glUniform1f(self.pointSizeUniform, self.sizeOfFBO.width * 0.006); // Set the point size
        glVertexAttribPointer(filterPositionAttribute, 2, GL_FLOAT, 0, 0, self.facesPoints);
        glDrawArrays(GL_POINTS, 0, self.facesPointCount);
    }
    [firstInputFramebuffer unlock];
    if (usingNextFrameForImageCapture) {
        dispatch_semaphore_signal(imageCaptureSemaphore);
    }
}
When drawing point primitives, you can specify the point size by assigning to gl_PointSize, which in turn is controlled externally through a uniform variable.
Vertex shader code:
precision highp float;
attribute vec4 position;
attribute vec4 inputTextureCoordinate;
varying vec2 textureCoordinate;
uniform float pointSize;
void main()
{
gl_Position = position;
gl_PointSize = pointSize;
textureCoordinate = inputTextureCoordinate.xy;
}
Since the logic for the two draws is separate, in general they should be implemented with different shaders. But because the rendering logic here is fairly simple, we simply put both draws into the same shader, which also avoids switching the Program back and forth; a uniform variable is then used to determine the current draw type.
Fragment shader code:
precision highp float;
varying vec2 textureCoordinate;
uniform sampler2D inputImageTexture;
uniform int isPoint;
void main()
{
    if (isPoint != 0) {
        gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
    } else {
        gl_FragColor = texture2D(inputImageTexture, textureCoordinate);
    }
}
Finally, just add the filter to the filter chain and you can see the face key points being drawn.
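For reference, here is a sketch of the wiring, where SCFacePointsFilter is the hypothetical custom filter from the previous section:
GPUImageVideoCamera *camera =
    [[GPUImageVideoCamera alloc] initWithSessionPreset:AVCaptureSessionPreset1280x720
                                        cameraPosition:AVCaptureDevicePositionFront];
camera.outputImageOrientation = UIInterfaceOrientationPortrait;
camera.delegate = self; // Receive raw frames for keypoint detection

SCFacePointsFilter *facePointsFilter = [[SCFacePointsFilter alloc] init];
GPUImageView *previewView = [[GPUImageView alloc] initWithFrame:self.view.bounds];
[self.view addSubview:previewView];

[camera addTarget:facePointsFilter];
[facePointsFilter addTarget:previewView];
[camera startCameraCapture];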
Source code
Check out the full code on GitHub.
References
- iOS Computer Vision – Face recognition
- iOS – Key points of face feature acquisition