This article walks through an example of using Dlib to detect facial feature points on iOS. It covers compiling the Dlib library, face keypoint detection on a video stream, and face keypoint detection on photos. The effect is shown below

1. Introduction to Dlib

Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real-world problems. For the main project documentation and API reference, see dlib.net or the GitHub repository github.com/davisking/d…

2. Compile Dlib with Xcode

In this step we compile Dlib into a static library with Xcode. First, download the Dlib source code. If you already have a Dlib static library compiled by someone else, you can skip to the next step. To complete this step, the following are required:

  • X11; if it is not installed, download and install it first
  • Xcode
  • CMake; if it is not installed, it can be installed via Homebrew
2.1 Download Source Code

Download the source code from the Dlib GitHub repository: github.com/davisking/d…

2.2 Create an Xcode Build Project for Dlib

Go to the root directory of the downloaded Dlib source code and run the following commands

cd examples/
mkdir build
cd build
cmake -G Xcode ..
cmake --build . --config Release

The build directory will now contain examples.xcodeproj and the dlib_build folder, as shown in the figure

Go to the dlib_build directory, open dlib.xcodeproj, and change the dlib project settings to the ones shown below

Select the dlib target and check its settings to make sure they are the same as the dlib project settings, as shown below

Select the dlib target and build the x86 (simulator) and ARM (device) static libraries separately, as shown below

Build the x86 static library

Build the ARM static library

Select the compiled dlib static library under Products in the project navigator and right-click Show in Finder to locate it

Going up one directory, you can see the simulator static library folder and the device static library folder

The dlib static libraries we need are now compiled

3. Create an iOS App for face detection

3.1 Create an Xcode project named DlibDemo

Create a new folder named Dlib under the project root directory and copy the compiled Dlib static libraries into it: the ARM static library goes into the lib-iphoneOS subdirectory and the x86 static library goes into the lib-iphonesimulator subdirectory. Also copy the dlib folder from the dlib-master source directory into this directory, and download the model shape_predictor_68_face_landmarks.dat and copy it here as well, as shown in the figure below

Right-click the project, choose Add Files to “DlibDemo”…, and add this folder to the project.

Then right-click the dlib folder, select Delete, and choose Remove References (this removes it from the project without deleting it from disk). The dlib source directory does not need to be added to the project, because the header search path will point to it; if it is added, a compile error will be reported. Remove lib-iphonesimulator and lib-iphoneOS from the project in the same way; the project will instead use a library search path to locate the static library (libdlib.a). After this, only shape_predictor_68_face_landmarks.dat remains in the project's Dlib folder

3.2 Setting compilation Options

Set HEADER_SEARCH_PATHS to $(PROJECT_DIR)/DlibDemo/Dlib/ so the compiler can find the Dlib headers

Set LIBRARY_SEARCH_PATHS to $(SRCROOT)/DlibDemo/Dlib/Lib$(EFFECTIVE_PLATFORM_NAME). $(EFFECTIVE_PLATFORM_NAME) is an Xcode macro: when building for the simulator its value is -iphonesimulator, so Lib$(EFFECTIVE_PLATFORM_NAME) resolves to lib-iphonesimulator, which corresponds to Dlib's x86 static library folder; when building for a device its value is -iphoneos, so it resolves to lib-iphoneOS, which corresponds to Dlib's ARM static library folder.

Set OTHER_LDFLAGS and add -l"dlib"

Set OTHER_CFLAGS = -DNDEBUG -DDLIB_JPEG_SUPPORT -DDLIB_USE_BLAS -DDLIB_USE_LAPACK -DLAPACK_FORCE_UNDERSCORE (-DNDEBUG turns off Dlib's debug assertions; the remaining flags enable JPEG support and BLAS/LAPACK acceleration)

If the project does not have one yet, create DlibDemo/DlibDemo-Bridging-Header.h and set SWIFT_OBJC_BRIDGING_HEADER = DlibDemo/DlibDemo-Bridging-Header.h

Set the Debug configuration's Optimization Level to Fastest, Smallest [-Os]. I was stuck here for a long time: if it is not set, detection is extremely slow and each frame takes a very long time. Release builds use [-Os] by default, so they do not need to be changed. Once you have finished debugging the detection code, you can switch the Debug optimization level back to None [-O0] so it does not interfere with debugging.
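
If you want to confirm the effect of the optimization level, you can time a single call into the wrapper. Below is a minimal sketch, assuming the DlibWrapper class created in section 4 and a sample buffer plus face rects coming from the capture pipeline in section 4.2; the helper name is my own and not part of the project.

import Foundation
import CoreMedia

// Hypothetical helper: times one landmark-detection call on a video frame.
// Assumes `wrapper` has already been prepared and that `sampleBuffer` and `rects`
// come from the AVFoundation capture pipeline shown in section 4.2.
func measureDetection(wrapper: DlibWrapper, sampleBuffer: CMSampleBuffer, rects: [NSValue]) {
    let start = CFAbsoluteTimeGetCurrent()
    wrapper.doWork(on: sampleBuffer, inRects: rects)
    let elapsedMilliseconds = (CFAbsoluteTimeGetCurrent() - start) * 1000
    print("Landmark detection took \(elapsedMilliseconds) ms")
}

With the Debug optimization left at None [-O0] this number is dramatically larger than with [-Os], which is an easy way to verify that the setting took effect.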

Add the dependent framework Accelerate.framework

4. Use Dlib in your project

4.1 Wrapping Dlib

Since Dlib is written in C++, we create DlibWrapper.h and DlibWrapper.mm to wrap it. DlibWrapper.h only exposes the method names and must not import any Dlib headers; otherwise every file that imports DlibWrapper.h would also have to be compiled as Objective-C++ (.mm). DlibWrapper.mm imports the Dlib headers and implements the methods.

The code for DlibWrapper.h is as follows

#import <Foundation/Foundation.h>
#import <CoreMedia/CoreMedia.h>

@interface DlibWrapper : NSObject

- (instancetype)init;

// Loads the 68-point landmark model and creates the face detector.
- (void)prepare;

// Draws landmarks for the given face rects directly into the BGRA sample buffer.
- (void)doWorkOnSampleBuffer:(CMSampleBufferRef)sampleBuffer inRects:(NSArray<NSValue *> *)rects;

// Runs face detection and landmark detection on the image at imagePath and saves the annotated result to savePath.
- (void)doWorkOnImagePath:(NSString*)imagePath savePath:(NSString*)savePath;
@end

The code for DlibWrapper.mm is as follows

#import "DlibWrapper.h"
#import <UIKit/UIKit.h>

#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_processing.h>
#include <dlib/image_io.h>
#include <dlib/image_processing/render_face_detections.h>

@interface DlibWrapper ()

@property (assign) BOOL prepared;

+ (std::vector<dlib::rectangle>)convertCGRectValueArray:(NSArray<NSValue *> *)rects;

@end
@implementation DlibWrapper {
    dlib::shape_predictor sp;
    dlib::frontal_face_detector detector;
}


- (instancetype)init {
    self = [super init];
    if (self) {
        _prepared = NO;
    }
    return self;
}

- (void)prepare {
    NSString *modelFileName = [[NSBundle mainBundle] pathForResource:@"shape_predictor_68_face_landmarks" ofType:@"dat"];
    std::string modelFileNameCString = [modelFileName UTF8String];
    
    dlib::deserialize(modelFileNameCString) >> sp;
    detector = dlib::get_frontal_face_detector();

    // FIXME: test this stuff for memory leaks (cpp object destruction)
    self.prepared = YES;
}

- (void)doWorkOnSampleBuffer:(CMSampleBufferRef)sampleBuffer inRects:(NSArray<NSValue *> *)rects {
    
    if (!self.prepared) {
        [self prepare];
    }
    
    dlib::array2d<dlib::bgr_pixel> img;
    
    // MARK: magic
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
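    // Lock the pixel buffer before reading its base address on the CPU.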
    CVPixelBufferLockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);

    size_t width = CVPixelBufferGetWidth(imageBuffer);
    size_t height = CVPixelBufferGetHeight(imageBuffer);
    char *baseBuffer = (char *)CVPixelBufferGetBaseAddress(imageBuffer);
    
    // set_size expects rows, cols format
    img.set_size(height, width);
    
    // copy samplebuffer image data into dlib image format
    img.reset();
    long position = 0;
    while (img.move_next()) {
        dlib::bgr_pixel& pixel = img.element();

        // assuming bgra format here
        long bufferLocation = position * 4; //(row * width + column) * 4;
        char b = baseBuffer[bufferLocation];
        char g = baseBuffer[bufferLocation + 1];
        char r = baseBuffer[bufferLocation + 2];
        // we do not need this
        // char a = baseBuffer[bufferLocation + 3];
        
        dlib::bgr_pixel newpixel(b, g, r);
        pixel = newpixel;
        
        position++;
    }
    
    // unlock buffer again until we need it again
    CVPixelBufferUnlockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);
    
    // convert the face bounds list to dlib format
    std::vector<dlib::rectangle> convertedRectangles = [DlibWrapper convertCGRectValueArray:rects];
    
    // for every detected face
    for (unsigned long j = 0; j < convertedRectangles.size(); ++j)
    {
        dlib::rectangle oneFaceRect = convertedRectangles[j];
        
        // detect all landmarks
        dlib::full_object_detection shape = sp(img, oneFaceRect);
        
        // and draw them into the image (samplebuffer)
        for (unsigned long k = 0; k < shape.num_parts(); k++) {
            dlib::point p = shape.part(k);
            draw_solid_circle(img, p, 2, dlib::rgb_pixel(0, 255, 0));
        }
    }

    // lets put everything back where it belongs
    CVPixelBufferLockBaseAddress(imageBuffer, 0);

    // copy dlib image data back into samplebuffer
    img.reset();
    position = 0;
    while (img.move_next()) {
        dlib::bgr_pixel& pixel = img.element();
        
        // assuming bgra format here
        long bufferLocation = position * 4; //(row * width + column) * 4;
        baseBuffer[bufferLocation] = pixel.blue;
        baseBuffer[bufferLocation + 1] = pixel.green;
        baseBuffer[bufferLocation + 2] = pixel.red;
        // we do not need this
        // char a = baseBuffer[bufferLocation + 3];
        
        position++;
    }
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
}

- (void)doWorkOnImagePath:(NSString*)imagePath savePath:(NSString*)savePath {
    if (!self.prepared) {
        return;
    }
    
    std::string fileName = [imagePath UTF8String];
    // create image
    dlib::array2d<dlib::rgb_pixel> img;
    
    // load the image from the file path
    dlib::load_image(img,fileName);
    
    //dlib face recognition
    std::vector<dlib::rectangle> dets = detector(img);
    NSLog(@"Number of faces %lu",dets.size());// The number of faces detected
    
    for (unsigned long j = 0; j < dets.size(); ++j) {
        dlib::full_object_detection shape = sp(img, dets[j]);
        // and draw them into the image (samplebuffer)
        for (unsigned long k = 0; k < shape.num_parts(); k++) {
            dlib::point p = shape.part(k);
            // draw a solid circle at landmark point p with radius 2 in the given rgb_pixel color
            dlib::draw_solid_circle(img, p, 2, dlib::rgb_pixel(0, 255, 0));
        }
    }
    dlib::save_jpeg(img, [savePath UTF8String]);
}

+ (std::vector<dlib::rectangle>)convertCGRectValueArray:(NSArray<NSValue *> *)rects {
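    // Convert each CGRect (origin and size) into a dlib::rectangle
    // defined by its left, top, right, and bottom edges.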
    std::vector<dlib::rectangle> myConvertedRects;
    for (NSValue *rectValue in rects) {
        CGRect rect = [rectValue CGRectValue];
        long left = rect.origin.x;
        long top = rect.origin.y;
        long right = left + rect.size.width;
        long bottom = top + rect.size.height;
        dlib::rectangle dlibRect(left, top, right, bottom);

        myConvertedRects.push_back(dlibRect);
    }
    return myConvertedRects;
}
@end
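Once the bridging header is set up (next step), Swift imports these Objective-C selectors under shortened names, which is why the Swift code later calls doWork(on:inRects:) and doWork(onImagePath:savePath:). The following sketch only illustrates how the calls read from Swift; the function and the file paths are placeholders of my own.

import CoreMedia

// Illustrative only: how the DlibWrapper methods look when called from Swift.
// doWorkOnSampleBuffer:inRects: is imported as doWork(on:inRects:), and
// doWorkOnImagePath:savePath: is imported as doWork(onImagePath:savePath:).
func exampleWrapperUsage(sampleBuffer: CMSampleBuffer, faceRects: [NSValue]) {
    let wrapper = DlibWrapper()
    wrapper?.prepare()
    wrapper?.doWork(on: sampleBuffer, inRects: faceRects)
    wrapper?.doWork(onImagePath: "/path/to/input.jpg", savePath: "/path/to/output.jpg")
}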

Add #import "DlibWrapper.h" to DlibDemo-Bridging-Header.h

#import "DlibWrapper.h"

4.2 Write the video stream keypoint detection code

First, create SessionHandler.swift to capture the video stream and call DlibWrapper to detect face keypoints on each video frame, as shown below

import AVFoundation

class SessionHandler : NSObject, AVCaptureVideoDataOutputSampleBufferDelegate, AVCaptureMetadataOutputObjectsDelegate {
    var session = AVCaptureSession()
    let layer = AVSampleBufferDisplayLayer()
    let sampleQueue = DispatchQueue(label: "com.zweigraf.DisplayLiveSamples.sampleQueue", attributes: [])
    let faceQueue = DispatchQueue(label: "com.zweigraf.DisplayLiveSamples.faceQueue", attributes: [])
    let wrapper = DlibWrapper()

    var currentMetadata: [AnyObject]

    override init() {
        currentMetadata = []
        super.init()
    }

    func openSession() {
        let device = AVCaptureDevice.devices(for: AVMediaType.video)
            .map { $0 }
            .filter { $0.position == .front }
            .first
        if device == nil {
            return
        }

        let input = try! AVCaptureDeviceInput(device: device!)

        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(self, queue: sampleQueue)

        let metaOutput = AVCaptureMetadataOutput()
        metaOutput.setMetadataObjectsDelegate(self, queue: faceQueue)

        session.beginConfiguration()

        if session.canAddInput(input) {
            session.addInput(input)
        }
        if session.canAddOutput(output) {
            session.addOutput(output)
        }
        if session.canAddOutput(metaOutput) {
            session.addOutput(metaOutput)
        }

        session.commitConfiguration()

        let settings: [AnyHashable: Any] = [kCVPixelBufferPixelFormatTypeKey as AnyHashable: Int(kCVPixelFormatType_32BGRA)]
        output.videoSettings = settings as! [String : Any]

        // availableMetadataObjectTypes change when output is added to session.
        // before it is added, availableMetadataObjectTypes is empty
        metaOutput.metadataObjectTypes = [AVMetadataObject.ObjectType.face]

        wrapper?.prepare()

        session.startRunning()

        for output in session.outputs {
            for av in output.connections {
                if av.isVideoMirroringSupported {
                    av.videoOrientation = .portrait
                    av.isVideoMirrored = true
                }
            }
        }

        layer.videoGravity = AVLayerVideoGravity.resizeAspectFill
    }

    // MARK: AVCaptureVideoDataOutputSampleBufferDelegate
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        if !currentMetadata.isEmpty {
            let boundsArray = currentMetadata
                .flatMap { $0 as? AVMetadataFaceObject }
                .map { (faceObject) -> NSValue in
                    let convertedObject = output.transformedMetadataObject(for: faceObject, connection: connection)
                    return NSValue(cgRect: convertedObject!.bounds)
                }

            wrapper?.doWork(on: sampleBuffer, inRects: boundsArray)
        }

        layer.enqueue(sampleBuffer)
    }

    func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        print("DidDropSampleBuffer")
    }

    // MARK: AVCaptureMetadataOutputObjectsDelegate
    func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection) {
        currentMetadata = metadataObjects as [AnyObject]
    }
}
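
One thing to watch out for: starting with iOS 10 the app must declare NSCameraUsageDescription in Info.plist, otherwise the system terminates the app as soon as the camera is accessed. If you also want to request camera permission explicitly before calling openSession(), a small optional sketch (the helper is my own, not part of the original project) could look like this:

import Foundation
import AVFoundation

// Optional helper: request camera permission before starting the capture session.
// NSCameraUsageDescription must also be present in Info.plist.
func requestCameraAccess(_ completion: @escaping (Bool) -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        completion(true)
    case .notDetermined:
        AVCaptureDevice.requestAccess(for: .video) { granted in
            DispatchQueue.main.async { completion(granted) }
        }
    default:
        // .denied or .restricted
        completion(false)
    }
}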

Next, create VideoScanViewController.swift, which uses SessionHandler; the code is as follows

import UIKit

class VideoScanViewController: UIViewController {
    let sessionHandler = SessionHandler()

    lazy var preview: UIView = {
        let view = UIView()
        return view
    }()

    override func viewDidLoad() {
        super.viewDidLoad()
        self.navigationItem.title = "Video stream detection of face feature points"
        self.view.backgroundColor = .white
        self.view.addSubview(preview)
        preview.frame = CGRect(x: 0, y: 0, width: self.view.frame.width, height: self.view.frame.height)
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        sessionHandler.openSession()
        let layer = sessionHandler.layer
        layer.frame = preview.bounds
        preview.layer.addSublayer(layer)
        view.layoutIfNeeded()
    }
}

4.3 Write the photo face keypoint detection code

Create AlbumViewController.swift, which uses DlibWrapper to detect face keypoints in photos; the code is as follows

import UIKit

class AlbumViewController: UIViewController {

    lazy var picker = UIImagePickerController()

    lazy var imageView: UIImageView = {
        let imageView = UIImageView()
        imageView.contentMode = UIView.ContentMode.scaleAspectFit
        return imageView
    }()

    lazy var wrapper = DlibWrapper()

    var filePath = ""
    var filePathWrite = ""

    override func viewDidLoad() {
        super.viewDidLoad()
        self.view.backgroundColor = .white
        self.navigationItem.rightBarButtonItem = UIBarButtonItem.init(title: "Album", style: .plain, target: self, action: #selector(albumClick(_:)))
        self.view.addSubview(imageView)
        imageView.frame = self.view.bounds

        let cachePath = NSSearchPathForDirectoriesInDomains(.cachesDirectory, .userDomainMask, true).first!
        filePath = (cachePath as NSString).appendingPathComponent("DlibCacheFileRead.jpg")
        filePathWrite = (cachePath as NSString).appendingPathComponent("DlibCacheFileWrite.jpg")

        wrapper?.prepare()
    }

    @objc func albumClick(_ button: UIButton) {
        let sourceType = UIImagePickerController.SourceType.photoLibrary
        picker.delegate = self
        picker.sourceType = sourceType
        self.present(picker, animated: true, completion: nil)
    }
}

extension AlbumViewController: UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
        let image = info[UIImagePickerController.InfoKey.originalImage]
        picker.dismiss(animated: true, completion: nil)
        DispatchQueue.main.async { [weak self] in
            if let image = image as? UIImage, let filePath = self?.filePath, let filePathWrite = self?.filePathWrite {
                let imageData = image.jpegData(compressionQuality: 1.0)
                try? imageData?.write(to: URL(fileURLWithPath: filePath))
                self?.wrapper?.doWork(onImagePath: filePath, savePath: filePathWrite)
                let detectImage = UIImage.init(contentsOfFile: filePathWrite)
                self?.imageView.image = detectImage
            }
        }
    }

    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        picker.dismiss(animated: true, completion: nil)
    }
}
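
The article does not show how the two demo controllers are reached. As one possible wiring (purely illustrative; the class name, button titles, and layout are my own), a root controller embedded in a UINavigationController could push them like this:

import UIKit

// Hypothetical root controller with two buttons that push the video-stream demo
// and the photo demo. Names and layout are illustrative only.
class RootViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        view.backgroundColor = .white
        navigationItem.title = "Dlib Demo"

        let videoButton = UIButton(type: .system)
        videoButton.setTitle("Video stream detection", for: .normal)
        videoButton.addTarget(self, action: #selector(openVideo), for: .touchUpInside)

        let photoButton = UIButton(type: .system)
        photoButton.setTitle("Photo detection", for: .normal)
        photoButton.addTarget(self, action: #selector(openPhoto), for: .touchUpInside)

        let stack = UIStackView(arrangedSubviews: [videoButton, photoButton])
        stack.axis = .vertical
        stack.spacing = 20
        stack.frame = CGRect(x: 0, y: 200, width: view.bounds.width, height: 120)
        view.addSubview(stack)
    }

    @objc private func openVideo() {
        navigationController?.pushViewController(VideoScanViewController(), animated: true)
    }

    @objc private func openPhoto() {
        navigationController?.pushViewController(AlbumViewController(), animated: true)
    }
}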

5. Running result

This code compiles and runs on both the simulator and real devices. However, the simulator has no camera, so video stream face keypoint detection can only be tried on a real device.
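
If you want one build that behaves sensibly in both environments, a compile-time check is enough. A minimal sketch (the property name is my own):

// Hypothetical guard: the live-camera demo only makes sense on a real device,
// because the simulator has no camera.
var isVideoDemoAvailable: Bool {
    #if targetEnvironment(simulator)
    return false
    #else
    return true
    #endif
}

You could check this flag before pushing VideoScanViewController and fall back to the photo demo in the simulator.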

6. Summary

This article introduced how to compile the Dlib library and use it in an Xcode project, with examples of face keypoint detection on a video stream and on photos. It touches on compiling and using static libraries, mixing Swift and C++, and AVFoundation, so there are quite a few details to pay attention to. If you want to use this in a real App, you also need to consider model compression, video stream optimization, performance optimization, Bitcode, and other issues. If you have any questions, you can follow my public account, leave a message, and we can make progress together. For the source code, reply "iOS" to the public account.

You are welcome to scan the QR code and follow the public account to discuss and exchange ideas together