Obtaining the distance with ARKit

Face ID devices can use ARKit directly to get an accurate eye-to-screen distance. This approach relies mainly on ARKit and SceneKit.

Use the x/y/z components of each eye's SCNNode to calculate its distance from the camera: worldPosition minus the origin (SCNVector3Zero). Averaging the left-eye and right-eye distances gives a more accurate number, and also handles head rotation better.

let leftEyeDistanceFromCamera = self.leftEye.worldPosition - SCNVector3Zero
let rightEyeDistanceFromCamera = self.rightEye.worldPosition - SCNVector3Zero

// Calculate the left and right average distance
let averageDistance = (leftEyeDistanceFromCamera.length() + rightEyeDistanceFromCamera.length()) / 2

length() returns the square root of the sum of the squares of the node's x, y and z components, i.e. the magnitude of the vector:

extension SCNVector3 {
    // The length (magnitude) of the vector
    func length() -> Float { return sqrtf(x * x + y * y + z * z) }
    // Subtract two SCNVector3s
    static func - (l: SCNVector3, r: SCNVector3) -> SCNVector3 { return SCNVector3Make(l.x - r.x, l.y - r.y, l.z - r.z) }
}

The disadvantages are also obvious: it consumes a lot of power, it requires creating an ARSCNView, and the supported devices are limited to those with Face ID.

Create ARSCNView

        // The device is not supported
        if !checkARSupport() {
            return
        }

        if !checkCameraPermission() {
            print("No camera permission")
            return
        }

        let config = ARFaceTrackingConfiguration()
        config.isLightEstimationEnabled = true

        self.sceneView.delegate = self
        self.sceneView.showsStatistics = true
        self.sceneView.session.run(config, options: [.resetTracking, .removeExistingAnchors])

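The snippet above calls checkARSupport() and checkCameraPermission(), which are not shown. A minimal sketch of what they might look like, assuming face-tracking support and an already-granted camera permission are the only checks needed:

import ARKit
import AVFoundation

// Assumed helper: ARKit face tracking requires a TrueDepth (Face ID) camera.
func checkARSupport() -> Bool {
    return ARFaceTrackingConfiguration.isSupported
}

// Assumed helper: returns true only if camera access has already been granted.
func checkCameraPermission() -> Bool {
    return AVCaptureDevice.authorizationStatus(for: .video) == .authorized
}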

Set up the eye SCNNodes

func setupEyeNode() {
    let eyeGeometry = SCNSphere(radius: 0.005)
    eyeGeometry.materials.first?.diffuse.contents = UIColor.green
    eyeGeometry.materials.first?.transparency = 1.0
    let node = SCNNode()
    node.geometry = eyeGeometry
    node.eulerAngles.x = -.pi / 2
    node.position.z = 0.1

    leftEye = node.clone()
    rightEye = node.clone()
}

In the ARSCNView delegate, add the nodes and update the face data. First, adding the nodes:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    // Node handling: set up the face geometry and attach the eye nodes
    self.faceNode = node
    guard let device = self.sceneView.device else { return }
    let faceGeo = ARSCNFaceGeometry(device: device)
    self.faceNode.geometry = faceGeo
    // self.faceNode.geometry?.firstMaterial?.fillMode = .lines

    self.faceNode.addChildNode(self.leftEye)
    self.faceNode.addChildNode(self.rightEye)
    self.faceNode.transform = node.transform
}

Update the data

func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    // Update the node
    self.faceNode.transform = node.transform
    self.faceNode.geometry?.materials.first?.diffuse.contents = UIColor.yellow

    guard let faceAnchor = anchor as? ARFaceAnchor else {
        // No human face found
        print("NO FACE")
        return
    }

    // Face data
    if let faceGeo = node.geometry as? ARSCNFaceGeometry {
        faceGeo.update(from: faceAnchor.geometry)
    }
    leftEye.simdTransform = faceAnchor.leftEyeTransform
    rightEye.simdTransform = faceAnchor.rightEyeTransform
    // Get the distance
    trackDistance()
}

Finally, calculate the distance from the eyes to the camera

func trackDistance() {
    DispatchQueue.main.async {
        let leftEyeDistanceFromCamera = self.leftEye.worldPosition - SCNVector3Zero
        let rightEyeDistanceFromCamera = self.rightEye.worldPosition - SCNVector3Zero

        // Calculate the average of the left and right distances
        let averageDistance = (leftEyeDistanceFromCamera.length() + rightEyeDistanceFromCamera.length()) / 2
        let averageDistanceCM = averageDistance * 100
    }
}

The distance is obtained by subtracting the camera position (the world origin) from the left and right eye positions respectively and averaging the two results.

That is how the eye-to-screen distance is obtained with ARKit.

Using Vision

On devices that do not support ARKit face tracking, or in cases where the regular camera has to be used, Vision can be used for the calculation instead. Camera parameters differ between devices, such as sensor (CCD) size and focal length, so there is one concept to understand first: the equivalent focal length.

For different focal lengths we can convert to the 35 mm equivalent focal length for the calculation. iOS currently has no good API for reading the equivalent focal length directly; what we can do is take a photo and read its EXIF information, where FocalLenIn35mmFilm gives the 35 mm equivalent focal length.
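As an illustration, a small sketch of reading that EXIF value from an AVCapturePhoto might look like this (the helper name is an assumption, not from the original code):

import AVFoundation
import ImageIO

// Reads the 35 mm equivalent focal length from a captured photo's EXIF dictionary.
// Returns nil if the key is missing.
func equivalentFocalLength(from photo: AVCapturePhoto) -> Float? {
    guard let exif = photo.metadata[kCGImagePropertyExifDictionary as String] as? [String: Any],
          let focalLen = exif[kCGImagePropertyExifFocalLenIn35mmFilm as String] as? NSNumber else {
        return nil
    }
    return focalLen.floatValue
}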

Calculation principle

The calculation is based on the interpupillary distance (pupil distance), which is an average value and varies from person to person. For adults it is about 63 mm; for children it changes with age. There is no single good average for children, but according to an eyewear website the rough values by age are:

if age < 4 && age > 0 {            // under 4
    return 45
} else if age >= 4 && age <= 7 {   // 4 - 7
    return 50
} else if age >= 8 && age <= 11 {  // 8 - 11
    return 56
} else if age >= 12 && age <= 16 { // 12 - 16
    return 59
} else if age > 16 {               // over 16
    return 63
}

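Wrapped up as a helper, the lookup could look like the sketch below (the function name is illustrative, not from the original):

// Illustrative helper: approximate pupil distance in millimetres for a given age.
func estimatedPupilDistanceMM(forAge age: Int) -> Float {
    switch age {
    case 1...3:   return 45
    case 4...7:   return 50
    case 8...11:  return 56
    case 12...16: return 59
    default:      return 63   // adults (and any unhandled ages)
    }
}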

Two optical formulas are used in this calculation, similar to pinhole imaging:

1 / object distance + 1 / image distance = 1 / focal length
image height / image distance = object height / object distance

Calculation example

At the equivalent focal length, the imaging area can be treated as 36 mm × 24 mm. Assume a frame resolution of 1920 × 1080.

distance = (1 + 63 * 1080 / 24 / pupil pixel distance) * equivalent focal length
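For example, if the pupils are detected 200 px apart (an assumed value) and the equivalent focal length is 31 mm, this gives (1 + 63 * 1080 / 24 / 200) * 31 ≈ 470 mm, or roughly 47 cm.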

The actual calculation also applies a FOV (Field of View) scale factor. The FOV and pixel dimensions can be read from the AVCaptureDevice; its current activeFormat carries a lot of information:

format = <AVCaptureDeviceFormat: 0x2817f4130 'vide'/'420v' 1920x1080, { 1 - 30 fps }, HRSI: 3392x1908, fov: 61.161, supports vis, max zoom: 16.00 (upscales @1.61), ISO: 18.0 - 1728.0, SS: 0.000020 - 1.000000, supports HDR, supports multicam>

The actual calculation

distance = (1.0 + self.realEyeDistance * Float(self.previewLayer!.frame.width) / 24 / self.eyeDistance) * self.fLength / 10.0 * self.fovFactor

previewLayer!.frame.width: the width of the camera preview

eyeDistance: the pixel distance between the two pupils

fLength: the equivalent focal length

realEyeDistance: the real pupil distance (the 63 mm above), which is the basis of the whole calculation

fovFactor: the FOV ratio, obtained from the device format and calculated as follows

func processFOV(device: AVCaptureDevice) {
    let currentFOV = device.activeFormat.videoFieldOfView

    // Ratio of the current format's FOV to the reference (last listed) format's FOV
    if let basicFov = device.formats.last?.videoFieldOfView {
        self.fovFactor = currentFOV / basicFov
    }
}
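Putting the pieces together, a self-contained sketch of the distance estimate might look like the following; the function name and parameter list are illustrative, not from the original code:

// Illustrative helper: estimates the eye-to-camera distance in centimetres,
// mirroring the formula above. All inputs are assumed to be precomputed.
func estimateDistanceCM(realEyeDistanceMM: Float,   // real pupil distance, e.g. 63 mm
                        eyePixelDistance: Float,    // pupil distance in preview pixels
                        previewWidthPx: Float,      // preview width in pixels (landscape case)
                        fLengthMM: Float,           // 35 mm equivalent focal length, e.g. 31
                        fovFactor: Float) -> Float {
    return (1.0 + realEyeDistanceMM * previewWidthPx / 24.0 / eyePixelDistance) * fLengthMM / 10.0 * fovFactor
}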

About the fLength equivalent focal length

After taking photos with multiple Apple devices, various iPhones and iPads, we found that most of them fall between 30 and 32, with a few around 29. A per-device mapping table would in theory be more accurate, but is not used here; the average fLength used here is 31.

Head deflection and the calculation

One obvious problem is that most of the time you are not looking straight at the camera, so there is an angle between your face and the camera. In theory this should be factored in, but it currently is not; deflection only makes the detected distance come out larger than it really is. Apple provides a yaw deflection angle, but unfortunately it cannot be used directly because its values are too coarse.

face.yaw!.floatValue ranges from -90 to 90, but the sensitivity is too low: you only get coarse values like -90, -45, 0, 45, 90. Using it in the calculation is therefore not accurate.

Face data

Here, the Vision framework is used to extract face data.

let handler = VNImageRequestHandler(cgImage: image, orientation: .downMirrored, options: [:])
let faceRequest = VNDetectFaceLandmarksRequest { [weak self] (vnRequest, _) in
    if let result = vnRequest.results as? [VNFaceObservation] {
        self?.processLandmarks(faces: result)
    }
}
// Reduce CPU/GPU usage
faceRequest.preferBackgroundProcessing = true
try? handler.perform([faceRequest])

A VNDetectFaceLandmarksRequest is created from the frame data (a CGImage or a CVPixelBuffer); after detection the results contain an array of VNFaceObservation.
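If the frames come from an AVCaptureVideoDataOutput instead, the handler can be built per frame from the pixel buffer. A rough sketch, where the owning class and its name are assumptions:

import AVFoundation
import Vision

// Hypothetical owner class; processLandmarks(faces:) stands in for the handling code shown below.
final class FrameFaceDetector: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // Build the handler straight from the raw frame instead of a CGImage
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .downMirrored, options: [:])
        let faceRequest = VNDetectFaceLandmarksRequest { [weak self] (request, _) in
            if let faces = request.results as? [VNFaceObservation] {
                self?.processLandmarks(faces: faces)
            }
        }
        try? handler.perform([faceRequest])
    }

    func processLandmarks(faces: [VNFaceObservation]) {
        // Landmark handling goes here (see the code below)
    }
}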

Once the results come back, we check whether they contain a face and then do the calculation.

guard let preview = self.previewLayer else { return }

// Take the first face by default
let firstFace = faces[0]

// Convert the face bounding box to the preview's coordinates
var faceBoxOnscreen = preview.layerRectConverted(fromMetadataOutputRect: firstFace.boundingBox)

if !useCamera {
    faceBoxOnscreen = CGRect(x: preview.frame.width * firstFace.boundingBox.origin.y,
                             y: preview.frame.height * firstFace.boundingBox.origin.x,
                             width: preview.frame.width * firstFace.boundingBox.size.height,
                             height: preview.frame.height * firstFace.boundingBox.size.width)
}

let x = faceBoxOnscreen.origin.x
let y = faceBoxOnscreen.origin.y
let w = faceBoxOnscreen.size.width
let h = faceBoxOnscreen.size.height

// Left pupil
if let leftPupil = firstFace.landmarks?.leftPupil {
    // Right pupil
    if let rightPupil = firstFace.landmarks?.rightPupil {
        guard let leftEyePoint = leftPupil.normalizedPoints.first else { return }
        guard let rightEyePoint = rightPupil.normalizedPoints.first else { return }

        let leftX = leftEyePoint.y * h + x
        let rightX = rightEyePoint.y * h + x
        let leftY = leftEyePoint.x * w + y
        let rightY = rightEyePoint.x * w + y
        self.eyeDistance = sqrtf(powf(Float(leftX - rightX), 2) + powf(Float(leftY - rightY), 2))
    }
}

useCamera is a parameter that distinguishes two situations. In the first, we start the camera ourselves, so we control the preview area and the frames and can convert coordinates directly. In the second, we only create the canvas while the camera is controlled externally and just hands us frame data, so we have to do the coordinate transformation ourselves. The rest of the calculation should be fairly self-explanatory.

Finally, we check whether the device is in landscape or portrait and compute the eye-to-screen distance:

if UIDevice.current.orientation.isLandscape {
    distance = (1.0 + self.realEyeDistance * Float(self.previewLayer!.frame.width) / 24 / self.eyeDistance) * self.fLength / 10.0 * self.fovFactor
} else {
    distance = (1.0 + self.realEyeDistance * Float(self.previewLayer!.frame.height) / 36 / self.eyeDistance) * self.fLength / 10.0 * self.fovFactor
}

Here realEyeDistance is the pupil distance; for adults the average value of 63 mm is used. This gives a rough eye-to-screen distance. Older devices can also run into performance problems because detection runs constantly; you can throttle the detection frequency, for example running detection once every 20 frames, or screen out devices that are too old, such as the iPhone 5s or the first-generation iPad Air and older.
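A rough sketch of such throttling, with the counter and the 20-frame interval as assumed values:

// Illustrative throttle: only run the Vision request on every Nth frame.
final class DetectionThrottle {
    private var frameCount = 0
    private let interval: Int

    init(interval: Int = 20) { self.interval = interval }

    // Call once per frame; returns true when detection should run on this frame.
    func shouldDetect() -> Bool {
        frameCount += 1
        return frameCount % interval == 0
    }
}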

The head deflection angle is still not handled in the calculation; I do not have a good approach for it at the moment. Perhaps it can be optimized in the future.

Thank you for reading ~ 🙂

A demo will be released later, if you are interested in it.