So even without an iPhone X, I can play with this black tech on my 2,000-RMB Android phone.


Interviewer: Pigeons


Emojis have been flooding social media apps in China for a long time, but now we can play with something new.

That’s it! Led by Apple’s iPhone X, a new form of entertainment is coming…

A vivid, swaggering illustration of what a real emoji pack looks like!

My face is my choice. Ho-ho…

That’s Animoji, a new feature on the iPhone X built on Face ID.

When you look at the camera and raise your eyebrows, frown, move your chin, open or close your eyes, part your lips, or grin, the phone captures your expression and maps it onto a cute cartoon character, generating a real-time emoji pack of your own.

But if all we did today was talk about Apple, that would be too pedestrian; this long after the press conference, it would hardly do justice to timely reporting.

Today, we want to talk about a mysterious Chinese company that launched the same feature on mobile phones two years ago, and has since taken it further. What’s even more amazing: it did so without any depth camera, or even a binocular (stereo) camera. This caught the attention of AI Technology Base.

The low-key company is called appMagics.

As early as 2016, the company closed an A round of 10 million RMB from Geek Bang and Zihui Venture Capital. After receiving strategic investment from Blue Harbor Interactive in September 2016, it completed an A+ round of tens of millions of RMB in June 2017, led by Huaguai Capital with participation from Bojongzihui.

After seeing the emoji demonstration at Apple’s press conference, reporters from AI Technology Base immediately contacted appMagics and tried its products.

The overall impression: the experience is smooth, the expressions match closely, and the simulation is realistic and finely detailed. Occasionally, however, face tracking loses the subject when the person suddenly pulls away or shakes rapidly and violently.

Afterwards, an AI Technology Base reporter conducted an exclusive interview with appMagics CTO Jin Yulin. The questions are a bit pointed; the answers refuse to be routine. We wanted to know how appMagics, at more than two years old, stacks up against Apple’s latest black technology. The interview has been kept as intact as possible, with some wording lightly edited, without changing the original intent, to avoid disclosing the company’s core technical secrets.

AI Technology Base:
A brief introduction to your technical background.

Jin Yulin: I got my master’s degree in computer graphics at Beihang University, then went on to Stanford to study computational geometry, a branch of computer graphics. After graduation, I stayed on at Microsoft headquarters in the United States, where I was one of the founders of the 3D printing project at Microsoft Research and filed many patents. So I’ve been doing computer graphics for about 15 years.

AI Technology Base:

Let’s get right to the point. What’s the technical principle behind facial recognition?

Jin Yulin:

I’ll try to make it as simple as possible.

The principle breaks down into roughly three steps: first, facial key-point recognition and tracking; second, expression analysis and mapping; third, 3D model control.

Specifically, facial key-point recognition and real-time tracking means locating the key points marked on the face, for example where the eyebrows are, where the eyes are, where the mouth is, so that the camera can clearly read the face.

Facial expression analysis, in turn, infers whether the user is closing their eyes or speaking, happy or sad, based on the movement of the key facial features (eyebrows, eyes, mouth) identified and tracked in the first step.

3D model control then uses that facial key-point information to drive a pre-built virtual avatar to imitate the user’s expression.

Broadly speaking, the first two steps fall within computer vision, and the third within computer graphics.
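The three steps above can be sketched in code. This is a minimal illustrative sketch: all function names, landmark coordinates, thresholds, and the blendshape set are assumptions for the example, not appMagics’ actual pipeline.

```python
def detect_landmarks(frame):
    """Step 1: locate facial key points (brows, eyes, mouth) in a frame.
    A real system runs a trained detector; here we return fixed
    normalized (x, y) coordinates for illustration."""
    return {
        "left_eye_top": (0.35, 0.40), "left_eye_bottom": (0.35, 0.44),
        "mouth_left": (0.40, 0.70),   "mouth_right": (0.60, 0.70),
        "mouth_top": (0.50, 0.68),    "mouth_bottom": (0.50, 0.74),
    }

def analyze_expression(pts):
    """Step 2: map landmark geometry to expression parameters,
    expressed here as blendshape weights in [0, 1]."""
    clamp = lambda v: max(0.0, min(1.0, v))
    eye_open = (pts["left_eye_bottom"][1] - pts["left_eye_top"][1]) / 0.04
    mouth_open = (pts["mouth_bottom"][1] - pts["mouth_top"][1]) / 0.08
    smile = (pts["mouth_right"][0] - pts["mouth_left"][0]) / 0.20
    return {"eye_open": clamp(eye_open),
            "mouth_open": clamp(mouth_open),
            "smile": clamp(smile)}

def drive_avatar(weights):
    """Step 3: apply the weights to the virtual character's rig
    (here just rounded for display)."""
    return {k: round(v, 2) for k, v in weights.items()}

frame = None  # stand-in for a camera frame
pose = drive_avatar(analyze_expression(detect_landmarks(frame)))
print(pose)
```

The point of the structure is the hand-off: vision (steps 1 and 2) produces a small set of expression parameters, and graphics (step 3) consumes them, so either side can be swapped out independently.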

AI Technology Base:
What’s the difference between appMagics’ approach and Apple’s new iPhone X?

Jin Yulin: I just mentioned that facial expression animation works in three steps. The difference between us and Apple lies mainly in the first two: facial key-point recognition and tracking, and expression analysis and mapping.

To put it simply, Apple builds on a 3D system (an RGBD camera), while we build on a 2D system (an RGB camera); the two rely on different hardware and software. (Editor’s note: for ease of understanding, RGB cameras are referred to below as the 2D system, and RGBD cameras as the 3D system.)

What’s the difference between a 2D and 3D system?

For the eyes, eyebrows, nose, mouth and other features with obvious facial boundaries, there is almost no difference between 2D and 3D systems in capture, as long as enough facial data has been used in training.

But for less obvious facial features such as foreheads and cheeks, the 2D system is not as accurate as the 3D system. It is not easy to recognize these points in 2D, but 3D can recognize points such as forehead and cheek due to the addition of depth information (z-axis).

Take Apple’s depth camera system, which is a 3D structured-light sensing system. In other words, it captures not only the planar visual information of an everyday 2D system (a common front-facing RGB camera) but also depth information, namely the z-axis.

The z-axis depth data comes mainly from the Dot Projector, which projects infrared light onto the person’s face. The infrared camera reads the distortion of the projected pattern and computes the depth of each point on the face.

In this way, each point has not only planar coordinates but also a z-axis depth coordinate, so every image point is localized more precisely.
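As a back-of-the-envelope illustration of the paragraph above: once a landmark pixel has a depth value, it can be back-projected into a full 3D point using standard pinhole-camera geometry. The intrinsic parameters and the sample landmark below are made-up values for the sketch, not those of the iPhone X.

```python
# Assumed pinhole-camera intrinsics (illustrative values).
fx, fy = 600.0, 600.0   # focal lengths, in pixels
cx, cy = 320.0, 240.0   # principal point, in pixels

def back_project(u, v, z):
    """Convert a pixel (u, v) plus its measured depth z (in metres,
    e.g. from a structured-light system) into camera-space 3D
    coordinates (x, y, z)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# A hypothetical landmark on the cheek at pixel (380, 300),
# measured 0.4 m from the camera:
p = back_project(380, 300, 0.4)
print(p)
```

Without the z value, only the ray through the pixel is known; with it, the point’s full 3D position is fixed, which is exactly what makes low-relief regions like the cheek and forehead recoverable.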

As for step three, we’re no different from Apple. In terms of 3D model control and final rendering, our positioning from the very beginning has been 3D mixed reality.

AI Technology Base:
Is it fair to say that Apple’s iPhone X is superior in both accuracy and performance?

Jin Yulin: In terms of objective conditions, the iPhone X can accurately capture more detail because its RGBD camera hardware lets it recognize more key points.

In addition, since Apple has complete control over its own hardware and software, the iPhone X’s performance will be more stable. In many respects, iPhones are far better than other phones thanks to integrated software and hardware, and facial expression capture is no exception. This is determined by the objective hardware configuration.

But when it comes to Apple’s strengths, that’s not entirely true. Two points need to be made here.

First, when capturing expressions, more accurate is not necessarily better, because human perception is not entirely faithful to reality.

How so? Let me give you an example. When you close your eyes, the two eyelids don’t actually close at exactly the same moment, but you believe they do. So when a capture shows one eye closing slightly later, it feels uncomfortable.

Likewise, when you close one eye, the other eye actually narrows a little, but you don’t notice it. So when you see expression capture at its most faithful, it feels wrong, because it doesn’t match your subconscious expectation.

So, when we use expression capture in pan-entertainment settings, what matters more is an avatar that conveys human emotion. We therefore apply some algorithmic processing to strike a visual balance between the real and the virtual. In cases like this, more realistic is not necessarily better.
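A toy illustration of the kind of perceptual adjustment described: couple the two eyes so that a blink reads as simultaneous, rather than replaying the raw per-eye measurements. The threshold and the coupling rule here are assumptions for the example, not appMagics’ actual algorithm.

```python
BLINK_THRESHOLD = 0.3  # below this openness, treat the eye as blinking (assumed)

def perceptual_blink(left_open, right_open):
    """Given raw eyelid-openness measurements in [0, 1] for each eye,
    return the values used to drive the avatar. If either eye is
    clearly closing, drive both eyelids with the same (minimum)
    openness so the blink looks synchronized."""
    lowest = min(left_open, right_open)
    if lowest < BLINK_THRESHOLD:
        return lowest, lowest
    return left_open, right_open

# Raw capture: the left eye lags the right by a frame.
print(perceptual_blink(0.6, 0.1))   # both eyelids driven to 0.1
print(perceptual_blink(0.9, 0.85))  # eyes open: passed through unchanged
```

The design choice is the one Jin describes: sacrifice literal accuracy in favor of what the viewer subconsciously expects.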

People who do VR and AR probably understand this better.

Second, even when the hardware can’t deliver a required degree of realism on its own, we can make up for it with algorithms, and the results are no worse.

One of the things that we’ve been working on for the last two years is to optimize our algorithms to make facial expression simulations work on normal phones and convey human emotions.

What do you mean?

To put it simply, we rely on algorithms to make up for the parts that an ordinary RGB camera cannot easily capture.

For example, when I grin, we use algorithms to predict and simulate the bulging of my facial muscles; when you frown, algorithms likewise mimic the changes in your forehead. We simulate 3D data on top of the 2D camera system, so the expression animation works even without the underlying hardware, with little noticeable difference.

In other words, we use algorithms to lower the hardware requirements, and the hardware cost, of facial animation as far as possible. So far it runs on the iPhone 6, and it runs on Android.
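A toy illustration of compensating with algorithms: from a 2D cue the RGB camera can see (how far the mouth corners spread in a grin), predict a deformation it cannot observe directly (cheek bulge). The linear model and its coefficient are invented placeholders standing in for a trained predictor.

```python
def predict_cheek_bulge(mouth_width_ratio):
    """mouth_width_ratio: current mouth width divided by the neutral
    mouth width, measured from 2D landmarks. Returns an assumed
    blendshape weight in [0, 1] for cheek puffing.

    A grin stretches the cheeks, so bulge is modeled as growing
    linearly with the spread beyond neutral (coefficient is a
    made-up stand-in for a learned value)."""
    bulge = 0.8 * (mouth_width_ratio - 1.0)
    return max(0.0, min(1.0, bulge))

print(predict_cheek_bulge(1.0))   # neutral face: no bulge
print(predict_cheek_bulge(1.5))   # wide grin: partial bulge
```

A production system would learn this mapping from data pairing 2D landmarks with ground-truth 3D scans, but the shape of the idea is the same: unobserved depth cues are inferred from observable 2D geometry.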

AI Technology Base:
Is this a core competency in technology?

Jin Yulin: You could say that.

In fact, when it comes to facial recognition, Hollywood has been using it for special effects for years.

For example, the expressions in Avatar and Warcraft were achieved through 3D reconstruction. The idea is to paint markers on the actor’s face, forming a dot matrix (the same principle as Apple’s dot projector) that makes the facial features stand out. When the actor’s expression changes, the dots move with it; cameras capture the dots, and the motion is transferred onto a 3D model.

But Hollywood hardware is expensive. What do ordinary people do if they want to play with this? So we redesigned the algorithms around the ordinary phone’s RGB camera, using them to compensate for the missing depth information and achieve the same functions.

Bringing film and television CG technology to consumers, putting movie-grade animation into everyone’s pocket, and running across platforms on iPhones, Android phones, ordinary PCs and Macs: that is the core competitiveness.

To put it bluntly, it is taking industrial-grade capabilities and technologies and making them consumer-grade, so that anyone who wants to play can, without worrying much about hardware configuration.

AI Technology Base:
If the core competitiveness you have accumulated over the years lies in using algorithms on 2D systems (RGB cameras) to do what 3D systems (RGBD cameras) can do, then once 3D cameras become ubiquitous, will you still have an advantage?

Jin Yulin: As I just mentioned, our core algorithm is divided into three parts. Key-point recognition is indeed built on universal 2D camera systems, but from the start our expression simulation and control have worked with 3D data. If one day every phone can capture 3D data directly, that part of our algorithm needs no changes at all; it can be reused directly, so the advantage accumulated there remains.

However, as you say, if 3D cameras become widespread, the software-algorithm threshold across the industry will drop sharply, and the algorithm accumulation and optimization we have done on 2D systems will indeed lose its particular advantage.

But keep in mind that RGBD cameras won’t become ubiquitous easily. Apple has only just introduced one on the iPhone X, and the iPhone 8 doesn’t even have it, because the current barriers of hardware miniaturization, cost and power consumption are too high.

Put it this way: for a long time to come, the vast majority of phones on the market will still be Apple and Android phones with 2D cameras, so the 3D-algorithm barrier we have built on 2D systems will persist for a long time.

AI Technology Base:
So, given your current advantages, will you mainly go after the lower-end, 2D-camera market?

Jin Yulin: Technically, there are two directions.

One direction focuses on depth and precision for the high-end phone market, continuously developing new algorithms on top of our existing technical accumulation, including algorithms directly compatible with 3D systems;

the other focuses on breadth, continuing to extend the technology’s applicability on lower-end, 2D-system phones. At present our algorithms run on Apple models from the iPhone 5 up, as well as mainstream Android models, and we will keep working down to more mid- and low-end Android devices to reach more users.

Both are important.

Beyond the technology, there is the company’s overall strategy. Fu Yingna (Leody), founder and CEO of appMagics, has always emphasized that our positioning is cross-boundary: we should not bury ourselves in computer vision, graphics and artificial intelligence all day. Behind the technology there should be feeling, emotion and perceptual elements.

Cartoon emoticons designed by appMagics

AI Technology Base:
In that case, why develop the entire technology stack yourselves? Why not simply call on a third-party face recognition company’s technology and focus on building entertainment products? Wouldn’t that be easier?

Jin Yulin: In fact, at the beginning I did consider using third-party technology, but after trying all of it I found none we could use directly.

Why is that?

You see, the biggest markets for CV right now are security and finance.

In security and finance, computer vision mainly has to determine, in a very short time, whether someone is who they claim to be. Our requirement is different: how finely the computer recognizes a facial expression, and how accurately the virtual face simulates it.

These are two very different goals, and the data and algorithms trained for one can only serve that one; there is no way to make them compatible.

Moreover, most current face recognition algorithms are two-dimensional, and 2D algorithms have no depth information, which is far from enough for expression simulation and control, because many key points without obvious features cannot be captured. That has to be done with a three-dimensional algorithm.

So we had to do it ourselves: design the algorithms from start to finish and do the data training ourselves.

AI Technology Base:
The CV field seems to be getting more and more specialized, with more and more distinct goals to serve.

Jin Yulin: You have to be specific.

AI Technology Base:
Roughly how big is the market for facial expression animation?

Jin Yulin: Set everything else aside and consider just phones. If every current mobile user, Apple or Android, high-end or low-end, wants to play with emojis, and existing phone hardware can support it, how big a market do you think that is? You can ask Leody about the specifics.

AI Technology Base:
Apple’s iPhone X event is a great PR opportunity for the company. Have you felt a particularly big impact recently?

Jin Yulin: Especially big! These past few days, because Apple’s iPhone X pushed this emoji thing, partners and investors, Android manufacturers, app developers, input-method companies, have suddenly all come crowding in.

Leody hasn’t got back to Beijing yet.

The best state for an industry is this: you do something that, at first, only you are doing; slowly, many people discover “hey, this is really useful” and come to learn from you. Facial expression animation is likely to become standard on apps and phones within the next two years, which would prove that the bet you placed early was right. (smile)



Appendix:

Founder and CEO of appMagics

Leody Fu is a female geek and serial entrepreneur. She left Sony Ericsson in 2004 to found MoGenisis, which was acquired by Symbian (Nokia) in 2007. She joined Microsoft in 2010, serving successively as a senior executive in Greater China and at the US headquarters, leading teams in the communication and promotion of Microsoft’s new technologies. She founded appMagics in 2014, focusing on cross-boundary innovation in computer-vision mixed reality technology and the entertainment field.