Li Zi (栗子), reporting from Aofei Temple
Produced by Qubit | Official account QbitAI
True · Deep learning ↓↓↓
(Tweet from Jeff Dean)
That’s because Google has injected machine learning into the Pixel’s camera: adding a bit of depth (Depth) to the background-blur task.
The GIF compares results before learning (Stereo) and after learning (Learned, on the right). At a glance, the areas that should be blurred are now blurred more thoroughly:
It’s called deep learning.
But there’s more to neural networks than meets the eye.
Making up for stereo vision’s shortcomings
The previous portrait mode relied on a simple stereo-vision principle:
In the same scene, take two slightly different photos of the same person.
Play the two photos in a loop: the person doesn’t move, but the background shifts. This phenomenon is called parallax (Parallax).
Using parallax to estimate an object’s depth is the idea behind phase-detection autofocus (PDAF).
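To make that relation concrete, here is a minimal sketch of the pinhole-stereo formula behind the idea; the baseline and focal-length numbers are invented for illustration, not the Pixel 3’s real camera parameters:

```python
# Minimal sketch of the stereo relation behind PDAF depth:
# depth is inversely proportional to parallax (disparity).
# Baseline and focal length below are made-up illustrative values.

def depth_from_disparity(disparity_px: float,
                         baseline_mm: float = 1.0,
                         focal_px: float = 3000.0) -> float:
    """Classic pinhole-stereo formula: depth = baseline * focal / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_mm * focal_px / disparity_px

# A tiny PDAF baseline means tiny disparities: halving the disparity
# doubles the estimated depth, so small measurement errors blow up.
print(depth_from_disparity(2.0))   # 1500 mm
print(depth_from_disparity(1.0))   # 3000 mm
```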
However, PDAF has its limitations. First, the shift between the two views is usually tiny, which makes the depth hard to estimate accurately.
Second, stereo vision always suffers from the aperture problem (Aperture Problem): when looking at a straight line through a small window, you can’t tell which way it moved, or by how much (a numeric illustration follows the example below).
Take a closer look at an example (a still image this time, not a GIF):
Depth predictions often go wrong when the image contains horizontal lines. As shown on the left, several parallel boards should sit at similar depths, yet their degree of blur differs considerably.
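For readers who want to see the aperture problem in numbers, here is a small illustrative script (not the team’s code): a patch made only of horizontal stripes matches every horizontally shifted copy of itself equally well, so the shift, and hence the depth, cannot be determined.

```python
# Numeric illustration of the aperture problem (illustrative only).
import numpy as np

# 8x8 patch of horizontal stripes: each row is constant.
patch = np.tile(np.array([[0.0], [1.0]] * 4), (1, 8))

def matching_cost(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences between two patches."""
    return float(np.sum((a - b) ** 2))

# Compare the patch against horizontally shifted copies of itself.
for shift in range(4):
    shifted = np.roll(patch, shift, axis=1)
    print(shift, matching_cost(patch, shifted))  # cost is 0 for every shift
```

Every candidate shift scores the same, so stereo matching has no way to pick the right one.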
So the Google AI team decided not to rely on PDAF parallax alone, but to bring other cues into the prediction.
Multiple depth cues x high-quality data collection
The team’s new method brings in several additional depth cues:
For example, points far from the in-focus plane look less sharp than points near it. This provides a defocus (Defocus) cue for judging depth.
Another example: we already have a rough idea of how big common, everyday objects are. Using an object’s apparent size in the image to judge its depth is a semantic (Semantic) cue.
A convolutional neural network (CNN) is then used to combine these auxiliary cues with the original PDAF parallax.
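Google hasn’t published the exact network, but to make the setup tangible, here is a toy encoder-decoder sketch in TensorFlow that maps a two-channel PDAF pair to a per-pixel depth map; every layer size here is an invented placeholder, not Google’s architecture:

```python
# Illustrative only: a tiny encoder-decoder CNN from a PDAF pair
# (stacked as two channels) to a one-channel depth map.
import tensorflow as tf

def build_depth_net(height: int = 128, width: int = 128) -> tf.keras.Model:
    inp = tf.keras.Input(shape=(height, width, 2))           # PDAF view pair
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.MaxPool2D()(x)                       # downsample
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.UpSampling2D()(x)                    # back to input size
    out = tf.keras.layers.Conv2D(1, 3, padding="same")(x)    # per-pixel depth
    return tf.keras.Model(inp, out)

model = build_depth_net()
model.summary()
```

Stacking the two PDAF views as input channels lets the convolutions pick up parallax, defocus, and semantic cues from the same tensor.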
A special way to collect data
Training this CNN requires feeding it large numbers of PDAF images, i.e., groups of photos taken from slightly different viewpoints.
It also requires high-quality depth maps (Depth Maps) corresponding to those images.
In addition, since the goal is to improve the phone’s portrait mode, the training data needs to resemble photos taken with the phone’s camera.
So the team built a rather odd rig of its own: five Pixel 3 phones strapped together, all shooting at the same time (within about 2 milliseconds of each other).
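The team’s capture tooling isn’t public, but the 2-millisecond requirement can be pictured with a tiny hypothetical helper that keeps only frame groups whose five timestamps fall within the tolerance window:

```python
# Hypothetical helper for the multi-phone capture idea (not Google's code):
# accept a frame group only if all capture timestamps are within 2 ms.

SYNC_TOLERANCE_S = 0.002  # 2 milliseconds

def is_synchronized(timestamps_s: list[float],
                    tolerance_s: float = SYNC_TOLERANCE_S) -> bool:
    """True if all capture timestamps lie within the tolerance window."""
    return max(timestamps_s) - min(timestamps_s) <= tolerance_s

print(is_synchronized([10.0000, 10.0004, 10.0011, 10.0015, 10.0019]))  # True
print(is_synchronized([10.0000, 10.0004, 10.0011, 10.0015, 10.0031]))  # False
```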
Careful thought went into the placement of the five phones:
The five viewpoints guarantee parallax in multiple directions, avoiding the aperture problem.
A point in one photo is almost guaranteed to appear in at least one other photo, so points with no reference are rare.
The spacing between the cameras is much larger than the PDAF baseline, which makes the depth predictions more accurate.
Synchronized capture ensures that depth can be computed even in dynamic scenes.
(Plus, the rig is portable, so samples can be collected outdoors as well.)
Eliminating other distractions
But even with good data, accurately predicting an object’s depth from a photo is not easy.
A single pair of PDAF images is consistent with many different depth maps:
(Imaging factors such as the lens and the focal length all affect the depth estimate.)
To account for this, the team predicts the relative depth of objects directly, removing the influence of lens factors; relative depth is all the background-blur task needs.
This, the team says, produces satisfactory results.
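As a concrete (if simplified) reading of “relative depth”, here is one common scale-invariant error from the depth-estimation literature; it scores a prediction only up to a global scale, so lens-dependent absolute scale drops out. This is an illustration, not necessarily the team’s exact loss:

```python
# Hedged sketch of a scale-invariant depth error: only relative
# depth matters, so any global rescaling of the prediction is free.
import numpy as np

def scale_invariant_error(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared log-depth error after removing the best global offset."""
    d = np.log(pred) - np.log(target)
    return float(np.mean(d ** 2) - np.mean(d) ** 2)  # variance of log-ratio

target = np.array([1.0, 2.0, 4.0])
print(scale_invariant_error(target * 3.0, target))               # 0.0: same relative depth
print(scale_invariant_error(np.array([1.0, 2.0, 8.0]), target))  # > 0: ordering distorted
```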
Speed is of the essence
(Although there may not be many Pixel users in China…)
The team wrote in its blog post that depth prediction has to be fast at capture time, so the person holding the phone isn’t kept waiting.
So the CNN is deployed on the phone with TensorFlow Lite, using the Pixel 3’s GPU for fast computation.
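The in-camera pipeline itself isn’t public, but a generic TensorFlow Lite inference call looks like this; the model file name "depth_net.tflite" is a placeholder:

```python
# Generic TensorFlow Lite inference sketch (placeholder model name).
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="depth_net.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy PDAF-pair tensor of the model's expected shape.
interpreter.set_tensor(inp["index"],
                       np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
depth_map = interpreter.get_tensor(out["index"])
print(depth_map.shape)
```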
It works starting from version 6.1 of the Google Camera app.
With Google Photos, users can adjust the depth effect themselves, changing the amount of blur and the point of focus.
You can also use a third-party depth extractor to pull the depth map out of the JPG for your own inspection.
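As a rough sketch of what such an extractor does: Pixel portrait JPGs carry extra images (including the depth map) appended after the main one, so a crude approach is to split the file at JPEG start-of-image markers. Real tools parse the XMP metadata properly; this is only the byte-level idea.

```python
# Crude, assumption-laden sketch: split a Pixel portrait JPG into its
# embedded JPEG streams; the depth map is typically one of the later ones.

def split_embedded_jpegs(path: str) -> list[bytes]:
    data = open(path, "rb").read()
    soi = b"\xff\xd8\xff"             # JPEG start-of-image signature
    starts = []
    i = data.find(soi)
    while i != -1:
        starts.append(i)
        i = data.find(soi, i + 1)
    # Slice from each marker to the next; later slices are embedded images.
    return [data[s:e] for s, e in zip(starts, starts[1:] + [len(data)])]
```

Calling split_embedded_jpegs("portrait.jpg") would return the main image first, followed by the embedded ones.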
— End —
Welcome to follow our column: Qubit – Zhihu column
We’re hiring
Qubit is recruiting editors/reporters, based in Zhongguancun, Beijing. We look forward to talented, enthusiastic students joining us!
For more details, reply “Wanted” in the QbitAI WeChat conversation.
Qubit QbitAI · Signed author on Toutiao
վ’ᴗ’ ի Tracking new developments in AI technology and products