Getting the Most from the New Multi-camera API

Written by Oscar Wahltinez

This blog post is a complement to our Android Dev Summit 2018 talk, and was written in collaboration with Vinit Modi, the Android Camera PM, and Emilie Roberts from the partner developer team. Check out our previous articles in this series, covering camera enumeration, camera capture sessions and requests, and using multiple camera streams simultaneously.

Multi-camera use cases

The multi-camera API was introduced with Android Pie, and since its release a few months ago several devices that support it have entered the market, such as Google’s Pixel 3 and Huawei’s Mate 20 series. Many multi-camera use cases are tightly coupled to a specific hardware configuration; in other words, not every use case will be compatible with every device, which makes multi-camera functionality a good candidate for dynamic delivery of modules. Some typical use cases include:

  • Zoom: switching between cameras depending on the crop region or the desired focal length
  • Depth: using multiple cameras to build a depth map
  • Background blur (bokeh): using inferred depth information to simulate a DSLR-like (digital single-lens reflex camera) narrow focus range

Logical and physical cameras

To understand the multi-camera API, we must first understand the difference between logical and physical cameras. The concept is best illustrated with an example: imagine a device with three back-facing cameras and no front-facing camera. In this case, each of the three back cameras is considered a physical camera. A logical camera is then a grouping of two or more of those physical cameras. The output of the logical camera can be a stream coming from one of the underlying physical cameras, or a fused stream coming from more than one underlying physical camera simultaneously; either way, it is handled by the camera HAL (Hardware Abstraction Layer).

Many phone manufacturers also develop their own camera apps, which usually come pre-installed on their devices. To take advantage of all the hardware’s capabilities, they sometimes use private or hidden APIs, or receive special treatment from the driver implementation that other applications do not have privileged access to. Some devices even implement the concept of logical cameras by providing a fused stream of frames from different physical cameras, but again, this is only available to certain privileged applications. Often, only one of the physical cameras is exposed by the framework. The situation for third-party developers before Android Pie looked like this:

Camera functionality is usually only available for privileged applications

A few things have changed starting with Android Pie. First, using private APIs is no longer viable in Android applications. Second, with the inclusion of multi-camera support in the framework, Android has strongly recommended that phone manufacturers expose a logical camera for all physical cameras facing the same direction. So here is what third-party developers should see on devices running Android Pie and above:

Developers have full access to all camera devices starting with Android P

It is worth noting that the functionality provided by the logical camera depends entirely on the OEM’s implementation of the camera HAL. For example, the Pixel 3 implements its logical camera such that one of its physical cameras is selected based on the requested focal length and crop region.

Multi-camera API

The new API contains the following new constants, classes, and methods:

  • CameraMetadata.REQUEST_AVAILABLE_CAPABILITIES_LOGICAL_MULTI_CAMERA
  • CameraCharacteristics.getPhysicalCameraIds()
  • CameraCharacteristics.getAvailablePhysicalCameraRequestKeys()
  • CameraDevice.createCaptureSession(SessionConfiguration config)
  • CameraCharacteristics.LOGICAL_MULTI_CAMERA_SENSOR_SYNC_TYPE
  • OutputConfiguration & SessionConfiguration

Thanks to changes in the Android CDD, the multi-camera API also comes with certain expectations that developers can rely on. Devices with dual cameras existed before Android Pie, but opening more than one camera at once involved trial and error; the multi-camera API on Android now gives us a set of rules that tell us when we can open a pair of physical cameras, as long as they are part of the same logical camera.

As mentioned above, we can expect that in most cases, new devices launching with Android Pie will expose all physical cameras (the exception being more exotic sensor types such as infrared), along with an easier-to-use logical camera. Also, and crucially, we can expect that for every combination of streams that is guaranteed to work, one stream belonging to the logical camera can be replaced by two streams from the underlying physical cameras. Let’s go through an example to cover that in more detail.

Using multiple streams simultaneously

In our last blog post, we covered in detail the rules for using multiple streams in a single camera. The same rules apply to multiple cameras, but with one noteworthy addition documented here:

For each guaranteed stream combination, the logical camera supports replacing one logical YUV_420_888 or raw stream with two physical streams of the same size and format, each from a separate physical camera, provided both physical cameras support the given size and format.

In other words, each stream of type YUV or RAW can be replaced with two streams of the same type and size. For example, we could start with a camera stream configuration for a single-camera device like this:

  • Stream 1: YUV type, maximum size, from logical camera with id = 0

Then, a device that supports multiple cameras will allow us to create a session that replaces the logical YUV stream with two physical streams:

  • Stream 1: YUV type, maximum size, from physical camera with id = 1
  • Stream 2: YUV type, maximum size, from physical camera with id = 2

The trick is that we can replace a YUV or RAW stream with two equivalent streams if and only if those two cameras are part of the same logical camera grouping; that is, they are listed under CameraCharacteristics.getPhysicalCameraIds().

Another thing to consider is that the guarantees provided by the framework are only the bare minimum required to get frames from more than one physical camera simultaneously. We can expect additional streams to be supported on most devices, sometimes even allowing us to open multiple physical camera devices independently. Unfortunately, since this is not a hard guarantee from the framework, doing so requires per-device testing and tuning via trial and error.

Create sessions using multiple physical cameras

To interact with the physical cameras of a multi-camera-capable device, we should open a single CameraDevice (the logical camera) and interact with it within a single session, which must be created using the API CameraDevice.createCaptureSession(SessionConfiguration config), available since SDK level 28. The session configuration will then have a number of output configurations, each of which will have a set of output targets and, optionally, a desired physical camera ID.

Session configuration and output configuration model

Later, when we dispatch a capture request, that request will have an output target associated with it. The framework decides which physical (or logical) camera the request is sent to based on which output target is attached to it. If the output target corresponds to one of the output targets that was sent as an output configuration along with a physical camera ID, then that physical camera will receive and process the request.

Use a pair of physical cameras

One of the most important additions to the multi-camera API, as far as developers are concerned, is the ability to identify logical cameras and find the physical cameras behind them. Now that we understand when we can open more than one physical camera (again, by opening the logical camera as part of the same session) and the rules for combining streams are clear, we can define a function to help us identify pairs of physical cameras that can potentially be used to replace a logical camera stream:

/**
 * Helper data class that holds a logical camera ID and the IDs of two of its
 * underlying physical cameras
 */
data class DualCamera(val logicalId: String, val physicalId1: String, val physicalId2: String)

fun findDualCameras(manager: CameraManager, facing: Int? = null): Array<DualCamera> {
    val dualCameras = ArrayList<DualCamera>()

    // Iterate over all the available camera characteristics
    manager.cameraIdList.map {
        Pair(manager.getCameraCharacteristics(it), it)
    }.filter {
        // Filter by the requested lens-facing direction, if any
        facing == null || it.first.get(CameraCharacteristics.LENS_FACING) == facing
    }.filter {
        // Filter by logical cameras
        it.first.get(CameraCharacteristics.REQUEST_AVAILABLE_CAPABILITIES)!!.contains(
                CameraCharacteristics.REQUEST_AVAILABLE_CAPABILITIES_LOGICAL_MULTI_CAMERA)
    }.forEach {
        // All possible pairs of physical cameras in the list are valid results
        // NOTE: There may be N physical cameras as part of the logical camera grouping
        val physicalCameras = it.first.physicalCameraIds.toTypedArray()
        for (idx1 in 0 until physicalCameras.size) {
            for (idx2 in (idx1 + 1) until physicalCameras.size) {
                dualCameras.add(DualCamera(
                        it.second, physicalCameras[idx1], physicalCameras[idx2]))
            }
        }
    }

    return dualCameras.toTypedArray()
}

State handling of the physical cameras is controlled via the logical camera. So, to open our “dual camera”, we just need to open the logical camera that the physical cameras we are interested in belong to:

fun openDualCamera(cameraManager: CameraManager,
                   dualCamera: DualCamera,
                   executor: Executor = AsyncTask.SERIAL_EXECUTOR,
                   callback: (CameraDevice) -> Unit) {

    cameraManager.openCamera(
            dualCamera.logicalId, executor, object : CameraDevice.StateCallback() {
                override fun onOpened(device: CameraDevice) = callback(device)
                override fun onError(device: CameraDevice, error: Int) = onDisconnected(device)
                override fun onDisconnected(device: CameraDevice) = device.close()
            })
}

Up to this point, other than selecting which camera to open, there is nothing different from what we would have done to open any other camera in the past. Now it is time to create a capture session using the new session configuration API, so that we can tell the framework to associate certain targets with specific physical camera IDs:

/**
 * Helper type definition that encapsulates three sets of output targets:
 *
 *   1. Logical camera
 *   2. First physical camera
 *   3. Second physical camera
 */
typealias DualCameraOutputs =
        Triple<MutableList<Surface>?, MutableList<Surface>?, MutableList<Surface>?>

fun createDualCameraSession(cameraManager: CameraManager,
                            dualCamera: DualCamera,
                            targets: DualCameraOutputs,
                            executor: Executor = AsyncTask.SERIAL_EXECUTOR,
                            callback: (CameraCaptureSession) -> Unit) {

    // Create three sets of output configurations: one for the logical camera,
    // and one for each of the physical cameras
    val outputConfigsLogical = targets.first?.map { OutputConfiguration(it) }
    val outputConfigsPhysical1 = targets.second?.map {
        OutputConfiguration(it).apply { setPhysicalCameraId(dualCamera.physicalId1) } }
    val outputConfigsPhysical2 = targets.third?.map {
        OutputConfiguration(it).apply { setPhysicalCameraId(dualCamera.physicalId2) } }

    // Put all the output configurations into a single flat list
    val outputConfigsAll = arrayOf(
            outputConfigsLogical, outputConfigsPhysical1, outputConfigsPhysical2)
            .filterNotNull().flatMap { it }

    // Instantiate a session configuration that can be used to create a session
    val sessionConfiguration = SessionConfiguration(SessionConfiguration.SESSION_REGULAR,
            outputConfigsAll, executor, object : CameraCaptureSession.StateCallback() {
                override fun onConfigured(session: CameraCaptureSession) = callback(session)
                // In case of failure, close the device
                override fun onConfigureFailed(session: CameraCaptureSession) =
                        session.device.close()
            })

    // Open the logical camera using the helper function defined earlier
    openDualCamera(cameraManager, dualCamera, executor = executor) {

        // Finally create the session and return it via the callback
        it.createCaptureSession(sessionConfiguration)
    }
}

Now, we can refer to the documentation or to previous blog posts to learn which combinations of streams are supported. We just need to remember that those combinations are for multiple streams on a single logical camera, and that the compatibility extends to using the same configuration and replacing one of those streams with two streams from two physical cameras that are part of the same logical camera.

With the camera session ready, all that is left to do is dispatch our desired capture requests. Each target of a capture request will receive its data from the associated physical camera, if one is configured, or fall back to the logical camera otherwise.

Zoom sample use case

To tie all of this together with one of the use cases discussed at the beginning, let’s look at how we could implement a feature in our camera app so that users can switch between the different physical cameras and experience a different field of view, effectively capturing at a different “zoom level.”

Example of swapping cameras for the zoom level use case (from the Pixel 3 ad)

First, we must select the pair of physical cameras that we want to allow users to switch between. For maximum effect, we can search for the pair of cameras that provide the minimum and maximum focal lengths available, respectively. That way, we select one camera device able to focus at the shortest possible distance and another that can focus at the furthest possible point:

fun findShortLongCameraPair(manager: CameraManager, facing: Int? = null): DualCamera? {

    return findDualCameras(manager, facing).map {
        val characteristics1 = manager.getCameraCharacteristics(it.physicalId1)
        val characteristics2 = manager.getCameraCharacteristics(it.physicalId2)

        // Query the focal lengths advertised by each physical camera
        val focalLengths1 = characteristics1.get(
                CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS) ?: floatArrayOf(0F)
        val focalLengths2 = characteristics2.get(
                CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS) ?: floatArrayOf(0F)

        // Compute the differences between the min and max focal lengths of the two cameras
        val focalLengthsDiff1 = focalLengths2.max()!! - focalLengths1.min()!!
        val focalLengthsDiff2 = focalLengths1.max()!! - focalLengths2.min()!!

        // Return the pair of camera IDs along with the difference between min and max focal lengths
        if (focalLengthsDiff1 < focalLengthsDiff2) {
            Pair(DualCamera(it.logicalId, it.physicalId1, it.physicalId2), focalLengthsDiff1)
        } else {
            Pair(DualCamera(it.logicalId, it.physicalId2, it.physicalId1), focalLengthsDiff2)
        }

    // Return the selected pair, or null if no pairs were found
    }.sortedBy { it.second }.reversed().lastOrNull()?.first
}

A reasonable architecture for this would involve two SurfaceViews, one for each stream, that get swapped upon user interaction so that only one is visible at any given time. In the following code snippet, we demonstrate how to open the logical camera, configure the camera outputs, create a camera session and start two preview streams, taking advantage of the functions defined previously:

val cameraManager: CameraManager = ...

// Get the two output targets from the activity / fragment
val surface1 = ...  // from SurfaceView
val surface2 = ...  // from SurfaceView

val dualCamera = findShortLongCameraPair(cameraManager)!!
val outputTargets = DualCameraOutputs(
        null, mutableListOf(surface1), mutableListOf(surface2))

// Here we open the logical camera, configure the outputs and create the session
createDualCameraSession(cameraManager, dualCamera, targets = outputTargets) { session ->

    // Create a single request that has one target for each physical camera
    // NOTE: Each target will only receive frames from its associated physical camera
    val requestTemplate = CameraDevice.TEMPLATE_PREVIEW
    val captureRequest = session.device.createCaptureRequest(requestTemplate).apply {
        arrayOf(surface1, surface2).forEach { addTarget(it) }
    }.build()

    // Set the sticky request for the session and we are done
    session.setRepeatingRequest(captureRequest, null, null)
}

Now all we need to do is provide a UI for the user to switch between the two surfaces, such as a button or double-tapping the SurfaceView. If we wanted to get fancy, we could try performing some form of scene analysis and switch between the two streams automatically.
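
As a minimal sketch of that switch, assuming two hypothetical SurfaceViews named previewShort and previewLong (attached to the two preview streams above) and a switchButton in our layout, the toggle could look like this:

// Minimal sketch: toggle which of the two previews is visible on each tap.
// previewShort, previewLong and switchButton are hypothetical views from our layout.
var showingShort = true
switchButton.setOnClickListener {
    showingShort = !showingShort
    previewShort.visibility = if (showingShort) View.VISIBLE else View.GONE
    previewLong.visibility = if (showingShort) View.GONE else View.VISIBLE
}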

Lens distortion

All lenses produce a certain amount of distortion. In Android, we can query the distortion created by a lens using CameraCharacteristics.LENS_DISTORTION (which replaces the now-deprecated CameraCharacteristics.LENS_RADIAL_DISTORTION). For logical cameras, it is reasonable to expect that the distortion will be minimal and that our application can use the frames more or less as they come from the camera. For physical cameras, however, we should expect potentially very different lens configurations, especially on wide-angle lenses.
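
As a minimal sketch of that query, assuming characteristics is the CameraCharacteristics of the physical camera we care about and TAG is a logging tag defined in our app, it could look like this:

// Minimal sketch: read the lens distortion coefficients, if the device reports them.
// `characteristics` is assumed to be the CameraCharacteristics of the camera in question.
val distortion: FloatArray? = characteristics.get(CameraCharacteristics.LENS_DISTORTION)
if (distortion != null) {
    Log.d(TAG, "Lens distortion coefficients: ${distortion.joinToString()}")
}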

Some devices may implement automatic distortion correction via CaptureRequest.DISTORTION_CORRECTION_MODE. It is good to know that distortion correction defaults to being on for most devices. The documentation provides more details:

FAST/HIGH_QUALITY both mean that distortion correction determined by the camera device will be applied. HIGH_QUALITY mode indicates that the camera device will use the highest-quality correction algorithms, even if it slows down the capture rate. FAST means the camera device will not slow down the capture rate when applying correction. FAST may be the same as OFF if any correction at all would slow down the capture rate […] The correction only applies to processed outputs such as YUV, JPEG or DEPTH16 […] This control will be on by default on devices that support it.

If we want to take a still picture using a physical camera at the highest possible quality, then we should try to set the correction mode to HIGH_QUALITY if it is available. This is how we should set up the capture request:

val cameraSession: CameraCaptureSession = ...

// Use the still capture template to build the capture request
val captureRequest = cameraSession.device.createCaptureRequest(
        CameraDevice.TEMPLATE_STILL_CAPTURE)

// Determine whether the device supports distortion correction
val supportsDistortionCorrection = characteristics.get(
        CameraCharacteristics.DISTORTION_CORRECTION_AVAILABLE_MODES)?.contains(
        CameraMetadata.DISTORTION_CORRECTION_MODE_HIGH_QUALITY) ?: false

if (supportsDistortionCorrection) {
    captureRequest.set(
            CaptureRequest.DISTORTION_CORRECTION_MODE,
            CameraMetadata.DISTORTION_CORRECTION_MODE_HIGH_QUALITY)
}

// Add the output target, set other capture request parameters...

// Dispatch the capture request
cameraSession.capture(captureRequest.build(), ...)

Keep in mind that setting this mode in a capture request can have an impact on the frame rate that the camera can produce, which is why we only set the distortion correction for still image captures.

To be continued

Phew! We covered quite a few things related to the new multi-camera API:

  • Potential use cases
  • Logical camera vs physical camera
  • Overview of multi-camera API
  • The extended rules for using multiple camera streams
  • How to set up streams for a pair of physical cameras
  • An example “zoom” use case that swaps cameras
  • Correcting lens distortion

Note that we still have not covered frame synchronization or computing depth maps. That is a topic worthy of its own blog post.
