Audio and video capture

In the overall audio and video pipeline, capture on the sending side is where everything starts. Both Android and iOS provide the relevant hardware devices, the camera and the microphone, as input sources. In this chapter we will look at how to capture data on Android through the Camera and recording APIs. This chapter can be combined with the previously published article Android audio and video – MediaCodec audio and video codec to build a complete demo.

Camera

On Android, the Camera is the device used to capture images and video. Prior to Android SDK API 21, only Camera1 was available. Camera1 has been marked deprecated since API 21, and Google recommends using Camera2 instead. Let's look at them separately.
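As a small illustration (not from the original article), an app that supports both stacks could decide at runtime which one to use:

import android.os.Build

// Minimal sketch: Camera2 is available from API 21 (Lollipop) onward,
// so fall back to Camera1 on older devices.
fun shouldUseCamera2(): Boolean = Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP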

Camera1

Let’s take a look at some of the class diagrams for the Camera1 architecture.

The Camera class is the core class of Camera1, and it contains a number of inner classes.

The Camera.CameraInfo class describes camera information such as the facing and orientation.

The Camera.Parameters class holds camera parameter settings, such as the preview size and rotation angle.

The Camera class provides APIs for opening the camera, setting parameters, setting up the preview, and so on. Let's walk through the process of opening the system camera with the Camera API.

1. Release the Camera before starting it. The purpose of this step is to reset the Camera's state and set its previewCallback back to null.

  • Call Camera.release() to release the camera
  • Set the Camera object to null

/**
 * Release the camera
 */
private fun releaseCamera() {
    // Reset the previewCallback to null
    cameraInstance!!.setPreviewCallback(null)
    cameraInstance!!.release()
    cameraInstance = null
}

2. Obtain the Camera Id

/**
 * Get the camera id that matches the requested facing
 */
private fun getCurrentCameraId(): Int {
    val cameraInfo = Camera.CameraInfo()
    // Iterate over all cameras and compare cameraInfo.facing with the desired facing
    for (id in 0 until Camera.getNumberOfCameras()) {
        Camera.getCameraInfo(id, cameraInfo)
        if (cameraInfo.facing == cameraFacing) {
            return id
        }
    }
    return 0
}

3. Open the Camera to obtain the Camera object

private fun getCameraInstance(id: Int): Camera {
    return try {
        // Call Camera.open to get a Camera instance
        Camera.open(id)
    } catch (e: Exception) {
        throw IllegalAccessError("Camera not found")
    }
}

4. Set Camera parameters

// [3] Set Camera parameters
val parameters = cameraInstance!!.parameters
if (parameters.supportedFocusModes.contains(Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE)) {
    parameters.focusMode = Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE
}
cameraInstance!!.parameters = parameters

5. Set the previewDisplay

// [4] Call the Camera API to set the preview Surface
surfaceHolder?.let { cameraInstance!!.setPreviewDisplay(it) }

6. Set the preview callback

// [5] Call the Camera API to set the preview callback
cameraInstance!!.setPreviewCallback { data, camera ->
    if (data == null || camera == null) {
        return@setPreviewCallback
    }
    val size = camera.parameters.previewSize
    onPreviewFrame?.invoke(data, size.width, size.height)
}

7. Enable preview

// [6] Call the Camera API to start the preview
cameraInstance!!.startPreview()

Steps [3], [4], [5] and [6] in the code above are all done by calling APIs of the Camera class.

After the steps above, the camera preview is rendered on the Surface that was passed in, and the onPreviewFrame(byte[] data, Camera camera) callback is invoked for each frame until the camera stops. The byte[] data holds the real-time image data, and it is YUV data in the NV21 format.
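For reference, here is a minimal sketch that strings the steps above together. It assumes the surfaceHolder, cameraFacing and onPreviewFrame fields plus the helper functions shown earlier, and omits error handling:

private fun startCamera() {
    // [1] Release any previous camera instance
    if (cameraInstance != null) {
        releaseCamera()
    }
    // [2] Find the camera id and open the camera
    cameraInstance = getCameraInstance(getCurrentCameraId())
    // [3] Configure parameters
    val parameters = cameraInstance!!.parameters
    if (parameters.supportedFocusModes.contains(Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE)) {
        parameters.focusMode = Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE
    }
    cameraInstance!!.parameters = parameters
    // [4] Attach the preview Surface
    surfaceHolder?.let { cameraInstance!!.setPreviewDisplay(it) }
    // [5] Receive NV21 frames from the preview callback
    cameraInstance!!.setPreviewCallback { data, camera ->
        if (data == null || camera == null) return@setPreviewCallback
        val size = camera.parameters.previewSize
        onPreviewFrame?.invoke(data, size.width, size.height)
    }
    // [6] Start the preview
    cameraInstance!!.startPreview()
}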

YUV image format

Color space

Here we will only talk about two commonly used color spaces.

RGB

RGB should be the color model we are most familiar with; it is widely used in modern electronic devices. All colors can be mixed from the three primary colors R, G, and B.

YUV

Let's focus on YUV, a color space we are less familiar with. It is a color format in which luminance (brightness) and chrominance (color) are stored separately.

Early TVs were black and white, which means they only carried a luminance value, i.e. Y. When color TV appeared, two chrominance components, U and V, were added, forming YUV, which is also known as YCbCr.

  • Y: luminance, i.e. the gray level. Besides carrying the brightness signal, it is weighted most heavily toward the green channel.
  • U: the difference between the blue channel and the luminance.
  • V: the difference between the red channel and the luminance.

What are the advantages of using YUV?

The human eye is sensitive to luminance but far less sensitive to chrominance, so part of the UV data can be discarded without the eye perceiving the difference. By compressing the resolution of the UV components, the size of the video can be reduced without noticeably affecting perceived quality.

RGB and YUV conversion

Y = 0.299R + 0.587G + 0.114B
U = -0.147R - 0.289G + 0.436B
V = 0.615R - 0.515G - 0.100B

R = Y + 1.14V
G = Y - 0.39U - 0.58V
B = Y + 2.03U
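These formulas translate directly into code. The sketch below (an illustration, not part of the original article) converts a single pixel between RGB and YUV using the coefficients above; real converters normally use fixed-point integer math and clamp the results to the 0..255 range:

// Illustrative per-pixel conversion using the coefficients above.
fun rgbToYuv(r: Float, g: Float, b: Float): Triple<Float, Float, Float> {
    val y = 0.299f * r + 0.587f * g + 0.114f * b
    val u = -0.147f * r - 0.289f * g + 0.436f * b
    val v = 0.615f * r - 0.515f * g - 0.100f * b
    return Triple(y, u, v)
}

fun yuvToRgb(y: Float, u: Float, v: Float): Triple<Float, Float, Float> {
    val r = y + 1.14f * v
    val g = y - 0.39f * u - 0.58f * v
    val b = y + 2.03f * u
    return Triple(r, g, b)
}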

YUV format

YUV storage is divided into two categories: planar and Packed.

  • Planar: stores all the Y values first, then all the U values, and finally all the V values;
  • Packed: the Y, U and V values of each pixel are stored contiguously, interleaved with one another.

Packed storage is rarely used; planar storage is used in most videos.

With planar storage, space is saved by omitting part of the chroma information, i.e. several luminance samples share one set of chroma samples. Depending on how the chroma is shared, planar storage distinguishes the following formats: YUV444, YUV422, and YUV420.

  • YUV 4:4:4 sampling: each Y has its own set of UV components.
  • YUV 4:2:2 sampling: every two Y values share one set of UV components.
  • YUV 4:2:0 sampling: every four Y values share one set of UV components.

The most commonly used is the YUV420.

YUV420 storage is further divided into two types:

  • YUV420P: three-plane storage. The data is YYYYYYYYUUVV (for example, I420) or YYYYYYYYVVUU (for example, YV12).
  • YUV420SP: two-plane storage. There are two types: YYYYYYYYUVUV (such as NV12) or YYYYYYYYVUVU (such as NV21)
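To make the layout concrete, here is an illustrative sketch (not from the original article) of how an NV21 (YUV420SP) buffer is organized and how large one frame is:

// Illustrative only: the Y plane holds width * height bytes, followed by an
// interleaved V/U plane of width * height / 2 bytes (NV21 stores V first),
// i.e. 1.5 bytes per pixel in total.
fun nv21FrameSize(width: Int, height: Int): Int = width * height * 3 / 2

fun nv21PlaneOffsets(width: Int, height: Int): Pair<Int, Int> {
    val yPlaneOffset = 0               // Y values come first
    val vuPlaneOffset = width * height // then the interleaved V/U pairs
    return yPlaneOffset to vuPlaneOffset
}

// Example: a 1920x1080 NV21 frame takes 1920 * 1080 * 3 / 2 = 3,110,400 bytes,
// compared with 1920 * 1080 * 3 = 6,220,800 bytes for RGB888.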

Camera2

Since Android SDK API 21, Google has recommended using the Camera2 framework to manage camera devices, which is quite different from Camera1. Again, let's take a look at a class diagram of the Camera2 framework.

Camera2 is much more complex than Camera1. CameraManager and CameraCaptureSession are the core classes of the Camera2 framework; Camera2 introduces CameraCaptureSession to manage the capture session.

Let’s look at a more detailed flow chart.

1. Release the Camera before starting it. The purpose of this step is to reset the camera's state.

private fun releaseCamera() {
    imageReader?.close()
    cameraInstance?.close()
    captureSession?.close()
    imageReader = null
    cameraInstance = null
    captureSession = null
}

2. Obtain the Camera Id

private fun getCameraId(facing: Int): String? {
    return cameraManager.cameraIdList.find { id ->
        cameraManager.getCameraCharacteristics(id).get(CameraCharacteristics.LENS_FACING) == facing
    }
}

3. Open the Camera

try {
    // [2] Open the camera; CameraDeviceCallback() is the camera device state callback
    cameraManager.openCamera(cameraId, CameraDeviceCallback(), null)
} catch (e: CameraAccessException) {
    Log.e(TAG, "Opening camera (ID: $cameraId) failed.")
}

private inner class CameraDeviceCallback : CameraDevice.StateCallback() {
    override fun onOpened(camera: CameraDevice) {
        cameraInstance = camera
        // [3] Start the capture session
        startCaptureSession()
    }

    override fun onDisconnected(camera: CameraDevice) {
        camera.close()
        cameraInstance = null
    }

    override fun onError(camera: CameraDevice, error: Int) {
        camera.close()
        cameraInstance = null
    }
}

4. Start the shooting session

private fun startCaptureSession() {
    // Choose the optimal preview size
    val size = chooseOptimalSize()
    // Create the ImageReader that will receive the preview frames
    imageReader = ImageReader.newInstance(size.width, size.height, ImageFormat.YUV_420_888, 2).apply {
        setOnImageAvailableListener({ reader ->
            val image = reader?.acquireNextImage() ?: return@setOnImageAvailableListener
            onPreviewFrame?.invoke(image.generateNV21Data(), image.width, image.height)
            image.close()
        }, null)
    }
    try {
        if (surfaceHolder == null) {
            // Only the ImageReader surface is used as an output, which triggers the ImageReader callback
            cameraInstance?.createCaptureSession(
                listOf(imageReader!!.surface),
                // [4] CaptureStateCallback is an inner class extending CameraCaptureSession.StateCallback;
                // it receives the camera session state
                CaptureStateCallback(),
                null
            )
        } else {
            cameraInstance?.createCaptureSession(
                listOf(imageReader!!.surface, surfaceHolder!!.surface),
                CaptureStateCallback(),
                null
            )
        }
    } catch (e: CameraAccessException) {
        Log.e(TAG, "Failed to start camera session")
    }
}

private inner class CaptureStateCallback : CameraCaptureSession.StateCallback() {
    override fun onConfigureFailed(session: CameraCaptureSession) {
        Log.e(TAG, "Failed to configure capture session.")
    }

    override fun onConfigured(session: CameraCaptureSession) {
        cameraInstance ?: return
        captureSession = session
        // Build the CaptureRequest
        val builder = cameraInstance!!.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW)
        builder.addTarget(imageReader!!.surface)
        surfaceHolder?.let { builder.addTarget(it.surface) }
        try {
            // Start the repeating request for this session
            session.setRepeatingRequest(builder.build(), null, null)
        } catch (e: CameraAccessException) {
            Log.e(TAG, "Failed to start camera preview because it couldn't access camera", e)
        } catch (e: IllegalStateException) {
            Log.e(TAG, "Failed to start camera preview.", e)
        }
    }
}
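chooseOptimalSize() is referenced above but not shown. As a hedged placeholder (not the article's actual implementation, and assuming cameraManager and a non-null cameraId field are available as in the snippets above), it could simply query the sizes supported for YUV_420_888 output and pick a reasonable one:

// Hypothetical helper, for illustration only. A real implementation would also
// consider the desired aspect ratio and the display rotation.
private fun chooseOptimalSize(): Size {
    val characteristics = cameraManager.getCameraCharacteristics(cameraId)
    val map = characteristics.get(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)
    val sizes = map?.getOutputSizes(ImageFormat.YUV_420_888) ?: return Size(1280, 720)
    // Prefer the smallest size that is at least 1280x720, otherwise the largest available
    return sizes.filter { it.width >= 1280 && it.height >= 720 }
        .minByOrNull { it.width * it.height }
        ?: sizes.maxByOrNull { it.width * it.height }
        ?: Size(1280, 720)
}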

PS

ImageReader provides direct access to the image data rendered onto a Surface. It works by creating an instance and registering a callback that is invoked whenever a new image becomes available on the Surface associated with the ImageReader.
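The image.generateNV21Data() extension used in the session code converts a YUV_420_888 Image into an NV21 byte array; its implementation is not shown here. A simplified sketch follows, assuming the Y plane's row stride equals the image width and the chroma planes use a pixel stride of 2 with a row stride equal to the width (the common case on most devices); a robust version must handle arbitrary strides:

// Simplified NV21 converter, valid only under the stride assumptions stated above.
// Extension on android.media.Image.
fun Image.generateNV21Data(): ByteArray {
    val ySize = width * height
    val nv21 = ByteArray(ySize * 3 / 2)

    // Copy the Y plane as-is
    planes[0].buffer.get(nv21, 0, ySize)

    // Interleave V and U samples (NV21 stores V before U)
    val vBuffer = planes[2].buffer
    val uBuffer = planes[1].buffer
    val vPixelStride = planes[2].pixelStride
    val uPixelStride = planes[1].pixelStride
    var outIndex = ySize
    for (i in 0 until ySize / 4) {
        nv21[outIndex++] = vBuffer.get(i * vPixelStride)
        nv21[outIndex++] = uBuffer.get(i * uPixelStride)
    }
    return nv21
}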

The above covers capturing camera data; see the GitHub link at the end of the article for the complete code.

AudioRecord

Having analyzed video, let's move on to audio. For recording we use the AudioRecord API. The recording process is much simpler than video; again, let's start with a simple class diagram.

There is just one class and the API is quite straightforward, so let's look at the flow.
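The recording code below assumes that mAudioRecord has already been created. For completeness, a minimal construction might look like the following sketch; the sample rate, channel configuration and format are assumptions for illustration, not the article's exact settings (recording also requires the RECORD_AUDIO permission):

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

// Illustrative values: 44.1 kHz, mono, 16-bit PCM.
private const val SAMPLE_RATE = 44100
private const val CHANNEL_CONFIG = AudioFormat.CHANNEL_IN_MONO
private const val AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT

fun createAudioRecord(): AudioRecord {
    // Query the minimum buffer size the device supports for this configuration
    val minBufferSize = AudioRecord.getMinBufferSize(SAMPLE_RATE, CHANNEL_CONFIG, AUDIO_FORMAT)
    return AudioRecord(
        MediaRecorder.AudioSource.MIC, // record from the microphone
        SAMPLE_RATE,
        CHANNEL_CONFIG,
        AUDIO_FORMAT,
        minBufferSize * 2              // a bit larger than the minimum to avoid overruns
    )
}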

Here is the code for the recording loop:

public void startRecord() {
    // Start recording
    mAudioRecord.startRecording();
    mIsRecording = true;
    // Poll for audio data on a new thread
    ExecutorService executorService = Executors.newSingleThreadExecutor();
    executorService.execute(new Runnable() {
        @Override
        public void run() {
            byte[] buffer = new byte[DEFAULT_BUFFER_SIZE_IN_BYTES];
            while (mIsRecording) {
                int len = mAudioRecord.read(buffer, 0, DEFAULT_BUFFER_SIZE_IN_BYTES);
                if (len > 0) {
                    byte[] data = new byte[len];
                    System.arraycopy(buffer, 0, data, 0, len);
                    // Process the PCM data here
                }
            }
        }
    });
}

public void stopRecord() {
    mIsRecording = false;
    mAACMediaCodecEncoder.stopEncoder();
    mAudioRecord.stop();
}

The byte[] data produced by AudioRecord is PCM audio data.
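As a quick illustration (using the assumed 44.1 kHz / mono / 16-bit configuration from the sketch above), the raw PCM data rate is easy to compute, which is also why the data is usually encoded before being stored or sent:

// Illustrative only: bytes of raw PCM produced per second.
val sampleRate = 44100      // samples per second (assumed)
val channelCount = 1        // mono (assumed)
val bytesPerSample = 2      // 16-bit PCM
val bytesPerSecond = sampleRate * channelCount * bytesPerSample // 88,200 bytes/s, ~86 KB/s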

Summary

In this chapter, we introduced the native audio and video input APIs in detail; they will be the foundation of later posts. Once we have YUV and PCM data, we can encode it. In the next article, we will analyze MediaCodec and use it to hardware-encode the raw audio and video data into an MP4 file.