Video-related development for Android has probably been one of the most fragmented and incompatible parts of the Android ecosystem and its APIs. Google has fairly weak control over the camera and video-encoding APIs, so different vendors implement them very differently, and the API design itself has seen little refinement; some people even consider them "one of the most difficult APIs on Android".
Taking WeChat as an example, we record a 540p MP4 file. On Android, the process generally looks like this:
Basically, the YUV frames output by the camera are pre-processed and sent to the encoder to obtain the encoded H264 video stream.
The above covers only the video stream; the audio stream has to be recorded separately, and the video and audio streams are then muxed to produce the final video.
This article mainly analyzes two common problems in encoding the video stream:
- Which video encoder to choose (hardware or software)?
- How to quickly pre-process (mirror, scale, rotate) the YUV frames output by the camera?
Video encoder selection
For video recording, many apps need to process each frame individually, so they rarely use MediaRecorder to record the video directly. Generally speaking, there are two options:
- MediaCodec
- FFMpeg+x264/openh264
Let’s break it down one by one
MediaCodec is a set of low-level audio/video codec APIs released by Google in API 16. It uses the platform's hardware acceleration for encoding and decoding directly. To record video, MediaCodec is initialized as a video encoder; raw YUV data is then continuously fed in, and the encoded H264 stream comes out. The whole API is modelled around two queues:
As an encoder, the input queue holds the raw YUV data and the output queue produces the encoded H264 stream; as a decoder, it is the other way around. MediaCodec offers both synchronous and asynchronous calling styles, although the asynchronous Callback was only added in API 21. A synchronous call generally looks like this (from the official example):
MediaCodec codec = MediaCodec.createByCodecName(name);
codec.configure(format, …);
MediaFormat outputFormat = codec.getOutputFormat(); // option B
codec.start();
for (;;) {
  int inputBufferId = codec.dequeueInputBuffer(timeoutUs);
  if (inputBufferId >= 0) {
    ByteBuffer inputBuffer = codec.getInputBuffer(…);
    // fill inputBuffer with valid data
    …
    codec.queueInputBuffer(inputBufferId, …);
  }
  int outputBufferId = codec.dequeueOutputBuffer(…);
  if (outputBufferId >= 0) {
    ByteBuffer outputBuffer = codec.getOutputBuffer(outputBufferId);
    MediaFormat bufferFormat = codec.getOutputFormat(outputBufferId); // option A
    // bufferFormat is identical to outputFormat
    // outputBuffer is ready to be processed or rendered.
    …
    codec.releaseOutputBuffer(outputBufferId, …);
  } else if (outputBufferId == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
    // Subsequent data will conform to new format.
    // Can ignore if using getOutputFormat(outputBufferId)
    outputFormat = codec.getOutputFormat(); // option B
  }
}
codec.stop();
codec.release();
A brief explanation: getInputBuffers retrieves the input buffer array, and dequeueInputBuffer returns the index of a free buffer within it; the raw YUV data is then submitted to the encoder with queueInputBuffer. The encoded H264 output is obtained via getOutputBuffers and dequeueOutputBuffer; note that dequeueOutputBuffer can also return several special values that indicate a change in the codec's state. Once the output data has been processed, the buffer must be handed back to the system with releaseOutputBuffer so that it returns to the output queue. For more involved uses of MediaCodec, see EncodeDecodeTest.java in the CTS tests.
As the example shows, MediaCodec is a very primitive API. Because it calls straight into the hardware codec of the mobile SoC, it is very fast. However, because Google has very weak control over the Android hardware ecosystem, this API has quite a few problems:
- Color format problem
When MediaCodec is initialized, a MediaFormat object is passed to configure. Used as an encoder, the MediaFormat must specify basic information such as the video width and height, frame rate, bitrate and I-frame interval. Another important piece of information is the color format of the YUV frames the encoder accepts. YUV has many color formats depending on its sampling ratio, and for the Android camera, the YUV frames delivered by onPreviewFrame are basically NV21 if no parameters are configured. However, Google designed the MediaCodec API too close to Android's HAL layer, so NV21 is not supported by every device as an encoder input format. Therefore, when initializing MediaCodec, we need to check via codecInfo.getCapabilitiesForType which YUV formats the device's MediaCodec implementation actually supports as input. In general, at least on 4.4+ systems, most devices support both of the following formats:
MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Planar
MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar
The two formats are YUV420P and NV21 respectively. If the device only supports YUV420P, the NV21 frames output by the camera must be converted to YUV420P before being sent to the encoder; otherwise the resulting video will be garbled or have distorted colors.
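As a reference, a minimal sketch of such a capability check might look like the following; selectColorFormat is a hypothetical helper (not WeChat's actual code) that simply returns the first planar or semi-planar 4:2:0 input format reported by an encoder of the given MIME type:
private static int selectColorFormat(String mimeType) {
    for (int i = 0; i < MediaCodecList.getCodecCount(); i++) {
        MediaCodecInfo codecInfo = MediaCodecList.getCodecInfoAt(i);
        if (!codecInfo.isEncoder()) continue;
        for (String type : codecInfo.getSupportedTypes()) {
            if (!type.equalsIgnoreCase(mimeType)) continue;
            // Ask this encoder which raw input color formats it accepts
            MediaCodecInfo.CodecCapabilities caps = codecInfo.getCapabilitiesForType(mimeType);
            for (int colorFormat : caps.colorFormats) {
                if (colorFormat == MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar
                        || colorFormat == MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Planar) {
                    return colorFormat;
                }
            }
        }
    }
    return -1; // no usable 4:2:0 input format found
}
The value returned here is what gets passed as MediaFormat.KEY_COLOR_FORMAT when configuring the encoder.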
This is a small pitfall; basically anyone doing video encoding with MediaCodec will run into it.
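If only YUV420P is supported, a plain (unoptimized) conversion sketch, assuming the usual NV21 (Y plane followed by interleaved V/U) and YUV420P/I420 (Y plane, then U plane, then V plane) layouts and not taken from WeChat's code, looks roughly like this:
public static void nv21ToYuv420p(byte[] nv21, byte[] i420, int width, int height) {
    int ySize = width * height;
    // The Y plane is identical in both layouts
    System.arraycopy(nv21, 0, i420, 0, ySize);
    int uOffset = ySize;             // start of the U plane in I420
    int vOffset = ySize + ySize / 4; // start of the V plane in I420
    for (int i = 0; i < ySize / 4; i++) {
        i420[vOffset + i] = nv21[ySize + 2 * i];     // V comes first in NV21
        i420[uOffset + i] = nv21[ySize + 2 * i + 1]; // U comes second
    }
}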
- Encoder feature support is fairly limited
When encoding an H264 stream with MediaCodec, there are several H264 settings related to compression rate and bitrate that affect video quality, typically the Profile (Baseline, Main, High), the Profile Level, and the Bitrate mode (CBR, CQ, VBR). Configured sensibly, these parameters allow a higher compression rate at the same bitrate and therefore better video quality. Android provides the corresponding settings, which can be applied to the MediaFormat:
MediaFormat.KEY_BITRATE_MODE
MediaFormat.KEY_PROFILE
MediaFormat.KEY_LEVEL
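As a hedged sketch (the concrete values are illustrative only; KEY_PROFILE and KEY_BITRATE_MODE require API 21, KEY_LEVEL requires API 23), these keys would be set on the MediaFormat roughly like this:
MediaFormat format = MediaFormat.createVideoFormat("video/avc", 1280, 720);
format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
        MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar);
format.setInteger(MediaFormat.KEY_BIT_RATE, 2000000);
format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);
format.setInteger(MediaFormat.KEY_PROFILE, MediaCodecInfo.CodecProfileLevel.AVCProfileHigh); // API 21+
format.setInteger(MediaFormat.KEY_LEVEL, MediaCodecInfo.CodecProfileLevel.AVCLevel31);       // API 23+
format.setInteger(MediaFormat.KEY_BITRATE_MODE,
        MediaCodecInfo.EncoderCapabilities.BITRATE_MODE_VBR);                                // API 21+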
The problem is that on most phones these Profile, Level and Bitrate mode settings are simply not supported by MediaCodec: even if they are set, they do not take effect. For example, if you set the Profile to High, the output stream is still Baseline…
One cause of this problem, which is essentially unavoidable on devices below Android 7.0, is that the Android framework itself hard-codes the profile in its source:
// XXX
if (h264type.eProfile != OMX_VIDEO_AVCProfileBaseline) {
    ALOGW("Use baseline profile instead of %d for AVC recording",
            h264type.eProfile);
    h264type.eProfile = OMX_VIDEO_AVCProfileBaseline;
}
Android did not remove this hardcoded override until 7.0:
if (h264type.eProfile == OMX_VIDEO_AVCProfileBaseline) {
    ...
} else if (h264type.eProfile == OMX_VIDEO_AVCProfileMain ||
        h264type.eProfile == OMX_VIDEO_AVCProfileHigh) {
    ...
}
This problem indirectly leads to the low quality of video encoded with MediaCodec: at the same bitrate, it is difficult to obtain the same video quality as soft encoding, or even as iOS.
- 16-pixel alignment requirement
As mentioned earlier, the MediaCodec API was designed too close to the HAL, and on many SoCs this means the buffer passed to MediaCodec is fed straight to the SoC without any pre-processing. H264 encodes in macroblocks that are typically 16×16 pixels, so if the video width and height are set to a size that is not aligned to 16, such as 960×540, the encoded output on some SoCs will be garbled!
Clearly this is because the vendor's implementation of the API lacks validation and pre-processing of the incoming data. At present, the problem shows up more often on Huawei and Samsung SoCs, and some earlier SoCs from other vendors have it as well. In general, the fix is simply to set the video width and height to sizes aligned to 16.
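For example, one simple convention (ours, not something mandated by the API) is to round the requested dimensions up to the next multiple of 16 before configuring the encoder:
int alignedWidth = (requestedWidth + 15) & ~15;   // 960 stays 960
int alignedHeight = (requestedHeight + 15) & ~15; // 540 becomes 544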
FFMpeg+x264/openh264
In addition to MediaCodec, the other popular solution is soft encoding with FFmpeg + x264/openh264: FFmpeg is used for pre-processing the video frames, while x264 or openh264 acts as the video encoder.
x264 is generally regarded as the fastest commercial-grade H264 software encoder available today, and it supports essentially all H264 features; with reasonable parameter configuration you can get both a good compression rate and good encoding speed. Limited by space, this article will not go into H264 parameter configuration; see here and here for x264 encoding parameter tuning.
openh264 is another H264 encoder, open-sourced by Cisco in 2013. Compared with x264 it is a little younger, but since Cisco pays the full annual H264 patent fee, it can be used free of charge. Firefox ships openh264 directly as its video codec for WebRTC.
However, compared with x264, openh264 has weaker support for H264's advanced features:
- Only the Baseline profile is supported, up to Level 5.2
- Multi-threaded encoding only supports slice-based threading, not frame-based multi-threading
In terms of encoding speed, openh264 is no faster than x264; its biggest advantage is that it is free to use.
To summarize the analysis above: the main advantage of hard encoding is speed, and it needs no libraries beyond the system's, but its feature support is limited and the compression rate is generally lower. Soft encoding, although slower, achieves a higher compression rate, supports far more H264 features than hard encoding, and is more controllable. In terms of availability, MediaCodec is basically guaranteed to be available on 4.4+ systems, but encoder capability varies widely between devices, so it is recommended to choose the encoder configuration according to the device.
YUV frame preprocessing
Following the process given at the beginning, we need to pre-process the YUV frames output by the camera before sending them to the encoder.
1. Scaling
If the camera preview size is set to 1080p, the YUV frames delivered in onPreviewFrame are 1920×1080. If the video to be encoded has a different size, the YUV frames must be scaled in real time during recording.
In WeChat, for example, the camera previews at 1080p, but we need to encode a 960×540 video.
The most common approach is to scale directly with FFmpeg's sws_scale function; for a reasonable quality/performance trade-off, the SWS_FAST_BILINEAR algorithm is usually chosen:
mScaleYuvCtxPtr = sws_getContext(
srcWidth,
srcHeight,
AV_PIX_FMT_NV21,
dstWidth,
dstHeight,
AV_PIX_FMT_NV21,
SWS_FAST_BILINEAR, NULL, NULL, NULL);
sws_scale(mScaleYuvCtxPtr,
(const uint8_t* const *) srcAvPicture->data,
srcAvPicture->linesize, 0, srcHeight,
dstAvPicture->data, dstAvPicture->linesize);
On a Nexus 6P, scaling directly with FFmpeg takes about 40ms+ per frame. For a 30fps recording, each frame has a budget of at most about 30ms; if scaling alone takes that long, we can only record at around 15fps.
Clearly, scaling directly with FFmpeg is too slow; it has to be said that swscale is one of FFmpeg's weak spots. After comparing several scaling algorithms commonly used in the industry, we finally settled on the following fast scaling algorithm:
We chose a local-mean algorithm, in which each pixel of the scaled image is the average of four neighbouring source pixels taken from two adjacent rows, and this can be implemented directly with NEON. Taking the scaling of the Y plane as an example:
const uint8* src_next = src_ptr + src_stride;
asm volatile (
    "1:                                            \n"
    "vld4.8     {d0, d1, d2, d3}, [%0]!            \n"
    "vld4.8     {d4, d5, d6, d7}, [%1]!            \n"
    "subs       %3, %3, #16                        \n"  // 16 processed per loop
    "vrhadd.u8  d0, d0, d1                         \n"
    "vrhadd.u8  d4, d4, d5                         \n"
    "vrhadd.u8  d0, d0, d4                         \n"
    "vrhadd.u8  d2, d2, d3                         \n"
    "vrhadd.u8  d6, d6, d7                         \n"
    "vrhadd.u8  d2, d2, d6                         \n"
    "vst2.8     {d0, d2}, [%2]!                    \n"  // store odd pixels
    "bgt        1b                                 \n"
    : "+r"(src_ptr),    // %0
      "+r"(src_next),   // %1
      "+r"(dst),        // %2
      "+r"(dst_width)   // %3
    :
    : "q0", "q1", "q2", "q3"  // Clobber List
);
The NEON instructions above can only load and store 8 or 16 pixels of data at a time, so any leftover data has to be handled with the same algorithm implemented in plain C.
On a Nexus 6P, each frame can be scaled in under 5ms. As for scaling quality, comparing the output of this algorithm against FFmpeg's SWS_FAST_BILINEAR, the peak signal-to-noise ratio (PSNR) is around 38-40 in most scenes, which is good enough.
2. Rotation
On Android, because of the camera's mounting angle, the YUV frames delivered by onPreviewFrame are usually rotated by 90 or 270 degrees. If the final video is meant to be portrait, the YUV frames generally have to be rotated.
A rotation written in pure C is generally an O(n²) algorithm; rotating a 960×540 YUV frame on a Nexus 6P takes 30ms+ per frame, which is clearly unacceptable.
So let's think about this differently: can we avoid rotating the YUV frames altogether?
In fact, the MP4 file format allows a rotation matrix to be specified in the header, specifically in the moov.trak.tkhd box. When playing the video, the player reads this matrix and applies the corresponding rotation, translation, scaling and so on. For details, see the Apple documentation.
Using FFmpeg, we can easily write this rotation angle into the muxed MP4 file:
char rotateStr[1024];
sprintf(rotateStr, "%d", rotate);
av_dict_set(&out_stream->metadata, "rotate", rotateStr, 0);
This way we avoid a large per-frame rotation cost during recording. A nice win!
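If the final muxing is done with the system MediaMuxer instead of FFmpeg, the equivalent way to record the rotation in the MP4 header (so that players rotate on display and no per-frame rotation is needed) is setOrientationHint; a one-line sketch, assuming a MediaMuxer instance named muxer:
muxer.setOrientationHint(90); // must be called before muxer.start(); accepts 0, 90, 180 or 270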
3. Mirroring
When shooting with the front camera, if the YUV frames are not processed, the recorded video comes out mirror-flipped. The principle is the same as looking into a mirror: the frames captured from the front camera are reversed relative to the actual scene. Since a mirrored video is sometimes not what we want, we need to mirror-flip the YUV frames.
However, because the camera is usually mounted at 90 or 270 degrees, the mirroring in the raw YUV frame is effectively a flip along the horizontal axis, so to mirror it we only need to swap the rows of data top-to-bottom around the middle row. Note that the Y plane and the UV plane must be processed separately.
asm volatile (
    "1:                                            \n"
    "vld4.8     {d0, d1, d2, d3}, [%2]!            \n"  // load 32 from src
    "vld4.8     {d4, d5, d6, d7}, [%3]!            \n"  // load 32 from dst
    "subs       %4, %4, #32                        \n"  // 32 processed per loop
    "vst4.8     {d0, d1, d2, d3}, [%1]!            \n"  // store 32 to dst
    "vst4.8     {d4, d5, d6, d7}, [%0]!            \n"  // store 32 to src
    "bgt        1b                                 \n"
    : "+r"(src),       // %0
      "+r"(dst),       // %1
      "+r"(srcdata),   // %2
      "+r"(dstdata),   // %3
      "+r"(count)      // %4  Output registers
    :                  // Input registers
    : "cc", "memory", "q0", "q1", "q2", "q3"  // Clobber List
);
As before, the leftover data is handled in plain C code; on a Nexus 6P, mirror-flipping a 1080×1920 YUV frame takes less than 5ms.
After the H264 video stream has been encoded, the final step is to mux the audio stream and the video stream into an MP4 file. This can be done with the system's MediaMuxer, with mp4v2, or with FFmpeg. This part is relatively simple and will not be described here.
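For completeness, a minimal sketch of the MediaMuxer route (not the article's implementation; outputPath, videoFormat, audioFormat and the encoded buffers are assumed to come from the surrounding recording code, the track formats being the MediaFormat objects reported by the encoders):
MediaMuxer muxer = new MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
int videoTrack = muxer.addTrack(videoFormat); // format reported via INFO_OUTPUT_FORMAT_CHANGED
int audioTrack = muxer.addTrack(audioFormat);
muxer.start();
// For every encoded buffer drained from the video/audio encoder:
//     muxer.writeSampleData(videoTrack /* or audioTrack */, encodedBuffer, bufferInfo);
muxer.stop();
muxer.release();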
References
- Lei Xiaohua (leixiaohua1020)'s column, the famous "Thor" blog, with a wealth of learning material on audio/video coding and FFmpeg; essential for getting started. May he rest in peace.
- Android MediaCodec stuff, which collects MediaCodec sample code; a good reference when using MediaCodec for the first time.
- Coding for NEON, a series of tutorials on common NEON instructions. NEON was used for scaling above, and in fact most audio/video processing of YUV frames, such as scaling, rotation and mirror flipping, can be optimized with NEON.
- libyuv, an open-source YUV processing library. The scaling algorithm above only targets the 1080p->540p case; for general-purpose scaling of video frames you can use libyuv directly.