What is the hottest technology field in 2020? Make no mistake: audio and video. The rapid growth of telecommuting and online education in 2020 would not have been possible without audio and video technology. Video conferencing, online teaching, and live entertainment are its typical application scenarios.

Richer usage scenarios require us to consider how to provide more configurable capabilities, such as resolution, frame rate, and bit rate, to achieve a better user experience. This article focuses on resolution.

How to implement a custom encoding resolution

Let’s first look at the definition of “resolution”. Resolution is a parameter that measures the amount of pixel data in an image, and a key indicator of the quality of a frame or a video. The higher the resolution, the larger the image in bytes and the better the picture quality. For video in the YUV I420 format at 1080p, one frame occupies 1920 × 1080 × 1.5 × 8 / 1024 / 1024 ≈ 23.73 Mbit, and at a frame rate of 30, one second occupies 30 × 23.73 ≈ 711.9 Mbit. Such a large amount of data would demand an enormous bit rate, so video must be compressed and encoded before transmission. Accordingly, the resolution of the raw data produced by the capture device is called the acquisition resolution, and the resolution of the data actually fed to the encoder is called the encoding resolution.
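To make the arithmetic concrete, here is a minimal C++ sketch of the same calculation (the constants come from the example above; this is illustrative code, not WebRTC code):

#include <cstdio>

int main() {
  const int width = 1920, height = 1080;
  const double bytes_per_pixel = 1.5;  // I420: full-size Y plane plus quarter-size U and V planes
  const double frame_mbit =
      width * height * bytes_per_pixel * 8 / 1024 / 1024;  // ~23.73 Mbit per frame
  const double second_mbit = frame_mbit * 30;              // ~711.9 Mbit for one second at 30 fps
  std::printf("frame: %.2f Mbit, one second @ 30 fps: %.2f Mbit\n",
              frame_mbit, second_mbit);
  return 0;
}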

Whether the video is clear and proportioned appropriately directly affects the user experience. Cameras offer only a limited set of capture resolutions, and sometimes the resolution we want cannot be captured directly, so the ability to configure an encoding resolution appropriate to the scenario is critical. How do we convert the captured video to the encoding resolution we want to send? That is the main topic of this article.

WebRTC is Google’s powerful open-source real-time audio and video project, and most real-time communication solutions on the market are built on it. Each module in WebRTC is well abstracted and decoupled, which makes it very friendly to secondary development. When we build real-time audio and video communication solutions, we need to understand WebRTC’s design ideas and code modules and be able to extend them. In this article, we discuss how to implement a custom encoding resolution based on WebRTC Release 72.

First, let’s consider the following questions:

  • What is the Pipeline of video data from acquisition to encoding?
  • How to select the appropriate acquisition resolution according to the configured encoding resolution?
  • How do I get the resolution I want?

This article will address these three points in turn.

Video data Pipeline

First, let’s take a look at the pipeline of video data: frames are produced by VideoCapturer, adjusted by VideoAdapter, and delivered through VideoSource to each VideoSink, where VideoSink here means the encoder Sink and the local preview Sink.

In terms of resolution, the flow is: the desired resolution is set on VideoCapturer, which selects the most suitable acquisition resolution; the raw data at that acquisition resolution is then recalculated by VideoAdapter and, when it does not match the expectation, scaled and cropped into video data at the encoding resolution; that data is then sent to the encoder for encoding.

There are two key questions:

  • How does VideoCapturer select the appropriate acquisition format?
  • How does VideoAdapter convert acquisition resolution into coding resolution?

How to choose the right acquisition resolution

Selection of acquisition resolution

WebRTC abstracts a base class for video capture: VideoCapturer. Acquisition parameters are configured on VideoCapturer, and GetBestCaptureFormat() selects, from the formats supported by the device, the acquisition format closest to those parameters. This acquisition format is then used to call the VDM (Video Device Module) of each platform. The relevant interfaces are as follows:

Code from WebRTC src/media/base/videocapturer.h:

VideoCapturer.h
bool GetBestCaptureFormat(const VideoFormat& desired, VideoFormat* best_format);// Calls GetFormatDistance() to compute the distance of each supported format and selects the one with the smallest distance
int64_t GetFormatDistance(const VideoFormat& desired, const VideoFormat& supported);// Computes, by a fixed algorithm, the gap between a device-supported format and the desired format; a distance of 0 means an exact match with our settings
void SetSupportedFormats(const std::vector<VideoFormat>& formats);// Sets the formats supported by the capture device, e.g. FPS, resolution, NV12, I420, MJPEG
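A minimal usage sketch of this interface (assuming `capturer` points to a concrete cricket::VideoCapturer implementation; the 800×800 @ 10 fps NV12 values match the running example below):

cricket::VideoFormat desired(800, 800,
                             cricket::VideoFormat::FpsToInterval(10),
                             cricket::FOURCC_NV12);
cricket::VideoFormat best;
if (capturer->GetBestCaptureFormat(desired, &best)) {
  // best is the device-supported format with the smallest GetFormatDistance();
  // it is then used to start capture through the platform VDM.
}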

Depending on the parameters, GetBestCaptureFormat() sometimes cannot return an acquisition format close enough to what we set, because different devices have different capture capabilities: native camera capture on iOS, Android, PC, and Mac and external USB camera capture support different resolutions, and the capabilities of external USB cameras in particular vary widely. Therefore, we need to tweak GetFormatDistance() slightly to meet our requirements. Let’s look at how to adjust the code accordingly.

Selection strategy: source code analysis

Let’s first analyze the source code of GetFormatDistance():

Code from WebRTC src/media/base/videocapturer.cc:

// Get the distance between the supported and desired formats.
int64_t VideoCapturer::GetFormatDistance(const VideoFormat& desired,
                                         const VideoFormat& supported) {
  // ... some code omitted
  // Check resolution and fps.
  int desired_width = desired.width;    // width of the desired encoding resolution
  int desired_height = desired.height;  // height of the desired encoding resolution
  int64_t delta_w = supported.width - desired_width;  // width difference

  float supported_fps = VideoFormat::IntervalToFpsFloat(supported.interval);  // frame rate supported by the device
  float delta_fps =
      supported_fps - VideoFormat::IntervalToFpsFloat(desired.interval);  // frame rate difference
  int64_t aspect_h = desired_width
                         ? supported.width * desired_height / desired_width
                         : desired_height;  // height at the desired aspect ratio; capture resolutions generally have width > height
  int64_t delta_h = supported.height - aspect_h;  // height difference
  int64_t delta_fourcc;  // pixel-format preference, e.g. at equal resolution and frame rate NV12 is preferred

  // ... degrade-policy code omitted: applied when the resolutions and frame rates
  // supported by the device cannot satisfy the configuration

  int64_t distance = 0;
  distance |= (delta_w << 28) | (delta_h << 16) |
              (static_cast<int64_t>(delta_fps) << 8) | delta_fourcc;

  return distance;
}

Let’s focus on Distance. Distance is a concept in WebRTC: the gap, computed by a particular strategy, between the configured acquisition format and a format supported by the device. The smaller the Distance, the closer the device-supported format is to the one we want.

Distance consists of four parts: delta_w, delta_h, delta_fps, and delta_fourcc, where delta_w (resolution width) carries the most weight, followed by delta_h (resolution height), then delta_fps (frame rate), and finally delta_fourcc (pixel format). The problem is that width is weighted so heavily, and height so lightly, that the selected resolution can be far from the best match.

Example:

On an iPhone XS Max, set 800×800 at 10 fps. Excerpting part of the Distance output, the native GetFormatDistance() algorithm does not meet the requirement: we want 800×800, but as the log below shows, Best is 960×540, which is not what we expect:

Supported NV12 192x144x10 distance 489635708928
Supported NV12 352x288x10 distance 360789835776
Supported NV12 480x360x10 distance 257721630720
Supported NV12 640x480x10 distance 128880476160
Supported NV12 960x540x10 distance 43032248320
Supported NV12 1024x768x10 distance 60179873792
Supported NV12 1280x720x10 distance 128959119360
Supported NV12 1440x1080x10 distance 171869470720
Supported NV12 1920x1080x10 distance 300812861440
Supported NV12 1920x1440x10 distance 300742082560
Supported NV12 3088x2316x10 distance 614332104704
Best NV12 960x540x10 distance 43032248320
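The width dominance is easy to see from the shifts alone: for the desired 800×800, the width term of 960×540 is delta_w << 28 = 160 << 28 ≈ 4.29 × 10^10, while that of 1024×768 is 224 << 28 ≈ 6.01 × 10^10. The width term alone already decides the ordering, so 960×540 wins even though 1024×768 matches the desired 1:1 aspect ratio much better.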

Selection strategy adjustment

To obtain the resolution we want, based on the analysis above we need to adjust the weighting in GetFormatDistance(): resolution gets the highest weight, frame rate comes next, and, when no pixel format is specified, the pixel format comes last. The modification is as follows:

int64_t VideoCapturer::GetFormatDistance(const VideoFormat& desired,
                                         const VideoFormat& supported) {
  // ... some code omitted
  // Check resolution and fps.
  int desired_width = desired.width;    // width of the desired encoding resolution
  int desired_height = desired.height;  // height of the desired encoding resolution
  int64_t delta_w = supported.width - desired_width;
  int64_t delta_h = supported.height - desired_height;
  int64_t delta_fps = supported.framerate() - desired.framerate();
  int64_t distance = std::abs(delta_w) + std::abs(delta_h);
  // ... degrade policy omitted: e.g. 1080p is configured but the camera supports
  // at most 720p, so the request must be degraded (delta_fourcc also comes from
  // the omitted code)
  distance = (distance << 16) | (std::abs(delta_fps) << 8) | delta_fourcc;
  return distance;
}

After the revision, Distance consists of three parts: resolution (delta_w + delta_h), frame rate (delta_fps), and pixel format (delta_fourcc), with resolution weighted highest, delta_fps next, and delta_fourcc last.

Example:

Take iPhone XS Max, 800×800 at 10 fps, again. Excerpting part of the capture-format Distance output with GetFormatDistance() modified: we want 800×800, and Best is now 1440×1080, which we can scale and crop down to 800×800, meeting expectations (if the resolution requirement is not particularly exact, the degrade policy can be adjusted to select 1024×768 instead):

Supported NV12 192x144x10 distance 828375040
Supported NV12 352x288x10 distance 629145600
Supported NV12 480x360x10 distance 498073600
Supported NV12 640x480x10 distance 314572800
Supported NV12 960x540x10 distance 275251200
Supported NV12 1024x768x10 distance 167772160
Supported NV12 1280x720x10 distance 367001600
Supported NV12 1440x1080x10 distance 60293120
Supported NV12 1920x1080x10 distance 91750400
Supported NV12 1920x1440x10 distance 115343360
Supported NV12 3088x2316x10 distance 249298944
Best NV12 1440x1080x10 distance 60293120
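A quick check of the revised metric against this log (delta_fps and delta_fourcc are both 0 here, since the device reports NV12 at 10 fps): for 1440×1080, distance = (|1440 − 800| + |1080 − 800|) << 16 = 920 << 16 = 60,293,120, exactly the value logged. Formats that fall below 800 in either dimension (such as 1024×768) additionally pass through the degrade branch omitted above, which is why they score worse despite smaller width and height deltas.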

How to convert the acquisition resolution to the encoding resolution

After the video data is captured, it is processed by VideoAdapter (a WebRTC abstraction) and then distributed to the corresponding Sinks (another WebRTC abstraction). We make some adjustments in VideoAdapter to compute the parameters needed for scaling and cropping, and then use libyuv to scale the video data and crop it to the encoding resolution (to retain as much image information as possible, the data is scaled first; then, when the aspect ratio does not match, the redundant pixels are cropped). Here we focus on two issues:

  • Continuing the example above: we want a resolution of 800×800, but the best acquisition resolution selected is 1440×1080. How do we get the configured 800×800 encoding resolution from the 1440×1080 acquisition resolution?
  • As video data flows from VideoCapturer to VideoSink, it passes through VideoAdapter. What exactly does VideoAdapter do?

Now let’s analyze these two problems concretely, starting with what VideoAdapter is.

VideoAdapter introduction

The VideoAdapter is described in WebRTC as follows:

VideoAdapter adapts an input video frame to an output frame based on the specified input and output formats. The adaptation includes dropping frames to reduce frame rate and scaling frames. VideoAdapter is thread safe.

VideoAdapter is a data input/output control module that can degrade the frame rate and resolution as needed. In the video quality control module (VQC), VideoAdapter can be configured to dynamically reduce the frame rate and dynamically scale the resolution under low bandwidth or high CPU load, so as to keep the video smooth and improve the user experience.

From WebRTC src/media/base/videoadapter.h:

VideoAdapter.h
bool AdaptFrameResolution(int in_width,
                          int in_height,
                          int64_t in_timestamp_ns,
                          int* cropped_width,
                          int* cropped_height,
                          int* out_width,
                          int* out_height);
void OnOutputFormatRequest(
    const absl::optional<std::pair<int, int>>& target_aspect_ratio,
    const absl::optional<int>& max_pixel_count,
    const absl::optional<int>& max_fps);
void OnOutputFormatRequest(const absl::optional<VideoFormat>& format);
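A minimal usage sketch of these two interfaces, using the 800×800 @ 10 fps target and the 1440×1080 capture input from the running example (the call sequence is illustrative):

cricket::VideoAdapter adapter;
// Request 800×800 @ 10 fps output via the VideoFormat overload.
adapter.OnOutputFormatRequest(cricket::VideoFormat(
    800, 800, cricket::VideoFormat::FpsToInterval(10), cricket::FOURCC_NV12));

int cropped_width, cropped_height, out_width, out_height;
if (adapter.AdaptFrameResolution(1440, 1080, /*in_timestamp_ns=*/0,
                                 &cropped_width, &cropped_height,
                                 &out_width, &out_height)) {
  // cropped_* describe the region to crop from the 1440×1080 input;
  // out_* give the resolution handed to the encoder.
}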

VideoAdapter source code analysis

In VideoAdapter, AdaptFrameResolution() is called according to the desired format; it computes cropped_width, cropped_height, out_width, and out_height, the cropping and scaling parameters for going from the acquisition resolution to the encoding resolution. WebRTC’s native AdaptFrameResolution() computes the scaling parameters from pixel area, so it cannot guarantee an exact width and height:

From WebRTC src/media/base/videoadapter.cc:

bool VideoAdapter::AdaptFrameResolution(int in_width,
                                        int in_height,
                                        int64_t in_timestamp_ns,
                                        int* cropped_width,
                                        int* cropped_height,
                                        int* out_width,
                                        int* out_height) {
  // ... some code omitted
  // Calculate how the input should be cropped.
  if (!target_aspect_ratio || target_aspect_ratio->first <= 0 ||
      target_aspect_ratio->second <= 0) {
    *cropped_width = in_width;
    *cropped_height = in_height;
  } else {
    const float requested_aspect =
        target_aspect_ratio->first /
        static_cast<float>(target_aspect_ratio->second);
    *cropped_width =
        std::min(in_width, static_cast<int>(in_height * requested_aspect));
    *cropped_height =
        std::min(in_height, static_cast<int>(in_width / requested_aspect));
  }
  const Fraction scale;  // scaling factor from the VQC; computation omitted
  // Calculate final output size.
  *out_width = *cropped_width / scale.denominator * scale.numerator;
  *out_height = *cropped_height / scale.denominator * scale.numerator;
}

Example:

Taking iPhone XS Max, 800×800 at 10 fps, as an example: the encoding resolution is set to 800×800 and the acquisition resolution is 1440×1080. With the native algorithm, the input is first cropped to 1080×1080 to match the 1:1 target aspect ratio; then, because scaling works on pixel area, the 800×800 target is treated as a budget of 640,000 pixels and a fractional scale (2/3 here) is chosen to fit it, yielding 720×720 rather than the expected 800×800.

VideoAdapter adjustment

VideoAdapter is an important part of the video quality control module (VQC): the VQC relies on it to control the frame rate and scale the resolution. Therefore, when modifying VideoAdapter we must consider the impact on the VQC.

To obtain the desired resolution exactly without breaking the VQC’s resolution control, we adjust AdaptFrameResolution() as follows:

bool VideoAdapter::AdaptFrameResolution(int in_width,
                                        int in_height,
                                        int64_t in_timestamp_ns,
                                        int* cropped_width,
                                        int* cropped_height,
                                        int* out_width,
                                        int* out_height) {
  // ... some code omitted
  bool in_more =
      (static_cast<float>(in_width) / static_cast<float>(in_height)) >=
      (static_cast<float>(desired_width_) /
       static_cast<float>(desired_height_));
  if (in_more) {
    *cropped_height = in_height;
    *cropped_width = *cropped_height * desired_width_ / desired_height_;
  } else {
    *cropped_width = in_width;
    *cropped_height = *cropped_width * desired_height_ / desired_width_;
  }
  *out_width = desired_width_;
  *out_height = desired_height_;
  // ... some code omitted
  return true;
}

Example:

Similarly, taking iPhone XS Max, 800×800 at 10 fps, as an example: the encoding resolution is set to 800×800 and the acquisition resolution is 1440×1080. With the adjusted algorithm, the input is cropped to 1080×1080 (the 1:1 target aspect) and the computed encoding resolution is exactly 800×800, as expected.
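With cropped_width × cropped_height = 1080 × 1080 and out_width × out_height = 800 × 800 coming out of the adjusted adapter, the actual pixel work can be done with libyuv. A minimal I420 sketch (the helper is ours, not WebRTC’s; buffer allocation is omitted, crop offsets are kept even so the chroma planes stay aligned, and the crop and scale are fused into one I420Scale() call over the cropped region):

#include "libyuv.h"

// Center-crop a cropped_w x cropped_h region out of a src_w x src_h I420
// frame, then scale it to dst_w x dst_h.
void CropAndScaleI420(const uint8_t* src_y, int src_stride_y,
                      const uint8_t* src_u, int src_stride_u,
                      const uint8_t* src_v, int src_stride_v,
                      int src_w, int src_h, int cropped_w, int cropped_h,
                      uint8_t* dst_y, int dst_stride_y,
                      uint8_t* dst_u, int dst_stride_u,
                      uint8_t* dst_v, int dst_stride_v,
                      int dst_w, int dst_h) {
  const int off_x = ((src_w - cropped_w) / 2) & ~1;  // even offsets keep U/V aligned
  const int off_y = ((src_h - cropped_h) / 2) & ~1;
  const uint8_t* crop_y = src_y + off_y * src_stride_y + off_x;
  const uint8_t* crop_u = src_u + (off_y / 2) * src_stride_u + off_x / 2;
  const uint8_t* crop_v = src_v + (off_y / 2) * src_stride_v + off_x / 2;
  libyuv::I420Scale(crop_y, src_stride_y, crop_u, src_stride_u,
                    crop_v, src_stride_v, cropped_w, cropped_h,
                    dst_y, dst_stride_y, dst_u, dst_stride_u,
                    dst_v, dst_stride_v, dst_w, dst_h,
                    libyuv::kFilterBox);
}

For the running example, this crops 1440×1080 to the centered 1080×1080 region (off_x = 180, off_y = 0) and scales it to 800×800.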

Conclusion

This article described how to implement a configurable encoding resolution based on WebRTC. To modify the video encoding resolution, we need to understand the whole pipeline of video data collection, transmission, processing, and encoding. To summarize the key steps for sending video at a custom encoding resolution:

  • First, set the desired encoding resolution;
  • Modify videocapturer.cc to select the appropriate acquisition resolution for that encoding resolution;
  • Modify videoadapter.cc to compute the cropping and scaling parameters from the acquisition resolution to the encoding resolution;
  • Use libyuv to scale and crop the raw data to the encoding resolution according to those parameters;
  • Send the resulting data to the encoder for encoding and sending;
  • Done.

We can make other adjustments along the same lines. That’s all for this article; we will continue to share more audio and video technology implementations, and you are welcome to leave comments and discuss related technologies with us.

The 5G era has arrived, audio and video applications will become ever broader, and everything is promising.

About the author

He Jingjing is an audio and video engineer for the NetEase Yunxin client, responsible for the development of Yunxin’s cross-platform SDK. Previously engaged in audio and video work in the online education field, the author has a solid understanding of building real-time audio and video solutions and their application scenarios, and enjoys studying and solving complex technical problems.

For more technical content, follow the [NetEase Smart Enterprise Technology+] WeChat official account.