Blocky mosaics, blur, distorted faces and objects… Poor image quality can significantly degrade the experience of watching a favorite show or movie on Netflix. In many cases, insufficient network bandwidth or limited data plans prevent us from delivering perfect picture quality to our viewers. For this reason, the Netflix video algorithms team has been working hard to develop more efficient compression algorithms, so that Netflix can provide the same or better picture quality with less bandwidth. We also work with other teams at Netflix to update the client applications and streaming infrastructure to support the new streaming technology and ensure a seamless Netflix experience on any device.
To further improve picture quality, we developed and deployed per-title encode optimization in 2015 and, a year later, optimized encodes for mobile video downloads. Building on that work, we developed a framework called the Dynamic Optimizer that encodes each shot in a video individually, allowing much finer-grained tuning of the video stream. This article describes the challenges we faced in bringing this technology to a production environment, the solutions we built, and how the technology improves video quality.
Implementing dynamic optimization in the production environment
As detailed in this blog post, the dynamic optimizer analyzes the same video at different qualities and resolutions to find a better compression trajectory across the entire encode, enabling further optimization. In this process we use VMAF, Netflix’s perceptual video quality metric, as the optimization target; after all, our goal is to deliver the best streaming quality as perceived by the audience.
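As a rough illustration of the selection step (our own simplification based on the description above, not Netflix’s production code), the Python sketch below encodes each shot at several trial operating points and, for a given rate-quality trade-off λ, picks one point per shot; sweeping λ traces out an optimized bitrate-quality trajectory for the whole title:

```python
# Minimal sketch of per-shot operating-point selection. Each shot has
# several trial encodes at different (resolution, QP) points; for a
# Lagrangian weight `lam`, we pick the point maximizing VMAF - lam * rate.
# All names, values, and the selection rule are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class EncodePoint:
    resolution: str       # e.g. "1920x1080"
    qp: int               # quantization parameter of the trial encode
    bitrate_kbps: float
    vmaf: float           # measured quality of this trial encode

def select_points(shots: list[list[EncodePoint]], lam: float) -> list[EncodePoint]:
    """For each shot, pick the operating point maximizing VMAF - lam * bitrate."""
    return [max(points, key=lambda p: p.vmaf - lam * p.bitrate_kbps)
            for points in shots]

# Example: two shots, each with two hypothetical trial encodes.
shots = [
    [EncodePoint("960x540", 30, 300, 68.0), EncodePoint("1920x1080", 24, 900, 82.0)],
    [EncodePoint("960x540", 30, 250, 75.0), EncodePoint("1920x1080", 24, 800, 84.0)],
]
chosen = select_points(shots, lam=0.02)
avg_vmaf = sum(p.vmaf for p in chosen) / len(chosen)
total_rate_kbps = sum(p.bitrate_kbps for p in chosen)
```

A larger λ penalizes bitrate more heavily and slides every shot toward cheaper operating points; sweeping λ from small to large yields the family of (bitrate, VMAF) pairs from which a target stream can be chosen.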
One of the biggest challenges in bringing dynamic optimization to production was adapting the parallel encoding pipeline to handle a much larger number of encoding units. First, the dynamic optimizer’s analysis requires trial encodes at multiple resolutions and quality levels (QPs), which multiplies the amount of encoding work. Second, the unit of encoding itself changed: where we previously split a video into chunks several minutes long, we now split and encode it shot by shot. In the original system, for example, a one-hour episode of Stranger Things might be split into 20 three-minute chunks. With shot-based encoding, where shots average about four seconds, the same episode yields roughly 900 shots. If each chunk holds a single shot (Figure 1B), the new method increases the number of chunks processed per encode by more than an order of magnitude, which in turn raises the likelihood of bottlenecks in the volume of messages passed between compute instances. To address this, we implemented a number of engineering innovations, two of which are highlighted below: collation and checkpointing.
While the core messaging system could have been hardened to cope with the surge in messages, that was not the most flexible or economical solution at the time. Instead, we introduced collation, the merging of shot sequences, into the process.
Figure 1: “Chunks” formed by collating shot sequences. (A) The timeline of the entire video; vertical dotted lines mark the boundaries between shots. (B) One shot per chunk: each shot is assigned its own chunk. (C) Multiple shots collated into one chunk: an integer number of consecutive shots is accumulated up to a target chunk duration.
With collation, we group consecutive shots so that a run of contiguous shots forms a single chunk. Because we are free to choose the composition of each chunk, we can combine an integer number of shots into chunks of roughly three minutes, similar to the chunks used by the original video-chunk-based encoding scheme (Figure 1C). These chunks can be kept at approximately the same size, which simplifies resource allocation on compute instances originally provisioned for multi-minute video chunks. Within each chunk, a compute instance encodes the shots independently, applying the pre-assigned encoding parameters to each one.
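To make collation concrete, here is a minimal Python sketch of the grouping step, under our own simplifying assumptions (shot durations known up front, a fixed three-minute target); it is an illustration, not Netflix’s production logic:

```python
# Accumulate consecutive shots into chunks of roughly `target_secs`
# (about 3 minutes), so each chunk holds an integer number of whole shots.
# Shot durations are in seconds; the function returns lists of shot indices.

def collate_shots(shot_durations: list[float], target_secs: float = 180.0) -> list[list[int]]:
    chunks, current, current_secs = [], [], 0.0
    for idx, dur in enumerate(shot_durations):
        current.append(idx)
        current_secs += dur
        if current_secs >= target_secs:   # chunk reached its target length
            chunks.append(current)
            current, current_secs = [], 0.0
    if current:                           # leftover shots form a final chunk
        chunks.append(current)
    return chunks

# e.g. ~900 four-second shots collate into ~20 chunks of ~45 shots each,
# matching the chunk count of the original three-minute-chunk pipeline.
chunks = collate_shots([4.0] * 900)
```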
Figure 2: Checkpointing
With collation in place, the shots inside each chunk are still encoded individually, which gives the system an added benefit we call checkpointing. Previously, if a compute instance was lost (we often borrow spare instances, which can be reclaimed at any moment for higher-priority work), the whole chunk had to be re-encoded. In the shot-based system, each shot is encoded on its own: once a shot is done, losing the instance while it encodes the remaining shots in the chunk does not force the finished shots to be redone. We built a checkpointing system (Figure 2) that stores each encoded shot and its metadata as soon as its encode completes. If the same chunk must be reprocessed on another compute instance, it does not start from scratch but picks up where the previous instance left off, saving a great deal of compute.
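The sketch below illustrates this resume-from-checkpoint behavior in Python; the storage paths, file format, and the `encode_shot` helper are hypothetical stand-ins for Netflix’s actual pipeline:

```python
# Per-shot checkpointing: after each shot is encoded, its metadata is
# persisted (local files here stand in for durable storage). A retry on
# another instance skips shots whose checkpoints already exist.

import json
from pathlib import Path

def encode_chunk(chunk_id: str, shot_ids: list[str], store: Path) -> None:
    store.mkdir(parents=True, exist_ok=True)
    for shot_id in shot_ids:
        marker = store / f"{chunk_id}_{shot_id}.json"
        if marker.exists():               # checkpoint hit: shot already done
            continue
        result = encode_shot(shot_id)     # placeholder for the real encoder call
        marker.write_text(json.dumps(result))  # persist shot output metadata

def encode_shot(shot_id: str) -> dict:
    return {"shot": shot_id, "status": "encoded"}  # stand-in for real work
```

If the instance dies midway through a chunk, rerunning `encode_chunk` on a fresh instance re-encodes only the shots without checkpoint markers.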
Compression performance
In December 2016, we launched AVCHi-Mobile and VP9-Mobile encodes for video downloads. These mobile encodes improve compression over per-title encodes through several refinements, such as longer GOPs, more flexible encoder settings, and per-chunk optimization. The resulting streams form the high-quality baseline for our H.264/AVC and VP9 encoders with traditional bitrate-control mechanisms.
The figure below (Figure 3) shows the compression gains when dynamic optimization is combined with shot-based encoding. We plotted bitrate-VMAF curves for the new optimized encodes (VP9-Opt and AVCHi-Opt) and compared them against:
- Per-chunk encodes for download (VP9-Mobile and AVCHi-Mobile)
- Per-title encodes for streaming (AVCMain)
To create the figure, we sampled thousands of titles from our catalog. For each bitrate budget x on the horizontal axis, we select, for every title, the highest-quality encode (measured by VMAF) with bitrate ≤ x; the average VMAF across all titles at that budget gives one point on a curve. Sweeping x over the range of bitrates yields five curves, one for each of the five encoding methods above (a small sketch of this construction follows Figure 3). Assuming stable network conditions, the figure therefore shows the average quality, in VMAF, that users receive from Netflix at a given bandwidth.
Figure 3: Compression performance of per-title, per-chunk, and newly optimized encodes
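As a rough illustration of how curves like those in Figure 3 can be computed from per-title encode data (the data shapes and values below are hypothetical, not Netflix’s):

```python
# Build one bitrate-VMAF curve for one encoding method. For each bitrate
# budget x, pick per title the highest-VMAF encode with bitrate <= x,
# then average across titles. Titles with no encode under budget count
# as 0 VMAF here; that tie-breaking choice is our own assumption.

def curve_point(titles: list[list[tuple[float, float]]], x: float) -> float:
    """titles: per-title list of (bitrate_kbps, vmaf) encodes for ONE method."""
    best = [max((v for b, v in encodes if b <= x), default=0.0)
            for encodes in titles]
    return sum(best) / len(best)          # average VMAF across titles at budget x

def curve(titles, budgets):
    return [(x, curve_point(titles, x)) for x in budgets]

# Example: two hypothetical titles, each with three (bitrate, VMAF) encodes.
titles = [
    [(300, 65.0), (700, 80.0), (1500, 90.0)],
    [(250, 70.0), (600, 83.0), (1200, 92.0)],
]
points = curve(titles, budgets=[250, 500, 1000, 2000])
```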
Let’s look at the bitrate reduction at equal quality. Drawing a horizontal line at VMAF=80 (good quality) and reading off each curve yields the following bitrates:
Compared with the per-title AVCMain encodes, AVCHi-Opt needs less than half the bitrate to deliver the same quality, and VP9-Opt needs less than a third of the AVCMain bitrate. Relative to AVCHi-Mobile and VP9-Mobile, the optimized encodes save a further 17% and 30%, respectively.
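Reading equal-quality bitrates off the curves amounts to interpolating each curve at the target VMAF; a small hypothetical helper (the curve data is made up for illustration):

```python
# Linearly interpolate a (bitrate, VMAF) curve at a target VMAF, then
# express one method's bitrate as a saving relative to a reference.

def bitrate_at_vmaf(curve: list[tuple[float, float]], target: float = 80.0) -> float:
    """curve: (bitrate_kbps, vmaf) points sorted by bitrate, VMAF increasing."""
    for (b0, v0), (b1, v1) in zip(curve, curve[1:]):
        if v0 <= target <= v1:
            t = (target - v0) / (v1 - v0)
            return b0 + t * (b1 - b0)     # linear interpolation between points
    raise ValueError("target VMAF outside curve range")

avcmain = [(200, 55.0), (600, 75.0), (1400, 88.0)]   # hypothetical curve
vp9_opt = [(150, 60.0), (400, 82.0), (900, 93.0)]    # hypothetical curve
ref = bitrate_at_vmaf(avcmain)    # ~908 kbps at VMAF 80 on this fake data
opt = bitrate_at_vmaf(vp9_opt)    # ~377 kbps
saving = 1 - opt / ref            # fraction of bitrate saved vs the reference
```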
We also examined visual quality at a fixed bandwidth. For example, at 250 kbps, a typical cellular bandwidth, the encodes achieve the VMAF scores shown in the table below. Compared with AVCMain, the optimized encodes improve video quality substantially.
To illustrate the difference, below is a single frame from an episode of Chef’s Table, captured from each encoding method at around 250 kbps. The rendering of textures in the scene (bricks, trees, rocks, water, and so on) improves considerably. The visual gap is most striking between AVCMain (Figure 4A, VMAF=58) and AVCHi-Opt (Figure 4B, VMAF=73), while VP9-Opt (Figure 4C, VMAF=79) looks the sharpest.
Figure 4(A): AVCMain, 250 kbps, VMAF=58
Figure 4(B): AVCHi-Opt, 254 kbps, VMAF=73
Figure 4(C): VP9-Opt, 248 kbps, VMAF=79
As another example, consider the opening scene of 13 Reasons Why at around 250 kbps. With AVCMain (Figure 5A), the text at the top of the frame is barely legible, at a VMAF score of 60. With AVCHi-Opt (Figure 5B), picture quality improves greatly, at a VMAF score of 74. With VP9-Opt (Figure 5C), text and shape edges become crisp and overall quality improves markedly, at a VMAF score of 81.
Figure 5: (A) AVCMain, 260 kbps, VMAF=60; (B) AVCHi-Opt, 257 kbps, VMAF=74; (C) VP9-Opt, 252 kbps, VMAF=81
Field testing the optimized encodes
We have shown above how the optimized encodes improve compression over per-title encoding, delivering higher quality at the same bitrate or the same quality at a lower bitrate. But the real question remains: do these changes actually improve the viewing experience?
Before deploying the new encodes into production, we thoroughly validated them with A/B tests across platforms and devices. A/B testing lets us compare, in a controlled way, the quality of experience (QoE) of a treatment group streaming the new encodes against a control group streaming the old ones. To make the comparison between the optimized encodes and the original AVCMain streaming experience as accurate as possible, our A/B tests covered a wide variety of devices and titles. The tests also let us further tune the encoding algorithms and adapt the streaming engine for different platforms.
We evaluated the effect of the optimized encodes using several QoE metrics. Based on the A/B test results, we believe users’ viewing experience improves in the following ways:
- For users on low-bandwidth connections, we can deliver higher-quality video at the same (or even lower) bitrate.
- For users on high-bandwidth connections, we can deliver the same quality video at a lower bitrate.
- When network throughput drops suddenly, most users see far fewer rebuffers and quality downswitches.
- Devices that support VP9 streaming receive higher-quality video at the same bitrate.
In addition, many of our users have mobile plans with data caps. With the new optimized encodes, they can stream Netflix for longer at the same or better quality without exhausting those caps. Offline downloads benefit from the optimized encodes as well: for the same storage space, users get noticeably better-looking video.
Re-encoding the catalog and device support
Over the past few months, we’ve generated AVCHi-Opt encodes for Netflix’s entire catalog and started streaming them on many platforms. Users are already enjoying the benefits of this technology when watching Netflix on iOS, Android, PS4, and Xbox One. For some very popular titles we also provide VP9-Opt streams, currently supported on some Android devices, and we are actively testing the new streams on other devices and browsers.
Whether you’re watching Chef’s Table on a smart TV over the fastest broadband connection or Jessica Jones on a mobile device over a spotty cellular network, Netflix is committed to providing the best possible viewing experience. The optimized encodes are a great example of how innovative research, effective cross-team collaboration, and data-driven deployment combine to deliver a better viewing experience for our customers.
Authors: Megha Manohara, Anush Moorthy, Jan De Cock, Ioannis Katsavounidis, and Anne Aaron. Read the original article in English.
Thanks to Cai Fangfang for proofreading this article.