Author | XingHuaiFei
Edit | Alex
Planning | Alex, package research
Year-end Inventory #008 — HDR
1. Introduction to HDR video technology
1.1 Overview of HDR technology
Against the backdrop of 5G and AI, ultra high definition (UHD) video will develop rapidly, not only in traditional broadcast and TV but also in Internet video, where OTT will see more and more applications. UHD video improves not only resolution and frame rate but also high dynamic range (HDR) and wide color gamut (WCG). Compared with traditional SDR video, HDR video offers a higher brightness range, a wider color gamut, and a deeper bit depth (10-bit/12-bit).
In 2020, Apple announced HDR video support for the iPhone 12 series, an important milestone in the development of HDR. Apple’s adoption of Dolby Vision enables HDR content to be shot, recorded, and edited on a phone, greatly lowering the threshold for HDR content creation.
HDR video has higher contrast, especially in scenes where bright and dark objects appear in the same frame, and it is closer to how the human eye perceives the physical world. HDR video is also more colorful, with higher color saturation that better matches real-life scenes. As shown below, the HDR video preserves better light and dark detail than the SDR version on the left.
1.2 Three elements of HDR video
1.2.1 High dynamic range
HDR stands for High Dynamic Range: it can display a wider range of brightness than traditional SDR. For example, if we photograph a window from indoors during the day, we will find that if the interior is exposed correctly, the window is blown out; conversely, if the window is exposed correctly, the interior becomes very dark. This is the result of insufficient dynamic range. The figure below illustrates the brightest and darkest ranges the human eye can perceive, along with the brightness ranges defined by HDR and SDR.
The term high dynamic range originally came from still imaging, where multiple exposures were combined to synthesize HDR images. At SIGGRAPH 1997, Paul Debevec presented the paper “Recovering High Dynamic Range Radiance Maps from Photographs”, which described photographing the same scene with different exposure settings and then merging those exposures into a high dynamic range image. Such an image can capture a scene’s full range, from dark shadows to bright light sources and strong reflections. In fact, the HDR photos taken by iPhone or Android smartphones are synthesized from multiple exposures to recover detail in both the bright and dark regions, bringing the result closer to what the human eye sees. [1]
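As a rough sketch of this idea (not Debevec's full algorithm, which also recovers the camera's response curve), the Python below fuses two simulated exposures of the same scene with a hat-shaped weight that discounts clipped pixels; the function and variable names are illustrative:

```python
import numpy as np

def fuse_exposures(images, exposure_times):
    """Merge differently exposed 8-bit frames into one radiance map
    (simplified linear variant of the multi-exposure approach)."""
    eps = 1e-6
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(num)
    for img, t in zip(images, exposure_times):
        z = img.astype(np.float64) / 255.0
        # Hat weighting: trust mid-tones, distrust clipped pixels.
        w = 1.0 - np.abs(2.0 * z - 1.0)
        num += w * (z / t)   # radiance estimate from this frame
        den += w
    return num / (den + eps)

# Two synthetic exposures of the same scene (1x and 4x exposure time).
radiance = np.array([[0.1, 0.5, 0.9]])
short = np.clip(radiance * 1.0, 0, 1) * 255
long_ = np.clip(radiance * 4.0, 0, 1) * 255
hdr = fuse_exposures([short, long_], [1.0, 4.0])
```

The fused map recovers the true radiance even where one exposure clipped: the long exposure fills in shadow detail while the short one preserves highlights.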
1.2.2 Wide color gamut
Wide color gamut (WCG) is the second important feature of HDR video, enabling it to represent a wider color range than current SDR video systems. SDR video uses the BT.709 color space, which covers only 35.9% of the CIE 1931 color space, while the BT.2020 color space used by HDR video covers 75.8%. Images on the display are therefore closer to the physical world.
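For intuition, the gamut triangles spanned by the two standards' published chromaticity primaries in CIE 1931 xy coordinates can be compared directly (this compares the triangles to each other, not to the full spectral locus behind the 35.9%/75.8% figures):

```python
# Compare BT.709 and BT.2020 gamut triangles in CIE 1931 xy space.
def triangle_area(p):
    (x1, y1), (x2, y2), (x3, y3) = p
    # Shoelace formula for the area of a triangle.
    return abs(x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2)) / 2.0

# (x, y) chromaticities of the R, G, B primaries.
BT709  = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
BT2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]

a709, a2020 = triangle_area(BT709), triangle_area(BT2020)
print(f"BT.2020 triangle is {a2020 / a709:.2f}x the BT.709 triangle")
```

The BT.2020 triangle comes out roughly 1.9 times the area of BT.709, consistent with the coverage percentages quoted above.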
1.2.3 High bit depth
Traditional SDR video is represented with 8-bit depth, while BT.2020 raises this to 10 or 12 bits. This makes grayscale transitions smoother and improves the rendering of fine detail, giving the image richer color. In addition, BT.2020 also revises the Gamma correction. [2]
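A quick way to see the difference is to quantize a smooth luminance ramp at each bit depth and count how many distinct code values survive; fewer values means coarser steps and more visible banding:

```python
import numpy as np

# Quantize a smooth luminance ramp at different bit depths and count
# the distinct code values that survive.
ramp = np.linspace(0.0, 1.0, 100000)
for bits in (8, 10, 12):
    levels = 2 ** bits
    quantized = np.round(ramp * (levels - 1)) / (levels - 1)
    print(f"{bits}-bit: {levels} code values, "
          f"{len(np.unique(quantized))} distinct steps in the ramp")
```

Going from 8 to 10 bits multiplies the number of gradation steps by four (256 to 1024), which is what smooths out banding in gentle gradients like skies.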
1.3 Photoelectric/Electro-optical Transfer function (OETF&EOTF)
The human eye’s perception of the physical world is non-linear: sensitivity in mid-tones and dark regions is much higher than in very bright regions, hence the Gamma curve used in SDR. We tend to spend more bits on mid-tone and dark areas, which improves compression efficiency and better matches the human eye.
In the era of HDR, the Gamma curve is no longer sufficient for the higher peak brightness. Based on the Contrast Sensitivity Function (CSF) of the human eye, the SMPTE ST 2084 standard defines an EOTF curve known as the Perceptual Quantizer (PQ) curve. Its brightness ranges from 0.00005 nits at the darkest to 10,000 nits at the brightest. The PQ curve was first developed by Dolby and standardized in ST 2084.
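The ST 2084 PQ curve has a closed-form expression; a small Python sketch of the EOTF and its inverse, using the constants published in the standard, looks like this:

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants.
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
PEAK = 10000.0  # nits

def pq_eotf(e):
    """Non-linear PQ signal e in [0, 1] -> display luminance in nits."""
    ep = np.power(e, 1 / M2)
    y = np.power(np.maximum(ep - C1, 0) / (C2 - C3 * ep), 1 / M1)
    return PEAK * y

def pq_inverse_eotf(nits):
    """Display luminance in nits -> non-linear PQ signal in [0, 1]."""
    y = np.power(nits / PEAK, M1)
    return np.power((C1 + C2 * y) / (1 + C3 * y), M2)
```

Code value 1.0 maps to the full 10,000-nit peak, and the steep toe of the curve devotes most code values to the dark and mid-tone region where the eye is most sensitive.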
Hybrid Log-Gamma (HLG) is another important HDR transfer curve, developed by the BBC and NHK. Unlike PQ, HLG defines an OETF curve. Because it largely overlaps the Gamma curve in the low-brightness region, it offers good compatibility with SDR display equipment and is widely used in broadcast and television systems. The HLG curve was first standardized in ARIB STD-B67 and later in ITU-R BT.2100.
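The BT.2100 HLG OETF is likewise simple to write down; the square-root segment below is what keeps the low end close to a conventional gamma curve, and hence watchable on SDR displays:

```python
import math

# HLG OETF constants from ITU-R BT.2100.
A = 0.17883277
B = 1 - 4 * A
C = 0.5 - A * math.log(4 * A)

def hlg_oetf(e):
    """Scene linear light e in [0, 1] -> non-linear signal in [0, 1]."""
    if e <= 1 / 12:
        return math.sqrt(3 * e)     # gamma-like square-root segment
    return A * math.log(12 * e - B) + C   # logarithmic highlight segment
```

The two segments join at a signal value of exactly 0.5, with everything above it devoted to log-compressed highlights.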
The image above is from SONY
1.4 Tone Mapping
Tone mapping is the technique of converting HDR images or video into SDR so they display correctly on SDR devices. HDR content is still scarce, mixed HDR and SDR content will coexist for a long time, and screens that truly support HDR display are rare, so tone mapping is used frequently to ensure HDR content displays correctly.
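As a minimal illustration (real pipelines such as BT.2390 are considerably more elaborate), an extended Reinhard curve compresses highlights smoothly while mapping a chosen white level exactly to the SDR peak; the parameter values below are arbitrary:

```python
import numpy as np

def reinhard_extended(l, l_white):
    """Extended Reinhard tone curve: compresses highlights smoothly
    and maps l == l_white exactly to 1.0 (the SDR peak)."""
    return l * (1.0 + l / l_white**2) / (1.0 + l)

# Map 1000-nit HDR luminances down to SDR, treating 100 nits as SDR white.
hdr_nits = np.array([0.0, 50.0, 100.0, 500.0, 1000.0])
l = hdr_nits / 100.0                  # luminance relative to SDR white
sdr = np.clip(reinhard_extended(l, l_white=10.0), 0.0, 1.0)
```

The curve is monotonic, leaves darker values almost untouched, and rolls off the 10x-over-white highlights into the top of the SDR range instead of clipping them.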
1.5 HDR Metadata
HDR metadata describes key information or features in video or image processing, including static metadata and dynamic metadata.
Static metadata, standardized in ST 2086, specifies brightness limits (down to the brightest pixel) once for an entire program. Its disadvantage is that tone mapping must then be applied globally: there is little room for adjustment, and compatibility is poor.
Dynamic metadata solves this problem well. It helps in two ways: it gives colorists room to present richer detail in every scene or frame than static metadata allows; and by tone mapping against the target display’s brightness, it maximizes the presentation of the author’s creative intent on that display.
A series of dynamic metadata is defined in the SMPTE ST 2094 standard. ST 2094-10, ST 2094-20, ST 2094-30 and ST 2094-40 give the schemes of dynamic metadata and gamut conversion of Dolby, Philips, Technicolor and Samsung respectively.
1.6 HDR ecosystem
As discussed, HDR is not a single point technology but an end-to-end ecosystem. As the picture below shows, HDR settings are needed from the moment the camera starts shooting; in post production, the director and colorist determine the look of the film, and the colorist can adjust color frame by frame and scene by scene, with the results stored as metadata. The signal then enters the encoding stage, where the video is encoded with the metadata embedded and transmitted to the decoding end. During decoding, the player extracts the metadata and performs dynamic tone mapping and color correction according to the dynamic metadata.
The image above is from Philips
Every step of this ecosystem is supported by many core technology suppliers and manufacturers. Because the standards are not unified, competition among them is fierce, creating tensions between openness and commercial licensing, and between quality and compatibility.
2. Technical status of HDR
2.1 HDR video standards
HDR has a huge potential market. After years of development, several standards now compete: HDR10, HLG, HDR10+, and Dolby Vision. HDR10 has the advantage of being open and free, but it carries only static metadata. HDR10+ provides higher picture quality by adding dynamic metadata. Dolby Vision has a very strong technical edge, but it is a closed ecosystem with high commercial licensing fees. The table below summarizes comparisons between the standards.
2.1.1 HDR10
HDR10 is the most basic version and an open standard, adopted in 2014. It has gained wide acceptance thanks to its ease of use and royalty-free licensing. The standard describes video content conforming to the UHDTV Rec. ITU-R BT.2020 recommendation. HDR10 uses the PQ EOTF transfer curve and is not backward compatible with SDR displays. HDR10 uses static metadata, which consists of three main parts:
- SMPTE ST 2086: describes the mastering display’s capabilities for rendering the content, such as color primaries, white point, and minimum and maximum brightness
- MaxFALL (Maximum Frame-Average Light Level): the maximum average brightness of any frame in the entire video
- MaxCLL (Maximum Content Light Level): the brightest pixel level in the video
HDR10’s display quality is limited because its static metadata cannot adapt to different scenes or frames.
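MaxCLL and MaxFALL are simple enough to compute directly; a simplified sketch over per-pixel luminance frames (real encoders derive them from decoded pixel values as specified in CTA-861.3) might look like:

```python
import numpy as np

def hdr10_light_levels(frames):
    """Compute HDR10 static metadata MaxCLL / MaxFALL from a list of
    per-pixel luminance frames in nits (simplified illustration)."""
    max_cll = max(float(f.max()) for f in frames)    # brightest pixel anywhere
    max_fall = max(float(f.mean()) for f in frames)  # brightest frame average
    return max_cll, max_fall

# Two tiny 2x2 "frames" of pixel luminances in nits.
frames = [np.array([[100.0, 200.0], [300.0, 400.0]]),
          np.array([[50.0, 50.0], [50.0, 1000.0]])]
cll, fall = hdr10_light_levels(frames)
```

Note how the two numbers disagree about which frame matters: the second frame contains the brightest single pixel (MaxCLL), while its average is what sets MaxFALL here.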
2.1.2 HLG
HLG, developed by the BBC and Japan’s NHK in 2015, is also widely used. The standard describes content that meets the requirements of BT.2020. As mentioned above, HLG is widely used in broadcast and television systems and has good compatibility.
2.1.3 HDR10+
HDR10+, Samsung’s rival to Dolby Vision, adds dynamic metadata support on top of HDR10. It can adjust brightness and color for each scene or frame, supports dynamic tone mapping, and is backward compatible with the HDR10 format.
2.1.4 Dolby Vision
Dolby Vision is a proprietary technology that requires a license fee, and partly because of those fees its content library is not especially rich. Dolby Vision was in fact the world’s first commercial HDR standard, and it is very competitive: it uses dynamic metadata and supports peak brightness of up to 10,000 nits. To improve bitstream playback and display compatibility, many profiles are defined to support different applications.
HDR video shot on an iPhone uses Profile 8.4. You can see that the transfer curve is HLG, the bit depth is 10 bits, and the codec is HEVC 10-bit (4:2:0 chroma format).
2.1.5 HDR Vivid
As a domestic HDR standard promoted by CUVA alliance, HDR Vivid mainly has three core technologies: Dynamic Metadata, Tone Mapping and Saturation Adjustment.
Based on the existing HDR standard, HDR Vivid adds dynamic metadata to provide a more accurate dynamic range mapping method for display terminals, maximizing the original artistic effect of HDR content. Compared to proprietary HDR technology, HDR Vivid is a more open and universal technology standard and open solution. For industry parties, technological openness and security are more suitable for industrial deployment. [3]
Formulated as a CUVA standard, it defines the HDR processing pipeline for UHD video presentation. By October 2021, the CUVA Alliance had published three parts of the standard: the first covers metadata and adaptation; the second covers application guidelines and technical requirements, including system integration and post production; the third covers test methods, with four subsections for display devices, portable display devices, player devices, and player software.
On the production side, only dynamic metadata generation needs to be added; existing HDR10 and HLG production workflows do not need to change. A graphical interface matching colorists’ habits helps production staff manually tune HDR Vivid dynamic metadata, while automated metadata generation tools support mass production of content and reduce costs. Metadata carriage is supported over HEVC/AVS2/AVS3/VVC codecs.
2.2 Analysis of competitive formats
HDR technologies and systems involve a large number of patents, so the choice of technology route can raise licensing issues, and the standardization of HDR has therefore long been contested by manufacturers worldwide. The major technology providers form several large camps.
For simplicity, this article analyzes the ecosystem most closely tied to Internet video: traditional HDR solution vendors and online video operators, codec vendors, and operating systems.
2.2.1 HDR solution Vendors and Network Video Carriers
Traditional end-to-end HDR production support comes from Sony, Dolby, DaVinci Resolve, Apple Pro, and others. Broadly speaking, they fall into two camps: HLG and PQ.
The above image is from SONY
In broadcast television, about 81 percent of operators have adopted HLG for better compatibility, while only 11 percent have adopted PQ. But in OTT or Internet video, especially on-demand video, the opposite is true, with 88 percent of video service providers choosing PQ and 12 percent choosing HLG. [4]
In the mobile Internet era, Internet video accounts for more than 90% of network traffic, and the share is still rising. UHD video has likewise become an important part of HDR distribution over the Internet.
2.2.2 Video coding and transcoding
Video encoding and packaging is a key technical link for HDR video; the suppliers can be divided into professional equipment vendors and cloud video transcoding vendors. HDR10, as a basic capability, is supported by almost all of them. Competition between Dolby Vision and HDR10+ is fiercer, and both are on the rise. Both technologies provide dynamic metadata, which compensates for HDR10’s shortcomings, improving detail and picture quality while also improving backward compatibility. [4]
Cloud transcoding vendors such as AWS, Bitmovin, and MediaKind support not only basic HDR10 but are also extending to HDR10+ and even Dolby Vision, at minimum supporting parsing or pass-through of dynamic metadata.
2.2.3 Operating system support
Microsoft’s Windows 10/11 support HDR games and video, both very large markets. Windows requires HDR to be enabled and depends on both display and codec support: a VESA DisplayHDR 500 or higher display is recommended, together with the HEVC, VP9, or AV1 codec extensions. [5]
Apple models from 2018 onward support HDR, but require newer versions of the operating system (such as Big Sur). The built-in XDR display on the Mac supports Dolby Vision, HDR10, and HLG. If an external monitor is compatible with HDR10, the system automatically converts Dolby Vision and HLG to HDR10. [6]
3. HDR application challenges
3.1 Lack of HDR content
HDR content is very scarce: the traditional HDR production process is complicated and the production threshold high, so few sources are produced with HDR technology, and the supply of high-quality UHD sources falls short of users’ growing demand.
In the current UGC era, the rapid development of short video in particular has greatly lowered the threshold for video production, yet today’s HDR content is still mainly PGC. Major editing tools include Final Cut Pro X, Adobe Premiere Pro CC, DaVinci Resolve, and more.
How to quickly produce or supplement HDR UHD content has therefore become a huge market opportunity. AI-based intelligent video processing has broad room to grow: it can restore old film and TV material, upscale standard definition to 4K through super-resolution, and further improve the result with HDR technology, approaching the quality of true 4K programming.
3.2 HDR screen is insufficient
Players and screens that support real HDR playback and display are still few. Screen parameters vary widely across OTT devices and mobile terminals, so results differ visibly between devices, and users struggle to get a consistent visual experience. Adapting tone mapping to various devices and converting between the various HDR formats and standards will therefore remain a problem to solve and optimize for a long time.
In the smartphone ecosystem, basic support for HDR decoding and display is not yet common. Even support for the HDR standards themselves is limited: few phones support HDR10+, and fewer still support Dolby Vision. Apple remains ahead with Dolby Vision on the iPhone, while the Android ecosystem lags. Many vendors use pseudo-HDR content generation techniques to render video that merely looks like HDR on an ordinary phone screen.
3.3 Standard competition and ecological fragmentation of HDR
As introduced in section 2.2, multiple HDR standards and formats currently coexist with varying compatibility, in a state of a hundred schools of thought contending, even open war. Meanwhile, some technically leading standards, such as Dolby Vision, carry very high patent licensing fees, raising costs across the industrial chain and leaving the ecosystem fragmented rather than end-to-end. A unified technology is needed.
HDR10 is a basic, fully open standard with good support; the HEVC standard was the first to support HDR10 metadata. But HDR10’s performance and compatibility remain its weakness.
HDR video technology is an end-to-end ecosystem, and the different camps each have their own specifications, making consensus difficult. In summary, there are four common HDR solutions: HDR10, Dolby Vision, BBC/NHK, and Technicolor/Philips.
Dolby uses existing solutions and tools to deliver a complete HDR system, Dolby Vision, spanning shooting, production, distribution, and display. For the EOTF it uses its own PQ curve, based on the Barten model, which supports a maximum brightness of 10,000 nits. For coding, Dolby uses a dual-layer encoding scheme that carries metadata and requires a patent fee.
I predict that future HDR competition will play out between Dolby Vision’s closed ecosystem and the newer open ecosystems of HDR10+ and HDR Vivid.
4. Development trend prediction of HDR
4.1 AI-based HDR content production
AI’s progress in video provides new technical means for reconstructing video content. AI-based super-resolution can raise resolution from standard definition to HD, or from HD to 4K and even 8K, recovering a great deal of image detail. AI-based inverse tone mapping and color enhancement can improve contrast, color saturation, and more. These enhanced details need the high dynamic range and wide color gamut of HDR video to be expressed. NTIRE 2021 held its first HDR video/image generation challenge. [7]
By typical application scenario, intelligent video remastering divides into intelligent picture quality improvement and intelligent old-film restoration. Old-film restoration in particular can greatly improve the efficiency of traditional manual repair, while super-resolution and HDR can further refine detail and adjust brightness and saturation, approaching true 4K quality.
4.2 The Internet usage of HDR is on the rise
With the continuing development of HDR displays, HDR-capable players, and mobile terminals, especially smartphones, HDR content will keep growing and public awareness of HDR will strengthen. In the next two to three years, HDR technology is expected to develop rapidly.
iQiyi combines 4K, HDR, high frame rate, and panoramic sound to create a tier above “Blu-ray” called “Frame Qi Picture”, available to VIP subscribers. HDR makes the image appear more like the real world in brightness, color gamut, and contrast. It is predicted that the 4K+HDR experience will become a basic feature of Internet video within the next 2-3 years.
4.3 HDR game application promotion
Games are an important application of HDR. HDR technology can make game rendering more accurate, sharpen light and dark detail, and restore realistic scenes.
The HDR10+ Gaming standard is an extension of HDR10+ announced by Samsung in October. Compared with HDR10, it offers higher peak brightness, automatic low latency, and variable refresh rates, and lets the output video source adapt automatically to the display device for better game performance and visuals.
Microsoft chose Dolby Vision: developers can integrate Dolby Vision into their game engines or build with the Xbox developer platform. Over 100 HDR games have been released using the Dolby Vision and Atmos standards. Microsoft has also been working with Dolby to improve the experience of existing HDR10 and Auto HDR games. [5]
Game video is also an important market, mainly game live streaming and cloud gaming. Live streams are one-way, while cloud games are interactive games delivered as video. Games that already render in HDR can supply HDR game video.
4.4 HDR Vivid commercial acceleration
Since the CUVA Alliance released the HDR Vivid standard on September 4, 2020, the industrial ecosystem has largely matured, with product support across every link of the chain. The content catalog has exceeded 10,000 hours and keeps growing. Seven standards have been issued, covering the entire ecosystem from content production to transmission, decoding, and display. Meanwhile, the HDR Vivid standard is being elevated to a national broadcasting standard. [8]
On the content side, iQiyi, Tencent Video, and other platforms have adopted the HDR Vivid standard, upgrading their 4K quality, and Migu has applied it to Internet live streaming for the first time. In video transcoding and processing, Baidu Intelligent Cloud, as an intelligent video processing vendor and a participant in the HDR Vivid standard, will complete an end-to-end HDR Vivid processing solution as soon as possible. On chips and terminals, HiSilicon, MediaTek, Amlogic, and others have taken the lead in supporting the Vivid standard, and other semiconductor manufacturers are expected to follow.
As you can see, the HDR Vivid ecosystem is expanding, and large-scale commercial use is just around the corner.
Note:
[1] Multi-exposure HDR capture: A review
en.wikipedia.org/wiki/Multi-…
[2] High-dynamic-range video
en.wikipedia.org/wiki/High-d…
[3] HDR Vivid technical white paper
www.cuva.org.cn/ueditor/php…
[4] UHD Service Tracker Summary
ultrahdforum.org/uhd-service…
[5] Windows 2021 HDR Getting Started Guide
devblogs.microsoft.com/directx/win…
[6] Hybrik
professional.dolby.com/technologie…
[7] NTIRE 2021 Challenge on High Dynamic Range Imaging: Dataset, Methods and Results
arxiv.org/pdf/2106.01…
[8] Progress in HDR Vivid application
www.cuva.org.cn/cuva/gzjz/l…
About the author: Xing Huaifei, Ph.D., is an audio and video processing technology architect at Baidu Intelligent Cloud Video Cloud, where he leads audio and video processing technology products. He has more than ten years of industry experience in video encoding/decoding, processing algorithms, and cloud transcoding engineering architecture, and currently focuses on performance optimization and commercialization of Baidu’s “Zhigan Ultra Clear” product line.