DASH livestreams typically have a latency of tens of seconds, which is far too high for interactive livestreaming. To reduce the latency of a live broadcast, the usual approach is to shorten the video segment length.

The figure above shows the latency corresponding to different segment lengths. Although shortening segments reduces latency, it also increases resource consumption and video bitrate. Even with 1-second segments, the latency is still higher than that of the LLDASH scheme introduced below.

Introduction

LLDASH (Low-Latency DASH) was first proposed in 2017, when a working group was established. In 2019, DVB released the DVB-DASH low-latency specification, building on the low-latency DASH work done by DVB and DASH IF since 2017. In 2020, DASH IF released the Low-Latency Modes for DASH specification.

DVB (Digital Video Broadcasting) is a series of open international standards for digital television, maintained by the DVB Project. The DVB Project, an industry consortium with more than 300 members, was initiated by a joint expert group of the European Telecommunications Standards Institute (ETSI), the European Committee for Electrotechnical Standardization (CENELEC) and the European Broadcasting Union (EBU).

DASH IF (DASH Industry Forum) is composed of streaming media companies such as Akamai, Google and Microsoft. DASH IF focuses on standardizing interoperability, promoting MPEG-DASH adoption, and helping it move from specification to real-world deployment.

So there are currently two LLDASH specifications: DVB's and DASH IF's. Because DASH IF's was released later, it differs from DVB's in a few details, but the two low-latency schemes are otherwise very similar, and both are fully backward compatible with normal DASH live streaming.

CMAF

Although the MPEG-DASH specification does not restrict content formats, both LLDASH specifications use the CMAF format. CMAF by itself does not reduce latency. For example, HLS supports both MPEG-TS and CMAF segments; replacing ordinary HLS live MPEG-TS segments with CMAF segments does not reduce latency. The primary purpose of CMAF is to unify the delivery format and thus save storage space. However, CMAF provides the tools that make low-latency DASH possible.

How it works

LLDASH is very similar to LHLS: it splits a segment into small chunks that the player can download and buffer with HTTP/1.1 Chunked Transfer Encoding before the segment is fully generated, thereby reducing live latency.

As shown above, an MP4 segment in normal DASH live streaming can only be requested once it is fully encoded. LLDASH splits each segment into small chunks, and the encoder can output each chunk as it is produced and pass it to the player's buffer for playback.

In CMAF, the ftyp and moov boxes make up the initialization segment, and each chunk consists of a moof box and an mdat box. The player first requests the initialization segment, then requests the latest media segment, and the server returns the segment's chunks to the player for playback as they become available.

In the figure above, a video segment is split into three chunks. When the player sends its request to the server, the segment is not yet fully generated, so the server keeps the connection open and pushes each chunk to the player as soon as it is generated.
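To make this concrete, here is a minimal sketch of the player side, assuming a browser environment with the Fetch API; appendToSourceBuffer is a hypothetical callback, for example wrapping SourceBuffer.appendBuffer() from Media Source Extensions.

async function streamSegment(
  url: string,
  appendToSourceBuffer: (chunk: Uint8Array) => void,
): Promise<void> {
  // The server keeps this connection open and flushes each CMAF chunk
  // (moof + mdat) as soon as the encoder produces it.
  const response = await fetch(url);
  if (!response.body) throw new Error('streaming response body not supported');
  const reader = response.body.getReader();
  for (;;) {
    // read() resolves as soon as more data arrives,
    // long before the whole segment is finished.
    const { done, value } = await reader.read();
    if (done) break;
    appendToSourceBuffer(value);
  }
}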

Specification implementation

An MPD that uses the DASH IF low-latency specification should include http://www.dashif.org/guidelines/low-latency-live-v5 in its MPD@profiles attribute. Here is an example of an MPD that conforms to the DASH IF low-latency specification.

<?xml version="1.0" encoding="utf-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     availabilityStartTime="1970-01-01T00:00:00Z"
     id="Config part of url maybe?"
     maxSegmentDuration="PT8S"
     minBufferTime="PT1S"
     minimumUpdatePeriod="P100Y"
     profiles="urn:mpeg:dash:profile:full:2011,http://www.dashif.org/guidelines/low-latency-live-v5"
     publishTime="2021-09-14T05:25:57Z"
     timeShiftBufferDepth="PT5M"
     type="dynamic">
  <BaseURL>https://livesim.dashif.org/livesim/sts_1631597157/sid_a736b022/chunkdur_1/ato_7/testpic4_8s/</BaseURL>
  <ServiceDescription id="0">
    <Latency max="6000" min="2000" referenceId="0" target="4000" />
    <PlaybackRate max="1.04" min="0.96" />
  </ServiceDescription>
  <Period id="P0" start="PT0S">
    <AdaptationSet contentType="audio" lang="eng" segmentAlignment="true">
      <ProducerReferenceTime id="0" presentationTime="0" type="encoder" wallClockTime="1970-01-01T00:00:00">
        <UTCTiming schemeIdUri="urn:mpeg:dash:utc:http-iso:2014" value="http://time.akamai.com/?iso" />
      </ProducerReferenceTime>
      <SegmentTemplate availabilityTimeComplete="false" availabilityTimeOffset="7.000000"
                       duration="384000" initialization="$RepresentationID$/init.mp4"
                       media="$RepresentationID$/$Number$.m4s" startNumber="0" timescale="48000" />
      <Representation audioSamplingRate="48000" bandwidth="36997" codecs="mp4a.40.2"
                      id="A48" mimeType="audio/mp4" startWithSAP="1">
        <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2" />
      </Representation>
    </AdaptationSet>
    <AdaptationSet contentType="video" maxFrameRate="30" maxHeight="360" maxWidth="640" par="16:9" segmentAlignment="true">
      <ProducerReferenceTime id="0" presentationTime="0" type="encoder" wallClockTime="1970-01-01T00:00:00">
        <UTCTiming schemeIdUri="urn:mpeg:dash:utc:http-iso:2014" value="http://time.akamai.com/?iso" />
      </ProducerReferenceTime>
      <SegmentTemplate availabilityTimeComplete="false" availabilityTimeOffset="7.000000"
                       duration="122880" initialization="$RepresentationID$/init.mp4"
                       media="$RepresentationID$/$Number$.m4s" startNumber="0" timescale="15360" />
      <Representation bandwidth="303780" codecs="avc1.64001e" frameRate="30" height="360"
                      id="V300" mimeType="video/mp4" sar="1:1" startWithSAP="1" width="640" />
    </AdaptationSet>
  </Period>
  <UTCTiming schemeIdUri="urn:mpeg:dash:utc:http-iso:2014" value="http://time.akamai.com/?iso" />
</MPD>

Determining whether Chunked low-latency mode is used

The DASH IF low-latency specification defines two low-latency livestreaming methods. One does not use the Chunked Transfer Encoding feature mentioned above, but instead splits the media into very short segments to reduce latency. This method is no different from normal DASH livestreaming, so I won't go into it here.

The other low-latency method uses the HTTP Chunked Transfer Encoding feature mentioned above: when a media segment is not yet fully generated, the player can still request it and download and buffer the chunks that have already been produced, instead of getting a 404 error. This is the low-latency mode described below.

There are two ways to determine whether a stream is in this low-latency mode; a detection sketch follows the list.

  1. Based on the SegmentTemplate@availabilityTimeComplete attribute. The DASH IF low-latency specification requires a chunked AdaptationSet to set availabilityTimeComplete to false, so if availabilityTimeComplete is false, the media stream can be considered to be in low-latency mode.
  2. DVB also defines an EssentialProperty or SupplementalProperty element to describe low latency. If one of them exists, its schemeIdUri attribute equals urn:dvb:dash:lowlatency:critical:2019, and its value attribute equals true, the media stream can also be considered to be in low-latency mode.
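As a rough sketch, both checks could look like this; the parsed-MPD shapes below are my own illustration, not a real parser API.

interface Descriptor {
  schemeIdUri: string;
  value?: string;
}

interface ParsedAdaptationSet {
  segmentTemplate?: { availabilityTimeComplete?: boolean };
  essentialProperties: Descriptor[];
  supplementalProperties: Descriptor[];
}

function isLowLatency(set: ParsedAdaptationSet): boolean {
  // Rule 1 (DASH IF): availabilityTimeComplete="false"
  if (set.segmentTemplate?.availabilityTimeComplete === false) return true;
  // Rule 2 (DVB): a low-latency descriptor with value "true"
  return [...set.essentialProperties, ...set.supplementalProperties].some(
    (d) =>
      d.schemeIdUri === 'urn:dvb:dash:lowlatency:critical:2019' &&
      d.value === 'true',
  );
}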

Latency and playback rate

LLDASH defines playback latency and playback rate information in the ServiceDescription element.

<ServiceDescription id="0">
  <Scope schemeIdUri="urn:dvb:dash:lowlatency:scope:2019" />
  <Latency target="3000" max="6000" min="1500" />
  <PlaybackRate max="1.5" min="0.5" />
</ServiceDescription>
  • The Latency element defines live latency information in milliseconds. Latency@target is the target live latency. Latency@max is the maximum allowed latency; when the latency exceeds this value, the player should seek directly to the target latency position. If the current latency is less than Latency@min, the player should slow down playback.
  • The PlaybackRate element defines the maximum and minimum playback rates; the normal rate is 1. The player fast-plays to catch up when the latency exceeds the target, and may slow down when the buffer is too small (a sketch of this logic follows the list).
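Here is a minimal sketch of that catch-up logic, assuming the current latency has already been measured; the function and parameter names are mine.

function choosePlaybackRate(
  currentLatencyMs: number,
  latency: { target: number; max: number; min: number },
  rate: { max: number; min: number },
): { rate: number; seekToLive: boolean } {
  if (currentLatencyMs > latency.max) {
    // Too far behind: seek directly to the target latency position.
    return { rate: 1, seekToLive: true };
  }
  if (currentLatencyMs > latency.target) {
    return { rate: rate.max, seekToLive: false }; // fast-play to catch up
  }
  if (currentLatencyMs < latency.min) {
    return { rate: rate.min, seekToLive: false }; // slow down
  }
  return { rate: 1, seekToLive: false }; // within range: normal speed
}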

The DVB low-latency specification also defines a Scope element whose schemeIdUri attribute is urn:dvb:dash:lowlatency:scope:2019; the DASH IF specification does not define this element.

Clock synchronization

To obtain accurate media segment addresses and live latency, the LLDASH specification requires at least one UTCTiming element in the MPD for client-server clock synchronization. The UTCTiming@schemeIdUri attribute must be one of the following three values.

  • urn:mpeg:dash:utc:http-xsdate:2014
  • urn:mpeg:dash:utc:http-iso:2014
  • urn:mpeg:dash:utc:http-ntp:2014 (not supported in browsers)

The UTCTiming@value property is the server time service address.

<UTCTiming 
    schemeIdUri="urn:mpeg:dash:utc:http-xsdate:2014"
    value="https://time.example.com"
/>

Some older MPDs use the 2012 schemes, so players should also support the following schemeIdUri values.

  • urn:mpeg:dash:utc:http-xsdate:2012
  • urn:mpeg:dash:utc:http-iso:2012

If there is no UTCTiming element in the MPD or the clock synchronization service is not accessible, the player can degrade to using the local clock.
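A simple clock-sync sketch, assuming an http-xsdate or http-iso scheme whose response body is a plain timestamp string:

let clockOffsetMs = 0; // server time minus local time

async function syncClock(utcTimingValue: string): Promise<void> {
  try {
    const t0 = Date.now();
    const response = await fetch(utcTimingValue);
    const t1 = Date.now();
    const serverTime = new Date(await response.text()).getTime();
    // Assume the server sampled its clock roughly mid-request,
    // so compensate with half the round-trip time.
    clockOffsetMs = serverTime + (t1 - t0) / 2 - t1;
  } catch {
    clockOffsetMs = 0; // degrade to the local clock
  }
}

const now = () => Date.now() + clockOffsetMs; // the synchronized "NOW"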

Media segments

LLDASH defines two ways to get media segments.

  1. SegmentTemplate@media (using $Number$) plus SegmentTemplate@duration
  2. SegmentTemplate@media (using $Time$ or $Number$) plus the SegmentTimeline element

DVB-DASH prefers the first approach; DASH IF makes no recommendation.

SegmentTemplate + SegmentTemplate@duration

Here is an example using the first method.

<Representation id="0" mimeType="video/mp4" codecs="avc1.42c028"
                bandwidth="6000000" width="1920" height="1080" sar="1:1">
  <SegmentTemplate timescale="1000000" duration="2002000"
                   availabilityTimeOffset="1.969" availabilityTimeComplete="false"
                   initialization="1630889228/init-stream_$RepresentationID$.m4s"
                   media="1630889228/chunk-stream_$RepresentationID$-$Number%05d$.m4s"
                   startNumber="1" />
</Representation>

In the example above, we first obtain the initialization segment address by replacing $RepresentationID$ in SegmentTemplate@initialization with Representation@id, giving 1630889228/init-stream_0.m4s. (Initialization segments may also be provided in other ways, such as with the Initialization element.)

Once we have the initialization segment URL, we also need to determine the URL of the first media segment to request. Assume NOW is the current time after synchronizing with the server clock. We can then use the following formula to compute the $Number$ of the latest complete segment that matches the target latency.

targetNumber = Math.floor(
    ((NOW - MPD@availabilityStartTime - Latency@target) / 1000 - Period@start) / 
    (SegmentTemplate@duration / SegmentTemplate@timescale)
) + SegmentTemplate@startNumber

Then replace $Number$ in SegmentTemplate@media with targetNumber to construct the media segment URL.
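Translated directly into code, the calculation might look like this (names are illustrative; nowMs comes from the synchronized clock above):

function targetSegmentNumber(
  nowMs: number,                   // synchronized current time, ms
  availabilityStartTimeMs: number, // MPD@availabilityStartTime, ms
  latencyTargetMs: number,         // Latency@target, ms
  periodStartSec: number,          // Period@start, seconds
  duration: number,                // SegmentTemplate@duration, timescale units
  timescale: number,               // SegmentTemplate@timescale
  startNumber: number,             // SegmentTemplate@startNumber
): number {
  const mediaTimeSec =
    (nowMs - availabilityStartTimeMs - latencyTargetMs) / 1000 - periodStartSec;
  return Math.floor(mediaTimeSec / (duration / timescale)) + startNumber;
}

// e.g. media.replace('$Number$', String(targetSegmentNumber(...)))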

SegmentTemplate + SegmentTimeline

<AdaptationSet id="0" mimeType="video/mp4" width="512" height="288" par="16:9"
               frameRate="30" segmentAlignment="true" startWithSAP="1">
  <SegmentTemplate timescale="90000"
                   media="segment_ctvideo_cfm4s_rid$RepresentationID$_cs$Time$_w743518253_mpd.m4s"
                   initialization="segment_ctvideo_cfm4s_rid$RepresentationID$_cinit_w743518253_mpd.m4s">
    <SegmentTimeline>
      <S t="1614755160" d="900000" />
      <S d="900000" />
      <S d="900000" />
      <S d="900000" />
      <S d="900000" />
    </SegmentTimeline>
  </SegmentTemplate>
  <Representation id="p0VA0BR601091" codecs="avc1.42c015" sar="1:1" bandwidth="601091" />
</AdaptationSet>

SegmentTimeline describes the media time and duration of each media segment, replacing the SegmentTemplate@duration attribute. SegmentTimeline contains a list of S elements, which have three main attributes: S@t, S@d and S@r.

| Attribute | Description |
| --- | --- |
| S@t | The segment's media time. If absent, it equals the previous S element's S@t + S@d. |
| S@d | The segment duration. |
| S@r | The number of consecutive repetitions of segments with the same duration; the default is 0. A negative value means the segment repeats until the next S element or the end of the Period. |

As with the first method, we first need to request the initialization segment, which I won't repeat here.

To calculate the address of the latest complete segment that meets the target latency, first compute the target S@t value, then locate the segment in the SegmentTimeline and build its URL.

targetT = (NOW - MPD@availabilityStartTime - Latency@target) / 1000 -
          Period@start + (SegmentTemplate@presentationTimeOffset / SegmentTemplate@timescale)

After calculating targetT, we can iterate over the S elements of the SegmentTimeline, expanding them where S@r is present, and find the first S element for which (S@t + S@d) / SegmentTemplate@timescale > targetT. That S element is the target.

Then replace $Time$ in SegmentTemplate@media with the target S element's S@t attribute, or replace $Number$ with the target S element's index in the expanded SegmentTimeline plus SegmentTemplate@startNumber. This constructs the URL of the target media segment. A sketch of this lookup follows.
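A sketch of that lookup, expanding S@r as we go (names are mine; negative S@r, which repeats to the end, is left out for brevity):

interface SElement {
  t?: number; // media time, timescale units
  d: number;  // duration, timescale units
  r?: number; // extra repetitions, default 0
}

function findTargetSegment(
  timeline: SElement[],
  targetT: number, // seconds, from the formula above
  timescale: number,
  startNumber: number,
): { time: number; number: number } | undefined {
  let time = 0;
  let index = 0;
  for (const s of timeline) {
    if (s.t !== undefined) time = s.t; // otherwise previous t + d carries over
    for (let i = 0; i <= (s.r ?? 0); i++) {
      if ((time + s.d) / timescale > targetT) {
        return { time, number: startNumber + index };
      }
      time += s.d;
      index++;
    }
  }
  return undefined; // targetT is beyond the published timeline
}

// media.replace('$Time$', String(seg.time)) or
// media.replace('$Number$', String(seg.number))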

Resync

Resync is defined in the MPEG-DASH ISO/IEC 23009-1:2020/Amd.1 specification. It describes segment synchronization points through which the player can perform fast random access; the mechanism is similar to EXT-X-PART in LLHLS.

It can be placed under an AdaptationSet or Representation element.

<Resync type="2" dT="1000000" dImin="0.1" dImax="0.15" marker="TRUE" />
  • type: a value of 2 means each CMAF chunk can be randomly accessed; 0 means random access to chunks is not guaranteed.
  • dT: the maximum time distance between random access points, in timescale units.
  • dImin: the minimum byte distance between two random access points, as dImin * bandwidth.
  • dImax: the maximum byte distance between two random access points, as dImax * bandwidth.
  • marker: true means the player can locate synchronization points by parsing the CMAF boxes (see the sketch after this list).
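Under those semantics, the attributes map to concrete limits roughly as follows; this is only a sketch, and whether the distance figures need a further bits-to-bytes conversion depends on the spec's exact definition of bandwidth:

function resyncLimits(
  resync: { dT: number; dImin: number; dImax: number },
  timescale: number, // from the enclosing SegmentTemplate
  bandwidth: number, // Representation@bandwidth
) {
  return {
    maxSecondsBetweenPoints: resync.dT / timescale,
    minDistanceBetweenPoints: resync.dImin * bandwidth, // per the semantics above
    maxDistanceBetweenPoints: resync.dImax * bandwidth,
  };
}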

Low-latency ABR algorithms

ABR (Adaptive Bitrate) is a key feature of DASH players: under varying network conditions the player dynamically switches bitrate and playback rate rather than interrupting playback and degrading the user experience.

Under low latency, ABR algorithms based on conventional bandwidth estimation do not work well. With Chunked Transfer Encoding, a segment is requested before it is fully generated, so for a 5-second segment the HTTP request may itself take about 5 seconds, which does not reflect the actual download time. In 2020, Twitch and ACM jointly hosted "ABR Algorithms for Near-second Latency", a grand challenge on adaptation algorithms under low latency. First place went to the L2A-LL (Learn2Adapt-LowLatency) algorithm from Unified Streaming, and second place to the LoL (Low-on-Latency) algorithm from the National University of Singapore. Since the Twitch player is not open source, the challenge was based on the dash.js player, which now integrates both ABR algorithms.
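To see why naive estimation fails, consider a sketch that measures throughput only while bytes are actually arriving, excluding the idle gaps in which the encoder has not yet produced the next chunk. This illustrates the general idea behind chunk-aware estimation; it is not the actual L2A-LL or LoL code, and the 50 ms idle threshold is an arbitrary assumption.

async function measureChunkedThroughput(url: string): Promise<number> {
  const response = await fetch(url);
  if (!response.body) throw new Error('streaming response body not supported');
  const reader = response.body.getReader();
  let bytes = 0;
  let activeMs = 0;
  let last = performance.now();
  for (;;) {
    const { done, value } = await reader.read();
    const t = performance.now();
    if (done) break;
    // Heuristic: treat long gaps as encoder idle time, not network time.
    const gap = t - last;
    if (gap < 50) activeMs += gap;
    bytes += value.length;
    last = t;
  }
  // A naive estimate (bytes / total request time) would approximate the
  // encoding rate of the live stream instead of the network bandwidth.
  return activeMs > 0 ? (bytes * 8 * 1000) / activeMs : 0; // bits per second
}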

Conclusion

LLDASH and LHLS are very similar: both use HTTP/1.1's Chunked Transfer Encoding feature to reduce latency, providing live streams with 1 to 6 seconds of latency, and both can reuse existing CDN networks to support mass viewing. However, consuming a Chunked Transfer Encoding response progressively requires browser support for the Fetch API, so on IE, where only XHR is available, playback degrades to normal DASH live streaming.