One of the responsibilities of the WebRTC SFU is to receive and send RTCP packets. RTCP packets include different types of feedback about audio and video streams, and the most important RTCP packet is the receiver report (RR).
The RR packet is sent from the receiver of a media stream to the sender of that stream. With an SFU in the path, RRs travel on two legs: the SFU generates RRs and sends them to the media stream sender, and each stream receiver sends its own RRs to the SFU (see Figure 1).
The feedback carried in RR packets covers round-trip time, jitter, and the packet loss introduced by the network.
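As a point of reference, RFC 3550 carries packet loss in the RR as an 8-bit "fraction lost" field: the packets lost during the reporting interval divided by the packets expected, expressed as a fixed-point number scaled by 256. A minimal sketch of that encoding follows. The function names are illustrative, and the clamp to 255 is an added safeguard, not part of the RFC sample code.

```python
def fraction_lost_byte(expected_interval: int, received_interval: int) -> int:
    """Encode interval loss as the 8-bit RR 'fraction lost' value (loss * 256)."""
    lost = expected_interval - received_interval
    # Duplicated packets can make 'lost' negative; RFC 3550 reports 0 then.
    if expected_interval <= 0 or lost <= 0:
        return 0
    # Clamp so that 100% loss still fits in one byte (assumption of this sketch).
    return min(255, (lost * 256) // expected_interval)


def fraction_lost_percent(byte_value: int) -> float:
    """Decode the 8-bit field back into a percentage."""
    return byte_value * 100.0 / 256.0


# 20 of 100 expected packets missing is encoded as 51 (about 19.9%).
encoded = fraction_lost_byte(expected_interval=100, received_interval=80)
```

Because the field is quantized to 1/256 steps, the decoded value is slightly below the true 20% in this example.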
The packet loss reported in these RR packets matters because the sender adjusts the audio and video it sends according to this parameter:
For audio streams, packet loss in the network changes the robustness level of the Opus codec. When loss is high, the sender increases the amount of forward error correction (FEC) redundancy included in the audio packets.
For video streams, packet loss changes the bitrate at which video is encoded and sent. When loss is high, the sender lowers its sending bitrate to relieve possible congestion in the network.
Given this behavior, the question is: how should the SFU calculate the packet loss it reports in the RR packets it sends to the sender in order to get the best user experience?
The following sections suggest how to handle this for three different types of streams: audio, video with simulcast, and video without simulcast.
Audio
Opus FEC is sent in-band, so it is end-to-end: the SFU cannot add or change it, and every receiver in the room gets the same level of FEC.
If we want the audio FEC to work well for participants with the weakest links, we must ensure that the worst packet losses are reported to the sender.
Therefore, the packet loss reported from the SFU to the sender should be the worst packet loss among the receivers: max(PL1, PL2).
For example, if one of the receiving participants experiences 20% packet loss on its downlink, the packet loss reported to the sender will be 20%, even if the sender and the remaining receivers have perfect network conditions.
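The audio rule above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, loss is expressed as a fraction between 0 and 1, and returning 0 when nobody is receiving is an assumption of the sketch.

```python
from typing import Sequence


def audio_loss_to_report(receiver_losses: Sequence[float]) -> float:
    """Return the loss the SFU reports to the audio sender: the worst receiver's loss."""
    # With no receivers there is nothing to protect, so report no loss (assumption).
    return max(receiver_losses, default=0.0)


# Receiver 1 sees 20% loss, receiver 2 sees none: the sender is told 20%.
reported = audio_loss_to_report([0.20, 0.0])
```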
Video with simulcast
With simulcast, the SFU can send a different video quality to each participant, so there is no need to adapt the sent stream to any particular participant.
Therefore, the packet loss reported from the SFU to the sender should be the packet loss of the sender-to-SFU link, regardless of the packet loss experienced by receivers 1 and 2.
For example, if the sender's uplink is good, the packet loss reported to the sender will be 0 even if a receiver is experiencing heavy packet loss on its downlink. That downlink loss is handled in the SFU through retransmissions and bitrate adaptation, by selecting lower resolutions/layers to forward to those participants.
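A rough sketch of the simulcast case, under stated assumptions: the `Layer` type, the layer bitrates, and the "highest layer that fits the receiver's estimated bandwidth" selection rule are all hypothetical and stand in for whatever adaptation logic a real SFU uses. The only part taken from the text is that the reported loss ignores the downlinks.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Layer:
    name: str
    bitrate_kbps: int


def simulcast_loss_to_report(uplink_loss: float, downlink_losses: Sequence[float]) -> float:
    """Report only the sender-to-SFU loss; downlink losses are deliberately ignored."""
    return uplink_loss


def select_layer(layers: Sequence[Layer], available_kbps: int) -> Layer:
    """Pick the highest-bitrate layer that fits, else fall back to the lowest layer."""
    fitting = [layer for layer in layers if layer.bitrate_kbps <= available_kbps]
    if fitting:
        return max(fitting, key=lambda layer: layer.bitrate_kbps)
    return min(layers, key=lambda layer: layer.bitrate_kbps)


layers = [Layer("low", 150), Layer("mid", 500), Layer("high", 1500)]
# A receiver with roughly 600 kbps available is forwarded the "mid" layer,
# while the sender keeps seeing 0 loss as long as its uplink is clean.
chosen = select_layer(layers, available_kbps=600)
reported = simulcast_loss_to_report(0.0, [0.30, 0.10])
```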
Video without simulcast
Without simulcast, the SFU has to send the same video quality to every participant. The sender therefore has to adapt its sending bitrate to the weakest link (and/or the video of some participants has to be disabled).
This bitrate adaptation is driven mainly by RTCP REMB messages, in which the SFU reports the bandwidth available to the worst receiver. But packet loss can also affect the bitrate, so we need to decide which packet loss to include in the RR packets.
In this case, I think either of the approaches described in the previous sections works fine: a) the SFU reports the packet loss of the worst receiver, or b) the SFU uses each receiver's packet loss in its own bandwidth estimation and reports only the sender-to-SFU packet loss in the RR.
Note: If you use P2P connections when there are 2 participants in the room, and switch to an SFU with simulcast when there are 3 or more, then you should never run into the case of video without simulcast and multiple participants.
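The two options can be put side by side in a short sketch. All names here are hypothetical. Taking the minimum of the receivers' bandwidth estimates for REMB follows the description above, and falling back to the uplink loss when there are no receivers is an assumption of the sketch.

```python
from enum import Enum
from typing import Sequence


class Strategy(Enum):
    WORST_RECEIVER = "a"  # report the worst receiver's loss
    UPLINK_ONLY = "b"     # report only the sender-to-SFU loss


def loss_to_report(strategy: Strategy, uplink_loss: float,
                   downlink_losses: Sequence[float]) -> float:
    """Choose the loss value placed in the RR sent to the video sender."""
    if strategy is Strategy.WORST_RECEIVER:
        # Assumption: with no receivers, fall back to the uplink loss.
        return max(downlink_losses, default=uplink_loss)
    return uplink_loss


def remb_bitrate_kbps(receiver_estimates_kbps: Sequence[int]) -> int:
    """Bandwidth placed in REMB: the estimate of the weakest receiver."""
    return min(receiver_estimates_kbps)


downlinks = [0.02, 0.15]
option_a = loss_to_report(Strategy.WORST_RECEIVER, 0.0, downlinks)
option_b = loss_to_report(Strategy.UPLINK_ONLY, 0.0, downlinks)
remb = remb_bitrate_kbps([1200, 400])
```

In option b the receivers' losses still matter, but only indirectly, through the per-receiver bandwidth estimates that feed the REMB value.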
This article was first published on WebRTC China, sponsored and operated by Agora