Abstract: There are many audio and video transmission protocols, including RTSP, RTMP, RTP/RTCP, HLS, MSS, DASH, WebRTC, RIST, and SRT. How should you choose among them for a given service? This article looks at the main streaming media protocols from the perspective of business development, to help readers understand them more clearly and make a more rational choice.

IPTV

IPTV is a system built by telecom operators, aimed at the digital TV services of traditional broadcasters. Its primary task is therefore large-scale live broadcast; on top of that, it must also support newer services such as on-demand, time shift, and catch-up viewing. Operators have the advantage of being able to build their own managed networks, so live broadcasts are distributed at scale using multicast. The main technology stack is RTP+TS over multicast, which significantly reduces the load on streaming servers at live-broadcast peaks. Because on-demand, time-shift, and catch-up services must use unicast transmission, the stack chosen at the time used RTSP for streaming media control and RTP+TS over UDP (with a small share over TCP) for data transmission.

The system now serves more than 170 million users nationwide. The main challenges faced by this technology stack, and the corresponding solutions, are as follows. To solve packet loss in UDP-based unicast transmission, which causes garbled pictures or playback stalls, we formulated a set of specifications that extend the RTSP protocol. Retransmission signaling was added on top of the RTSP GET_PARAMETER method and designed mainly on the NACK principle: the terminal notifies the streaming media server which packets it has not received, and the server resends those packets based on the RTP sequence numbers carried in the request.
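
The receiver side of such a NACK scheme can be sketched roughly as follows. This is only an illustration: the function names and the request body format are hypothetical and do not reproduce the actual RTSP extension. The one firm fact it relies on is that RTP sequence numbers are 16-bit and wrap around.

```python
from typing import List

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit and wrap around


def missing_sequences(last_seq: int, new_seq: int, max_gap: int = 1000) -> List[int]:
    """Return the RTP sequence numbers skipped between last_seq and new_seq.

    max_gap is an assumed limit: larger jumps are treated as late or
    reordered packets (or a stream reset) rather than as repairable loss.
    """
    gap = (new_seq - last_seq) % SEQ_MOD
    if gap == 0 or gap > max_gap:
        return []
    return [(last_seq + i) % SEQ_MOD for i in range(1, gap)]


def build_nack_body(missing: List[int]) -> str:
    """Format a hypothetical retransmission request body (not the real syntax)."""
    return "retransmit: " + ",".join(str(seq) for seq in missing)


if __name__ == "__main__":
    lost = missing_sequences(65534, 2)  # loss across the wrap-around point
    print(build_nack_body(lost))
```

In a real terminal the request would be sent only after a short reordering delay, so that packets that are merely late are not requested again.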

Packet loss in multicast transmission leads to garbled pictures or stalls in live broadcasts. We adopted two methods to solve this problem:

  • FEC
  • ARQ

Random packet loss can be repaired with FEC, which adds redundant packets at a fixed interval, but FEC cannot repair bursts of consecutive losses. In that case ARQ is required. We add a RET Server to the system; it joins the same multicast group and receives the same multicast packets as the set-top box. When the set-top box detects packet loss, it sends a NACK packet to the RET Server. On receiving the request, the RET Server sends the corresponding packets to the set-top box according to the RTP sequence numbers in the request.
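
Why FEC copes with isolated losses but not with bursts can be seen in a minimal sketch. It assumes the simplest possible scheme, one XOR parity packet per group of equal-length packets, which is not necessarily the FEC scheme used in the deployed system; the function names are illustrative.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR equal-length packets together into a single parity packet."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return bytes(parity)


def recover(group: List[Optional[bytes]], parity: bytes) -> Optional[List[Optional[bytes]]]:
    """Repair a group in which lost packets are marked as None.

    One parity packet can rebuild at most one lost packet. If two or more
    are missing (a burst), return None so the caller can fall back to ARQ.
    """
    lost = [i for i, pkt in enumerate(group) if pkt is None]
    if not lost:
        return list(group)
    if len(lost) > 1:
        return None
    received = [pkt for pkt in group if pkt is not None]
    repaired = list(group)
    repaired[lost[0]] = xor_parity(received + [parity])
    return repaired


if __name__ == "__main__":
    sent = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
    parity = xor_parity(sent)
    print(recover([b"aaaa", None, b"cccc", b"dddd"], parity))  # single loss
    print(recover([b"aaaa", None, None, b"dddd"], parity))     # burst loss
```

The second call returns None, which is the case where the set-top box would have to request the packets from the RET Server instead.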

The next problem is fast channel start-up under multicast transmission. A terminal joins the multicast group at a random point in the stream, so there is no guarantee that the packets it receives right after joining can be parsed, decoded, and displayed. We therefore add an FCC Server to the system. When the terminal starts to watch a channel, it first requests a unicast stream from the FCC Server. The FCC Server sends the I-frame and the decodable packets that follow it to the terminal at 1.X times the normal rate.
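
The reason for sending faster than real time can be shown with a little arithmetic. The sketch below assumes that the unicast burst starts at an I-frame some seconds behind the live multicast position and that the terminal switches to multicast once the burst has caught up; the handover and the function name are assumptions for illustration, not taken from the system described here.

```python
def catch_up_seconds(lag_s: float, burst_rate: float) -> float:
    """Time for a unicast burst to catch up with the live multicast stream.

    lag_s:      how far behind live the starting I-frame is, in seconds
    burst_rate: sending rate as a multiple of the stream rate (the "1.X")

    Each second the burst delivers burst_rate seconds of content while the
    live position advances by one second, so the gap shrinks by
    (burst_rate - 1) seconds per second.
    """
    if burst_rate <= 1.0:
        raise ValueError("burst rate must exceed 1.0 to catch up")
    return lag_s / (burst_rate - 1.0)


if __name__ == "__main__":
    # An I-frame 1 s behind live, sent at 1.25x, catches up after 4 s.
    print(catch_up_seconds(1.0, 1.25))
```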

IPTV multi-screen

With the growth of mobile terminals, operators wanted to extend IPTV to multi-screen services on devices such as mobile phones, and began to support the HLS (mainstream) and MPEG-DASH (some overseas operators, with multi-DRM support) streaming media protocols. The problem at the streaming media protocol layer is that different screens, DRM schemes, and packaging formats multiply the storage space required. This is especially true of the nPVR (network personal video recording) service, whose demand for storage is very large. The main solution is just-in-time packaging (JITP): only one copy of the content is stored, and it is converted in real time into the format the user's device requests.
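
The JITP idea can be sketched as packaging on request from a single stored copy. Everything in the example is a placeholder: the in-memory store, the content ID, and the byte prefixes that stand in for real TS or fMP4 muxing and DRM encryption.

```python
from functools import lru_cache

# Hypothetical single stored copy: content ID -> list of media chunks.
STORED_COPY = {"movie-1": [b"chunk0", b"chunk1"]}


@lru_cache(maxsize=1024)
def package(content_id: str, index: int, fmt: str) -> bytes:
    """Package one chunk into the requested delivery format on demand.

    The prefixes are stand-ins for real muxing. The cache keeps recently
    requested segments so that popular content is not repackaged each time.
    """
    chunk = STORED_COPY[content_id][index]
    if fmt == "hls":
        return b"TS:" + chunk
    if fmt == "dash":
        return b"FMP4:" + chunk
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    print(package("movie-1", 0, "hls"))
    print(package("movie-1", 0, "dash"))
```

The design trades storage for compute: the origin stores one copy instead of one per format and DRM combination, and pays for packaging work at request time.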

OTT on demand

With the rise of OTT video-on-demand platforms represented by YouTube, Netflix, iQiyi, Youku, and Tencent Video, the popularity of Apple’s mobile phones, and the emergence of the HLS protocol, streaming has gradually evolved from HTTP download to ABR streaming. As one of the most mainstream ABR protocols, HLS has become the preferred video transmission protocol of the OTT video platforms. The problems and solutions at the streaming media protocol layer are as follows:

1. Mass distribution over the Internet. CDN technology solves this problem well, which is why OTT streaming media protocols have been designed to be CDN-friendly from the start.

2. Growth in business volume. Netflix started with third-party CDNs and, once its business reached a certain scale, moved to a self-built CDN (Open Connect). Its technology stack nevertheless remains HLS and DASH, both CDN-friendly streaming media protocols.

OTT live

OTT live streaming can be subdivided into event live streaming (news, sports events, concerts) and personal live streaming (games, internet celebrities, shows). The problems and solutions at the streaming media protocol layer for such services are as follows:

1. First, these services must solve the same problem of Internet-based mass video distribution as OTT on demand.

2. Second, unlike on-demand, live streaming must take the additional delay into account. The first category, TV station and event live broadcasting (news, sports events, concerts), requires essentially no interaction with the audience. These services therefore still choose the CDN-friendly HLS and DASH protocols, at the cost of a delay of 10-30 s. With the emergence of the CMAF format, the delay can be brought below 5 s according to our laboratory test data. The second category is personal live streaming (games, internet celebrities, shows). On domestic live streaming platforms, for example, the business model is driven mainly by viewer rewards (tips), so the end-to-end (E2E) delay must be less than 5 s. Otherwise the interaction between audience and anchor is very poor, which directly affects the platform's income. The stack of choice for these platforms is RTMP and HTTP-FLV, which have lower latency.
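
A rough model shows where the delay of segment-based protocols comes from. It assumes the player buffers a fixed number of segments (or CMAF chunks) before playback starts and lumps encoding, packaging, and CDN time into one constant; the default of 2 s and the example values are illustrative assumptions, not measurements.

```python
def estimated_latency_s(unit_duration_s: float, buffered_units: int,
                        pipeline_overhead_s: float = 2.0) -> float:
    """Rough end-to-end delay for segment- or chunk-based delivery.

    unit_duration_s:     length of one segment or chunk, in seconds
    buffered_units:      units the player holds before starting playback
    pipeline_overhead_s: assumed encode + package + CDN time
    """
    return unit_duration_s * buffered_units + pipeline_overhead_s


if __name__ == "__main__":
    print(estimated_latency_s(6.0, 3))  # three 6 s segments -> 20.0
    print(estimated_latency_s(0.5, 3))  # three 0.5 s chunks -> 3.5
```

Under these assumptions, shortening the unit the player has to wait for is what moves the delay from the tens of seconds down to a few seconds.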

3. With the spread of mobile phones and 4G networks, some anchors began to stream outdoors. Outdoor network conditions are not controllable, and because RTMP runs over TCP, the stream quality is poor on unstable outdoor 4G networks. Poor uplink quality under weak network conditions therefore became a problem that live streaming platforms had to solve. To address it, attention turned to UDP, and UDP-based live transmission technologies gradually came into view. The common ones are Zixi, SRT, and RIST. Zixi is a purely commercial product, which clearly does not suit a platform whose business model relies on uploads from a large number of individual streamers. SRT has relatively mature open source community support, so it is widely used in China. RIST (Reliable Internet Stream Transport) is a protocol developed by the Video Services Forum (VSF) in early 2017. Whereas SRT is built on UDT, a technology stack not designed for real-time streaming media, RIST adopted the relatively mature RTP+RTCP stack from the start. It defines only a standardized syntax, which lets vendors innovate on top of it without affecting interoperability. With development in recent years, RIST supports more and more scenarios and is becoming increasingly mature.

4. As the live streaming business flourished, scenarios such as PK battles between multiple anchors and mic-linking between anchors and viewers appeared. These tightened the delay requirement from 5 s to less than 1 s, or even less than 400 ms. Once UDP is introduced, the video quality problems caused by packet loss must be solved. Common techniques include FEC and ARQ.

Real-time audio and video (RTC)

With the spread of 5G networks and the impact of the epidemic, people use real-time audio and video technology in more and more scenarios, including meetings, mic-linking, audio and video calls, online education, and telemedicine. Because these scenarios require a delay of less than 400 ms, RTP/RTCP over UDP is basically always chosen to carry the audio and video data.

A streaming media protocol can be evaluated along three dimensions: quality (picture quality, frame rate), smoothness, and delay. Compared with OTT live upload scenarios, real-time audio and video communication is more tolerant of low quality and accepts a certain reduction in quality (a lower frame rate, for example) in exchange for a smoother experience and lower latency. This difference in priorities means the technical design must connect the network transmission system with the audio and video codec system, so that encoding parameters are adjusted dynamically in real time according to the quality of network transmission, balancing quality, delay, and smoothness to reach the best overall result.
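
Coupling the encoder to network feedback can be sketched as a simple loss-based rate controller. The thresholds, step sizes, and bitrate limits below are illustrative assumptions, not values from any particular product. Production systems typically combine loss with delay-based signals.

```python
def next_bitrate_kbps(current_kbps: float, loss_fraction: float,
                      min_kbps: float = 100.0, max_kbps: float = 4000.0) -> float:
    """Pick the encoder's next target bitrate from the reported packet loss.

    Heavy loss: back off in proportion to the loss, trading picture quality
    for smoothness and delay. Light loss: probe upwards slowly. In between:
    hold the current rate.
    """
    if loss_fraction > 0.10:
        target = current_kbps * (1.0 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        target = current_kbps * 1.05
    else:
        target = current_kbps
    return max(min_kbps, min(max_kbps, target))


if __name__ == "__main__":
    rate = 1000.0
    for loss in (0.0, 0.0, 0.2, 0.05, 0.0):  # loss reported per feedback interval
        rate = next_bitrate_kbps(rate, loss)
        print(f"loss={loss:.2f} -> target {rate:.0f} kbps")
```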

A new direction for streaming media

1. New media forms include VR, free viewpoint, and point cloud. The biggest difference from traditional video is that traditional video mainly supports positioning in the time dimension, while the new media forms also support positioning in the spatial dimension. This work is currently under discussion in the MPEG standards organization and is evolving on the basis of the MPEG-DASH specification. Taking VR as an example, the human field of view (FOV) is generally less than 140°. The new streaming media protocols exploit this by transmitting only the part of the video within the viewer's current field of view, which reduces the bandwidth required and addresses the bandwidth demands of ever-growing data volumes. The Cloud VR products of Huawei Cloud Video already support 8K VR and free-viewpoint live broadcasting and on-demand services.
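
Partial transmission by field of view can be sketched as tile selection. The example assumes the 360° panorama is cut into equal-width vertical tiles and considers only the horizontal (yaw) direction; the tiling scheme and the function name are illustrative and not taken from any specification.

```python
from typing import List


def visible_tiles(center_yaw_deg: float, fov_deg: float, tile_count: int) -> List[int]:
    """Return the indices of the tiles that overlap the viewer's field of view.

    Tile i covers yaw angles [i * width, (i + 1) * width). A tile overlaps
    the viewport when the angular distance between the two centres is less
    than half the FOV plus half the tile width.
    """
    tile_width = 360.0 / tile_count
    selected = []
    for tile in range(tile_count):
        tile_center = (tile + 0.5) * tile_width
        # Shortest angular distance, handling the wrap-around at 360 degrees.
        diff = abs((tile_center - center_yaw_deg + 180.0) % 360.0 - 180.0)
        if diff < fov_deg / 2.0 + tile_width / 2.0:
            selected.append(tile)
    return selected


if __name__ == "__main__":
    # Looking at yaw 180 degrees with a 140 degree FOV and eight 45 degree tiles.
    print(visible_tiles(180.0, 140.0, 8))
```

In this example only four of the eight tiles need to be sent at full quality, which is where the bandwidth saving comes from.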

2. With the popularity of online education and remote work, more and more customers have a strong demand for video services that combine low latency, interactivity, and large-scale distribution. Current architectures support either large-scale distribution or low-latency interactive services, but cannot provide all three capabilities at the same time. We believe the mainstream architecture of the future must meet all three at once. Huawei’s real-time audio and video service has completed the implementation of such an architecture, and we are committed to continued in-depth development in the field of streaming media. Let’s jointly build the future of streaming media.
