As developers, we need a server to support bringing the new video industries onto the Internet. What open source solution can support this new wave of business? What key capabilities or requirements does such a solution need to provide? This article is compiled from a talk given at LiveVideoStack by Yang Jiancheng, head of the RTC server team at Alibaba Cloud.
By Yang Jiancheng
Compiled by: LiveVideoStack
Video playback:
https://www2.tutormeetplus.com/v2/render/playback?mode=playback&token=7955f5f3e1c942fa9ae4314b991beb1c
Hello everyone, I am Yang Jiancheng from Alibaba Cloud (Aliyun). In this talk I will introduce the key technologies and future development of open source streaming media servers in detail.
I started working on FFmpeg-based streaming media in 2009, began developing streaming media servers in 2012, and started working on the open source streaming media server SRS in 2013, more than seven years ago. Within just a few years, SRS grew rapidly alongside the explosion of live video. After I joined Alibaba Cloud in 2017, I switched to WebRTC. Live streaming and WebRTC are now widely used across the video industry, including online office, online education, and online entertainment. Audio and video have become an indispensable medium for Internet communication and information dissemination.
This talk focuses on how SRS was born and how it evolved, its future development plans, and an in-depth look at the value and significance of SRS.
Thanks to improvements in China's communication infrastructure, especially the spread of Wi-Fi and 4G networks into smaller cities and rural areas, audio and video products and services in China's Internet market grew explosively from 2015 to 2018. At that time, consumers typically had about 1 Mbps of bandwidth for watching video, and the network environment was relatively stable.
The technology behind live broadcasting has been mature since the feature-phone era. For example, from 2010 to 2012, consumers watched live web video mainly through Flash on PCs, because Flash worked across the mainstream PC browsers. Although Flash was available on mobile, it did not work well there; mobile platforms such as Android and iOS mainly supported HLS. Android's HLS support was poor in the early days but has improved significantly.
Whether in the traditional PC era or today's mobile Internet era, the main streaming protocols are RTMP/FLV and Apple HLS. The main streaming media servers include Red5, nginx-rtmp, CRTMP, Wowza, and AMS. Chrome has disabled Flash Player by default since around 2017, and Flash will soon be phased out of the Internet entirely.
As the Internet developed, live broadcasting on mobile devices gradually emerged and became mainstream. On mobile platforms today, live streams are mainly played through native players using the FLV protocol, while browsers mostly use HLS. Together they form a relatively mature and complete protocol system for Internet live broadcasting.
With the continuous upgrade of mobile network infrastructure, the spread of mobile devices, and the arrival of the IoT/5G era, the demand for real-time communication keeps growing. After Flash was phased out, a better alternative emerged: the H5 (HTML5) player, built on the MSE (Media Source Extensions) specification. H5 players are now supported by most PC browsers and can also play FLV and other formats. Like Flash, MSE exposes a JavaScript interface: the player demuxes FLV or HLS, remuxes the content into MP4, and feeds it to the MSE interface for playback. H5 is the standard replacement for Flash, and FLV, HLS, DASH, and similar formats can all be played through MSE.
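The first thing an MSE-based player does with an FLV stream is demux it. As a minimal sketch (written in Go for readability, although browser players do this in JavaScript), the code below parses the 9-byte FLV file header and the 11-byte tag header defined by the FLV file format; remuxing the tag payloads into fragmented MP4 for MSE is not shown. The type and function names are illustrative, not taken from any particular player.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// FLVHeader holds the fields of the 9-byte FLV file header.
type FLVHeader struct {
	Version    uint8
	HasAudio   bool
	HasVideo   bool
	DataOffset uint32 // header size, normally 9
}

// TagHeader holds the fields of the 11-byte header before every FLV tag.
type TagHeader struct {
	Type      uint8  // 8 = audio, 9 = video, 18 = script data
	DataSize  uint32 // payload size in bytes (24-bit field)
	Timestamp uint32 // milliseconds: 24 low bits plus an 8-bit extension
	StreamID  uint32 // always 0
}

// ParseFLVHeader checks the "FLV" signature and decodes the header fields.
func ParseFLVHeader(b []byte) (FLVHeader, error) {
	if len(b) < 9 || string(b[0:3]) != "FLV" {
		return FLVHeader{}, errors.New("not an FLV header")
	}
	return FLVHeader{
		Version:    b[3],
		HasAudio:   b[4]&0x04 != 0,
		HasVideo:   b[4]&0x01 != 0,
		DataOffset: binary.BigEndian.Uint32(b[5:9]),
	}, nil
}

// ParseTagHeader decodes one tag header (big-endian, 24-bit fields).
func ParseTagHeader(b []byte) (TagHeader, error) {
	if len(b) < 11 {
		return TagHeader{}, errors.New("short tag header")
	}
	u24 := func(p []byte) uint32 { return uint32(p[0])<<16 | uint32(p[1])<<8 | uint32(p[2]) }
	return TagHeader{
		Type:      b[0] & 0x1f, // low 5 bits are the tag type
		DataSize:  u24(b[1:4]),
		Timestamp: uint32(b[7])<<24 | u24(b[4:7]),
		StreamID:  u24(b[8:11]),
	}, nil
}

func main() {
	// FLV header (audio+video), PreviousTagSize0, then a video tag header.
	data := []byte{'F', 'L', 'V', 1, 0x05, 0, 0, 0, 9, 0, 0, 0, 0,
		9, 0x00, 0x00, 0x20, 0x00, 0x03, 0xE8, 0x00, 0, 0, 0}
	h, _ := ParseFLVHeader(data)
	// Skip the header and the 4-byte PreviousTagSize0 field.
	tag, _ := ParseTagHeader(data[h.DataOffset+4:])
	fmt.Println(h.HasAudio, h.HasVideo, tag.Type, tag.DataSize, tag.Timestamp)
}
```

A real player would then read `DataSize` bytes of payload, repackage audio and video samples as fMP4 fragments, and append them to an MSE `SourceBuffer`.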
Another direction everyone is pursuing is low-latency live broadcasting. With common delivery protocols, latency can exceed ten seconds, while RTMP can reduce it to 3 to 5 seconds. However, TCP on the public network sometimes suffers jitter, which increases latency further.
Everyone is currently exploring better ways to reduce live broadcast latency, and WebRTC is widely recognized as the ideal solution. Although 5G can bring lower latency, availability matters more from a communications perspective. The spread of 5G implies a more stable network infrastructure overall, with more communication devices able to meet the corresponding requirements. For example, during the epidemic the number of live video users increased dramatically, yet existing live streaming services did not suffer major outages. This is mainly due to the communication network infrastructure built over the past decade, as well as progress in open source, commercial products, cloud computing, and related fields.
At present, SRT, IoT, and related technologies still face great challenges. As the possibilities of the domestic Internet keep expanding, the live broadcasting ecosystem will keep improving and new scenarios will continue to emerge.
First, different scenarios have completely different requirements for network infrastructure and the overall business environment. Second, business and open source often drive each other: business pushes new open source solutions into production, while open source solutions provide technical support for business. Finally, there are deep gaps between industries. For example, the surveillance industry does not need apps, and the live streaming industry does not use proprietary protocols. We need SRT for long-distance transmission, GB28181 for IoT device access, and WebRTC for interaction and online communication.
We hope for a single set of open source solutions that meets the low-latency live streaming needs of different scenarios across industries. There is a clear trend toward convergence in cloud computing: both CDNs and cloud computing are gradually meeting the needs of online live broadcasting. As developers, we need a server to support bringing these new video industries onto the Internet. What open source solutions can support this new explosion of business? What key capabilities or requirements does such a solution need to provide?
To implement such an open source streaming media server, we need to consider a number of key constraints and capabilities.
The first is that the platform must be scalable, that is, elastic. An Internet business can grow from a local area to a very large area. If we adopt an open source solution, we must know clearly whether the existing resources and experience can support large-scale operation as the business grows, which requires maintenance by many developers and support from cloud vendors. Without open source platforms and cloud vendors, we would have to build our own platform and deploy our own servers. Many enterprises lack the capability and resources to do all of this, so open source solutions are critical.
A prerequisite for open source is support for cloud computing. Today's CDNs, including those of Alibaba Cloud and Tencent Cloud, support RTMP, FLV, and HLS, and now also WebRTC. On this basis, many commercial applications have been built and scaled up. If we build our own platform on an open source solution and connect it to a CDN, we can properly solve the elasticity problem. Without the support of cloud services, an open source platform would lose much of its value.
Low latency is the second requirement, and the current trend in video is toward ever lower latency. For example, RTMP over TCP has a latency of 3 to 5 seconds, and this is not caused by TCP alone: HLS segmenting, player buffering, and encoding latency can raise the total to 8 to 10 seconds or more. In WebRTC communication scenarios, latency is usually under one second and can be as low as 400 milliseconds. In ordinary voice conversations, once latency exceeds 400 milliseconds, the two parties must consciously coordinate who speaks.
The third point is that the server platform must be easy to use. Existing servers include Red5, nginx-rtmp, CRTMP, Wowza, AMS, Helix, and others. Another key point is interworking between protocols, because a single service may need to support multiple protocols. These three points are critical if a solution is to be deployed quickly.
1. Scenarios
1.1 Internet Live Streaming and Co-streaming (Lianmai)
We are all familiar with Internet live streaming and Lianmai (co-streaming, where guests join the host's live stream in real time), but several technical details deserve attention. In terms of codecs, H.264 is mature, and PCs and other devices provide hardware codecs. There are also commercial codecs, such as Hongsoft in China and Haivision abroad, and some companies in the broadcast and television industry have their own codecs. Beyond encoding and decoding, stream-publishing tools such as OBS and FFmpeg are commonly integrated into systems; when the host publishes directly, many solutions are based on a modified OBS.
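When FFmpeg is integrated for publishing, a common pattern is to republish a file or capture as an RTMP stream. The sketch below builds such a command from Go using `os/exec`; the flags (`-re` to read input at its native frame rate, `-c copy` to avoid re-encoding, `-f flv` because RTMP carries FLV) are standard FFmpeg options, while the file name and the server URL are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"os/exec"
)

// PushArgs builds FFmpeg arguments that publish a local file as a live RTMP
// stream without re-encoding it.
func PushArgs(input, rtmpURL string) []string {
	return []string{"-re", "-i", input, "-c", "copy", "-f", "flv", rtmpURL}
}

func main() {
	// Hypothetical input file and server URL; replace with your own.
	args := PushArgs("source.flv", "rtmp://localhost/live/livestream")
	cmd := exec.Command("ffmpeg", args...)
	fmt.Println(cmd.String())
	// Calling cmd.Run() would start publishing if ffmpeg and a server are available.
}
```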
In terms of transmission, we need to distribute content to many audiences. In this area, the open source solutions include Nginx-RTMP and SRS, while the commercial solutions include Wowza and AMS. The commercial solutions are mostly distributed directly through the CDN network.
On the playback side, the main solution is the H5 player; apps on most devices integrate a player for decoding, and open source SDKs are also available for this. Co-streaming (Lianmai) is mainly implemented with RTC and WebRTC.
1.2 Internet real-time communication
A typical application of Internet real-time communication is video conferencing, and its video codecs are similar to those of the previous scenario. For audio, however, Internet live streaming mostly uses AAC while real-time communication uses Opus, because Opus has lower latency. On the client, both publishing and playback are mainly based on the WebRTC framework, and a server is needed to distribute each stream to many participants. Note that this server is completely different from the live streaming servers mentioned above; options include Janus, mediasoup, OWT, and SRS. A special requirement in online meetings is interconnection with telephone networks, so that participants can join by phone. The open source solution for this is FreeSWITCH, which is a huge system in itself.
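The server's job of "distributing the stream to many people" in an SFU-style design can be sketched as a fan-out: each incoming packet from one publisher is copied, without decoding or mixing, to every subscriber. The toy model below uses hypothetical types (`Packet`, `Room`) and Go channels as stand-ins for network connections; it is not how Janus, mediasoup, OWT, or SRS are implemented. It drops packets for a subscriber whose queue is full so one slow viewer cannot stall the others.

```go
package main

import (
	"fmt"
	"sync"
)

// Packet stands in for an RTP packet; a real SFU forwards encrypted RTP/RTCP.
type Packet struct {
	Seq     uint16
	Payload []byte
}

// Room is a hypothetical SFU room holding one queue per subscriber.
type Room struct {
	mu   sync.Mutex
	subs map[string]chan Packet
}

func NewRoom() *Room { return &Room{subs: make(map[string]chan Packet)} }

// Subscribe registers a subscriber with a bounded queue.
func (r *Room) Subscribe(id string, queue int) <-chan Packet {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan Packet, queue)
	r.subs[id] = ch
	return ch
}

// Forward copies one packet to every subscriber and reports how many
// subscribers had a full queue (their copy is dropped, not blocked on).
func (r *Room) Forward(p Packet) (dropped int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, ch := range r.subs {
		select {
		case ch <- p:
		default:
			dropped++
		}
	}
	return dropped
}

func main() {
	room := NewRoom()
	a := room.Subscribe("alice", 8)
	b := room.Subscribe("bob", 8)
	room.Forward(Packet{Seq: 1, Payload: []byte("frame")})
	fmt.Println((<-a).Seq, (<-b).Seq) // 1 1
}
```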
1.3 Internet Media Center
As a major application scenario, the Internet media center is mainly used for content management. For example, when we record a video, we may want it to be watched over and over again, as with a recorded training session. Other content, such as National Day live broadcasts or football matches, is not watched repeatedly. Recorded content therefore needs proper management, such as detecting inappropriate content and automatic editing. The design of a media center is closely tied to its content and includes processes such as transcoding, encoding, and storage. The traditional approach is to push the media to multiple CDNs, or to use one CDN to relay the stream to other CDNs, which wastes resources. A better approach is to build a media center.
In terms of security, CDNs provide playback authentication, such as limiting the number of viewers and encrypting content; tokens are another authentication method. We also need access standards, such as GB28181 “Technical Requirements for Information Transmission, Exchange and Control of Security Video Surveillance Networking System”. Although it is a standard, it is highly specialized and closed in practice, and CDNs do not support it well. Cloud computing and CDNs are better suited to standardized things: infrastructure, distribution, and so on all need standards. If the access protocol is highly proprietary, it makes more sense for the enterprise to build a media center that converts content into a standard protocol and sends it to a CDN or other enterprises, which makes it easier to bring the content onto the Internet.
In special scenarios, such as long-distance transmission for transnational live broadcasting, data is mostly transmitted over dedicated network lines, or over the Internet using SRT. These scenarios are too business-specific to be covered by a single standard, and not large enough to justify one.
2. Scalability: Based on Cloud or CDN
The figure above shows a cloud-based or CDN-based deployment, which is how the SRS demo network is set up. It is mainly deployed on K8s but can also be deployed from binaries, and it includes an edge cluster, a media center, and an origin server. Incoming streams in non-standard protocols are converted and pushed to the origin as standard RTMP streams, which are then distributed through the edge CDN over standard protocols such as RTMP and FLV. If the scale is small, streams can be played and distributed directly from the cloud data center, and segment-based protocols such as HLS can even be served through NGINX. Because the load can be offloaded to the CDN, the system is scalable. Integration with CDNs is mainly through RTMP. CDNs now also support WebRTC, so connecting over RTC is possible too, but RTC involves many proprietary details; using CDNs for RTC will take some time to realize.
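To illustrate why an origin-plus-edge layout scales, here is a toy model with hypothetical types, not SRS code. Each edge pulls a given stream from the origin only once, however many viewers it serves, so origin load grows with the number of edges rather than the number of viewers. Viewer departures and stream teardown are deliberately omitted.

```go
package main

import "fmt"

// Origin counts how many upstream pulls it has to serve.
type Origin struct{ Pulls int }

// Edge is a hypothetical edge node that caches streams pulled from the origin.
type Edge struct {
	origin  *Origin
	streams map[string]int // stream name -> local viewer count
}

func NewEdge(o *Origin) *Edge { return &Edge{origin: o, streams: make(map[string]int)} }

// Play serves a viewer; only the first local viewer of a stream triggers an
// upstream pull, later viewers are served from the edge's copy.
func (e *Edge) Play(stream string) {
	if e.streams[stream] == 0 {
		e.origin.Pulls++
	}
	e.streams[stream]++
}

func main() {
	origin := &Origin{}
	edges := make([]*Edge, 10)
	for i := range edges {
		edges[i] = NewEdge(origin)
	}
	// 10,000 viewers of one stream spread evenly over 10 edges.
	for v := 0; v < 10000; v++ {
		edges[v%len(edges)].Play("live/livestream")
	}
	fmt.Println("origin pulls:", origin.Pulls) // origin pulls: 10
}
```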
3. Latency
3.1 Live streaming media
Regarding latency, SRS now supports WebRTC playback, and WebRTC publishing will be supported soon. The video above shows OBS capturing a running clock. OBS itself adds about 100 milliseconds of latency, and the latency gap between protocols is visible in the clearly different clock readings shown by the RTMP player and the WebRTC player.
3.2 Real-time streaming media system
We tested GB28181 access. The results in the figure above show that the latency of a Hikvision surveillance camera viewed over the intranet is 280 ms, the latency through a WebRTC server on Alibaba Cloud is about 210 ms, and the latency through an RTMP server on Alibaba Cloud is 1100 ms. The WebRTC server achieves lower latency than the intranet surveillance path, which shows that latency is mainly not a network problem. In this scenario WebRTC latency is lower than that of the surveillance system, and this capability is already available. Most surveillance systems use proprietary transmission protocols, and traditional solutions require installing an IE plug-in to play streams. With a standard protocol, however, a mobile phone only needs to integrate an SDK, and a browser can display the picture directly, so the stream from every camera can be viewed without installing any plug-in.
4. Usability
4.1 Cloud Native
The third part is deployment. SRS supports K8s and Docker deployment, and every new release ships with Docker support. The figure above shows how to deploy on K8s; I won't go over it here, but you can find the details in the documentation.
We used to deploy mainly from binary installation packages, but SRS now has multiple mirror repositories to speed up code download. The repository is small, downloads quickly, and is relatively easy to compile, install, and start. Deploying with Docker is even easier, and Docker runs on any platform: for example, SRS can be fully deployed and run with Docker on Windows, and cross-compiled for ARM. Cross-compiling for ARM directly often runs into many problems, but building with ARM Docker avoids them. Because the Docker environment is fixed, Docker uniformly solves environment and compilation problems. Together with K8s, it also enables service upgrades without interruption, so a new version can be released even at peak business hours.
4.2 Errors & Logs
The figure above shows SRS logs with process IDs and context IDs. Each ID represents one connection; on a server serving hundreds of users and processes, the ID is used to locate a problem and find its context logs. Unlike an HTTP request, a streaming session is a long-lived transport with its own context. Because data is exchanged over a long time, a session's log is not a single line but everything that happens during it, and RTC logs in particular are very large. How can key information be extracted from the server? SRS has a mechanism that identifies the log of each user and extracts it in a timely manner.
Besides logs, the figure above also shows error reporting in SRS. The error design borrows from Go, where errors can be wrapped, so that when users report an error they can paste the corresponding log and developers can see the full stack. Usually only an error code is displayed, and the developer does not know what happened. But if each error carries its stack, along with the variables at each level of the stack, querying and locating the error becomes much easier. People rarely focus on this when evaluating a new open source project, but when a problem arises and you need to find its source, the stack is critical: it lets us not only identify the source of the problem but also fix it properly.
5. High performance
Performance is a fundamental requirement, and SRS typically performs about twice as well as other servers. Performance requirements are even more demanding for RTC, because RTC is far more resource-intensive.
6. SRS
SRS has been developing steadily since 2013. At first, application scenarios were relatively fixed, so updates were infrequent. Now, with successive support for Origin Cluster and Edge Cluster, coverage of live streaming scenarios has become increasingly complete, and RTC support keeps developing. As more video industries move online, SRS has become very active recently.
The number of SRS forks surpassed that of nginx-rtmp around 2019, and SRS forks are projected to grow twice as fast as nginx-rtmp's in the future.
Looking back, SRS v1.0 in 2013 supported basic protocols such as RTMP and HLS. v2.0 then mainly added FLV and support for mobile Internet applications, and v3.0 added Origin Cluster, while Edge Cluster, which mainly handles playback by many viewers, had been supported early on. Origin Cluster is mainly used to support large numbers of published streams, such as those from surveillance cameras. Edge servers do not store streams while origin servers do, so an origin cluster is required when there are many streams. Support for live streaming scenarios is now relatively complete.
In early 2020, SRS added support for SRT, which is mainly used for long-distance transmission. SRT also serves scenarios that combine professional broadcasting and Internet live streaming, such as professional events and overseas live streaming. SRS also supports GB28181, WebRTC, and more.
In the future, we need to cover a wider range of Internet live streaming scenarios and requirements, such as SFU, IoT, AI capabilities, cloud storage and recording, security, MCU, AV1, and SIP. We hope to basically cover all of these scenarios by 2024.
Thanks to the contributors shown above for their outstanding contributions.
The image above shows our online demo; you are welcome to try it.