1. Background

Real-Time Messaging Protocol (RTMP) is the dominant protocol for live streaming. It is a proprietary application-layer protocol designed by Adobe to transmit audio and video data between Flash players and servers. RTMP is the common push and pull protocol underlying the live-streaming services of all the major cloud vendors. With the growth of the domestic live-streaming industry and the arrival of the 5G era, a basic understanding of RTMP has become an essential skill for programmers.

This article walks through the basic ideas and core concepts of RTMP, supplemented by a source-code analysis of livego, to study the core of the RTMP protocol in depth.

2. Features of the RTMP protocol

The main features of RTMP are multiplexing, packetization, and the fact that it is an application-layer protocol. These features are described in detail below.

2.1 Multiplexing

Multiplexing means transmitting multiple signals over a single channel at the same time; the receiver then recombines the signals carried on that channel into separate, complete messages, so that the communication line is used more efficiently.

In short, over one TCP connection, each Message is divided into one or more chunks, and the chunks belonging to the same Message form a Chunk Stream. At the receiving end, the chunks of a Chunk Stream are reassembled to restore the complete Message; that is the basic idea of multiplexing.

As a simple example, suppose we need to transmit a 300-byte Message. We can split it into three chunks, each consisting of a Chunk Header and Chunk Data. The Chunk Header carries basic information about the chunk, such as the Chunk Stream ID and the Message Type; the Chunk Data is a slice of the original Message. In the figure above, the Message is split as 128 + 128 + 44 = 300 bytes, so that the Message is transmitted in full.
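The splitting described above can be sketched in Go (a toy illustration of the idea, not livego's actual code; the 128-byte chunk size matches RTMP's default):

```go
package main

import "fmt"

// splitMessage divides a message payload into chunk-sized pieces,
// mirroring the 128 + 128 + 44 = 300 example above.
func splitMessage(payload []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for len(payload) > 0 {
		n := chunkSize
		if len(payload) < n {
			n = len(payload)
		}
		chunks = append(chunks, payload[:n])
		payload = payload[n:]
	}
	return chunks
}

func main() {
	msg := make([]byte, 300)
	for i, c := range splitMessage(msg, 128) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```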

The format of the Chunk Header and Chunk Data will be discussed in more detail later.

2.2 Packetization

The second major feature of RTMP is packetization, which distinguishes it from RTSP. Unlike common business-level application protocols (such as RPC protocols), multimedia transmission deals mostly with large audio and video packets. Sending such large packets over a reliable transport protocol like TCP can block the connection and prevent higher-priority information from getting through. Packetized transmission was introduced to solve exactly this problem; the specific packet format is introduced below.

2.3 Application layer Protocols

The final feature of RTMP is that it is an application-layer protocol. By default, RTMP runs on top of TCP. However, the official RTMP specification only describes the standard data-transmission format and some specific protocol formats; it does not provide a complete official implementation. This has given rise to many industry implementations and private adaptations, such as RTMP over UDP, leaving room for extension and making it easier to work around problems inherent in native RTMP, such as live-streaming latency.

3. RTMP protocol parsing

As an application-layer protocol, RTMP, like other proprietary transport protocols (such as RPC), has several implementations, for example nginx-rtmp, livego, and SRS. This article uses livego, an open-source live-streaming server written in Go, to walk through the main source-code flow and study how the core push and pull processes of RTMP are implemented, giving you an overall picture of the protocol.

Before diving into the source code, let's use an analogy with an RPC protocol to build a basic picture of the RTMP wire format. First, consider a relatively simple but practical RPC protocol format, as shown in the figure below:

We can see that this is a data format used during RPC calls; the fundamental purpose of such a format is to solve TCP's packet sticking and splitting problem, i.e. message framing over a byte stream.

The RPC format in the figure works as follows. First, two MAGIC bytes identify the protocol so that the peer can recognize it; if the two received bytes are not 0xbabe, the packet is discarded. Next comes a one-byte sign field: its low four bits indicate the message kind (request / response / heartbeat), and its high four bits indicate the serialization type (e.g. 4 for json; other values for hessian, protobuf, kryo, etc.). A one-byte status field follows, representing the status. Then 8 bytes carry the requestId of the call; in practice only the lower 48 bits (2^48 values) are used. Finally, a 4-byte body size describes the length of the body content. With this layout the complete RPC Message can be parsed quickly.
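As an illustration, the header layout just described could be encoded like this in Go (a hypothetical sketch of the figure's format; the field order and the 0xbabe magic come from the description above, and everything else, including the function name, is an assumption):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeRPCHeader lays out the 16-byte header described above:
// 2-byte magic, 1-byte sign (high 4 bits: serialization, low 4: kind),
// 1-byte status, 8-byte requestId, 4-byte body length.
func encodeRPCHeader(kind, serial, status byte, requestID uint64, bodyLen uint32) []byte {
	buf := make([]byte, 16)
	binary.BigEndian.PutUint16(buf[0:2], 0xbabe) // magic
	buf[2] = serial<<4 | kind&0x0f               // sign
	buf[3] = status
	binary.BigEndian.PutUint64(buf[4:12], requestID)
	binary.BigEndian.PutUint32(buf[12:16], bodyLen)
	return buf
}

func main() {
	h := encodeRPCHeader(1, 4, 0, 42, 128) // request, json, ok
	fmt.Printf("% x\n", h)
}
```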

From this simple RPC protocol we can extract a good idea: use bytes as efficiently as possible, that is, transmit the most information in the smallest byte array. Even a single byte can carry a lot of information; after all, one byte has 256 possible values. If every byte on the wire carries as much useful information as possible, we can get the most out of very limited resources. The official RTMP specification appeared in 2012; although the protocol's implementation is complicated and even a little bloated, having such forward-looking ideas already in 2012 makes it a model worth learning from.

Even in the era of WebRTC's dominance, we can still see RTMP's influence in WebRTC's design and implementation. The RPC protocol above can be viewed as a simplified design sharing RTMP's design concepts.

3.1 Core concepts of RTMP

Before analyzing the source code, let's first explain several core concepts of the RTMP protocol, so that we have a macro-level understanding of the whole RTMP stack. During the source-code analysis later, we will also use packet captures to make the underlying principles more intuitive.

First, just as in the RPC format above, the actual entity transmitted by RTMP is the Chunk. A Chunk consists of a Chunk Header and a Chunk Body, as shown in the figure below.

3.1.1 Chunk Header

Unlike the RPC header above, RTMP's Chunk Header is not of fixed length. Why not? Essentially, Adobe wanted to save transmission cost. From the example of splitting a 300-byte Message into three chunks, we can see one notable drawback of multiplexing: each chunk needs a Chunk Header to carry its basic information, which is pure overhead on the wire. So to keep the number of transmitted bytes minimal, RTMP keeps squeezing the size of the header, making it as small as possible to achieve the highest transmission efficiency.

First let’s look at the Chunk Header’s Basic Header. The length of the Basic Header is not fixed. It can be 1 byte, 2 bytes, or 3 bytes, depending on the Chunk Stream Id (CSID).

RTMP supports CSIDs from 2 to 65599; 0 and 1 are reserved protocol values and cannot be used. A Basic Header is at least 1 byte long, as shown in the figure below. The high 2 bits of that byte hold FMT; the value of FMT determines the format of the Message Header, which we'll discuss later. The low 6 bits hold the CSID. If the low 6 bits are 0, the real CSID is too large to fit in 6 bits, and one extra byte follows (CSID = second byte + 64, covering 64 to 319). If the low 6 bits are 1, the real CSID is too large even for that form, and two extra bytes follow (CSID = third byte × 256 + second byte + 64, covering 64 to 65599). Thus the length of the Basic Header is not fixed; it depends entirely on the value of the low 6 bits of the first byte.

In practice, many CSIDs are not used. In general, the Basic Header is one byte in length and the CSID ranges from 2 to 63.
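The Basic Header rules above can be expressed as a small decoder (a sketch consistent with the rules just described, not livego's exact code):

```go
package main

import "fmt"

// parseBasicHeader decodes an RTMP Basic Header (1, 2 or 3 bytes)
// and returns the FMT value, the CSID, and the bytes consumed.
func parseBasicHeader(b []byte) (format byte, csid uint32, n int) {
	format = b[0] >> 6   // high 2 bits: FMT
	switch b[0] & 0x3f { // low 6 bits
	case 0: // 2-byte form: csid = second byte + 64 (64..319)
		return format, uint32(b[1]) + 64, 2
	case 1: // 3-byte form: csid = third*256 + second + 64 (64..65599)
		return format, uint32(b[2])*256 + uint32(b[1]) + 64, 3
	default: // 1-byte form: csid 2..63
		return format, uint32(b[0] & 0x3f), 1
	}
}

func main() {
	f, csid, n := parseBasicHeader([]byte{0x03})
	fmt.Println(f, csid, n) // FMT 0, CSID 3, 1 byte consumed
}
```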

The Basic Header is only one part of the Chunk Header. The authors of RTMP, fond of this kind of trick, also made the rest of the Chunk Header dynamically sized, again to save space. There are four different lengths of the Chunk Message Header, determined by the FMT value mentioned earlier.

The four formats of Message Header are shown in the figure below:

When FMT is 0, the Message Header takes 11 bytes (note that these 11 bytes do not include the Basic Header). It consists of a 3-byte timestamp, a 3-byte message length, a 1-byte message type ID, and a 4-byte message stream ID.

Here, timestamp is the absolute timestamp, indicating when the message was sent; message length is the length of the Chunk Body; message type ID indicates the message type, which will be discussed later; and message stream ID is the unique identifier of the message. Note that if the absolute timestamp is greater than 0xFFFFFF, it is too large to fit in 3 bytes, and a 4-byte Extended Timestamp must be used, placed between the Chunk Header and the Chunk Body.

When FMT is 1, the Message Header takes 7 bytes: the 11-byte header minus the 4-byte message stream ID, which is reused from the preceding chunk. This is typically used for variable-length messages.

When FMT is 2, the Message Header takes only 3 bytes, containing just the timestamp; compared with the previous format, the message stream ID, message length, and message type ID are all omitted. This is generally used for fixed-length messages that need timestamp adjustment (e.g. audio data).

When FMT is 3, the Chunk Header contains no Message Header at all. Generally, when a complete RTMP Message is packetized, the first chunk uses FMT 0 and the subsequent chunks use FMT 3. This way the first chunk carries the complete message header, while subsequent chunks carry the smallest possible headers, which makes the implementation simpler and the compression better. And if a second Message is sent after the first completes, its first chunk can use FMT 1, with the following chunks again using FMT 3, so that messages can be separated from each other.
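The four formats can be summarized in code (a minimal sketch; the field breakdowns follow the descriptions above, and the Extended Timestamp is excluded):

```go
package main

import "fmt"

// messageHeaderLen returns the Message Header length implied by FMT,
// per the four formats described above.
func messageHeaderLen(format byte) int {
	switch format {
	case 0:
		return 11 // timestamp(3) + msg length(3) + type id(1) + stream id(4)
	case 1:
		return 7 // timestamp(3) + msg length(3) + type id(1)
	case 2:
		return 3 // timestamp(3) only
	default:
		return 0 // FMT 3: no message header, everything inherited
	}
}

func main() {
	for f := byte(0); f < 4; f++ {
		fmt.Printf("fmt=%d header=%d bytes\n", f, messageHeaderLen(f))
	}
}
```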

3.1.2 the Chunk Body

Having spent a lot of time on the Chunk Header, let's take a brief look at the Chunk Body, which is comparatively simple: less variable-length control and a simpler structure. The data in the body is what actually matters to the business. Its default maximum length is 128 bytes (negotiable via the Set Chunk Size command). The payload is generally organized as AMF, or as FLV-format audio/video data (without the FLV tag header). The structure of AMF-organized data is shown in the figure below; the FLV format is not covered in depth in this article. If you are interested, you can read the official FLV documentation.

3.1.3 AMF

Action Message Format (AMF) is a binary data serialization Format similar to JSON and XML. Adobe Flash can communicate with remote servers through AMF data.

In fact, the AMF format is quite similar to a Map data structure: key-value pairs, with a value length inserted in the middle (type, length, value). For some types the length is implicit: for example, when transferring AMF data of type number, len can be omitted, because a number always occupies 8 bytes by default.

For another example, when AMF transfers data of type string (marker 0x02), len occupies 2 bytes by default, since 2 bytes are enough to represent the maximum length of the string that follows. And so on. Sometimes neither len nor value exists: when we transfer null (marker 0x05), the marker alone is enough.

Some commonly used AMF types are listed below. For more information, see the official documentation.
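To make the type/length/value layout concrete, here is a minimal AMF0 encoder for the three types just discussed (a sketch based on the AMF0 markers described above, not livego's implementation):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// Minimal AMF0 encoders. Markers: 0x00 number, 0x02 string, 0x05 null.

// amfNumber: marker + 8-byte IEEE-754 double; no len field needed.
func amfNumber(v float64) []byte {
	buf := make([]byte, 9)
	buf[0] = 0x00
	binary.BigEndian.PutUint64(buf[1:], math.Float64bits(v))
	return buf
}

// amfString: marker + 2-byte length + UTF-8 bytes.
func amfString(s string) []byte {
	buf := make([]byte, 3+len(s))
	buf[0] = 0x02
	binary.BigEndian.PutUint16(buf[1:3], uint16(len(s)))
	copy(buf[3:], s)
	return buf
}

// amfNull: marker only; no len, no value.
func amfNull() []byte { return []byte{0x05} }

func main() {
	fmt.Printf("% x\n", amfString("live")) // 02 00 04 6c 69 76 65
}
```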

You can use WireShark to capture packets and actually experience the specific AMF0 format.

As shown in the figure above, this is a very typical capture of an AMF0 string structure. There are currently two major versions of AMF, AMF0 and AMF3; in actual use AMF0 still dominates. When a client sends AMF-formatted Chunk Data to a server, how does the server know whether it is AMF0 or AMF3? RTMP uses the Message Type ID in the Chunk Header: 20 means the message is encoded with AMF0, and 17 means AMF3.

3.1.4 Chunk & Message

First, the relationship between chunks and messages in one sentence: a Message is made up of multiple chunks, and chunks sharing the same Chunk Stream ID form a Chunk Stream. Compared with RPC messages, which are basically just request, response, and heartbeat, RTMP defines many more message types, divided into three categories: protocol control messages, data messages, and command messages.

**Protocol control messages:** Message Type ID = 1 to 6, mainly used for protocol control.

**Data messages:** Message Type ID = 8, 9, 18:

8: audio data;

9: video data;

18: metadata, including audio/video encoding parameters, video width and height, and other audio/video metadata.

**Command messages** (Message Type ID = 20 or 17): these come in two main classes, NetConnection and NetStream, each with several commands. A command message can be understood as a remote function call.

The overall picture is shown in the diagram below; the colored parts are the commonly used messages, which will be described in detail in the source-code analysis.

3.2 Core implementation process

Learning a network protocol can be a dry process, so we will try to describe the core flows of RTMP, including handshake, connect, createStream, publish (push), and play (pull), as vividly as possible, combining the RTMP specification with packet captures in WireShark. The capture environment for this section: livego as the RTMP server (service port 1935), OBS as the publishing client, and VLC as the playing client.

When parsing an application-layer protocol, the first thing to grasp is the main flow. For an RTMP server, every push and every pull is, at the code level, a network connection, and each connection needs its own handling. In the livego source we find a handleConn method which, as the name implies, handles each connection: first the handshake, then the core module that parses the Chunk Header and Chunk Body according to the RTMP framing rules, and then the specific processing driven by the parsed header and body.

As you can see from the code block above, there are two core methods: HandshakeServer, which handles the handshake logic, and ReadMsg, which parses the Chunk Header and Chunk Body.

3.2.1 Part 1 – Handshake

The RTMP handshake is described in detail in section 5.2.5 of the original protocol, as shown below:

At first glance, this process may seem complicated. So, let’s use WireShark to capture packets to see the whole process.

The WireShark capture helps clarify the RTMP handshake. As shown in the figure below, the handshake involves three packets: packet 16 is the client sending C0 and C1 to the server; packet 18 is the server sending S0, S1, and S2 to the client; and packet 20 is the client sending C2 to the server. With that, the handshake between client and server is complete.

Seen in the capture, the handshake is very simple, similar to the TCP three-way handshake. The actual captured flow is thus simpler than the description in section 5.2.5 of the RTMP specification; the overall process is very concise.

Now let's go back to the more complicated handshake state diagram above. In it, the client and server each pass through four states: uninitialized, version sent, ack sent, and handshake done.

Uninitialized: no communication has occurred yet between client and server;

Version sent: C0 or S0 has been sent;

Ack sent: C2 or S2 has been sent;

Handshake done: the peer's S2 or C2 has been received.

The RTMP protocol specification does not specify the order of C0, C1, C2 and S0, S1, S2, but does specify the following rules:

The client can send C2 only after receiving S1 from the server.

The client can send other data only after receiving S2 from the server.

The server can send S0 and S1 only after receiving C0 from the client.

The server can send S2 only after receiving C1 from the client.

The server can only send other data after receiving C2 from the client.

Packet-capture analysis shows that the captured exchange complies with the rules above. The next question: what exactly are the messages C0, C1, C2, S0, S1, and S2? The RTMP specification defines their data formats precisely.

C0 and S0: 1 byte long; this message carries the RTMP version number, ranging from 0 to 255. All we need to know is that 3 is the value we want; if you are interested, you can read the specification for the others.

C1 and S1: 1536 bytes long, consisting of timestamp + zero bytes + random data; they are the main body of the handshake.

C2 and S2: 1536 bytes long, consisting of timestamp + timestamp2 + random data echo; they are essentially echoes of C1 and S1. In most implementations, S2 equals C1 and C2 equals S1.
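A client's first handshake packet can be assembled as follows (a sketch of the simple, non-digest handshake described above; real clients such as OBS typically use a digest variant, and the function name is an assumption):

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// buildC0C1 assembles the client's first handshake packet: C0 is
// 1 byte (version 3), C1 is 1536 bytes of timestamp(4) + zero(4) +
// random(1528).
func buildC0C1() []byte {
	buf := make([]byte, 1+1536)
	buf[0] = 3 // C0: RTMP version
	binary.BigEndian.PutUint32(buf[1:5], uint32(time.Now().Unix())) // timestamp
	// buf[5:9] stays zero
	rand.Read(buf[9:]) // 1528 random bytes
	return buf
}

func main() {
	c0c1 := buildC0C1()
	fmt.Println(len(c0c1), c0c1[0]) // 1537 3
}
```

After sending C0+C1, the client reads S0+S1+S2 (1 + 1536 + 1536 bytes) and replies with C2, typically a copy of S1.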

Let’s combine the livego source code to enhance our understanding of the handshake process.

That concludes the simplest part, the handshake. As you can see, the whole flow is fairly clear, and the handling logic is simple and easy to understand.

3.2.2 Part II – Information exchange

3.2.2.1 Parsing the Chunk information of RTMP

After the handshake, it's time to handle the connection and related work. But before processing this information, we must first sharpen our tools.

First, we need to parse the Chunk Header and Chunk Body according to the RTMP specification, converting the raw bytes from the network into recognizable information, and then drive the corresponding processing from that information. This is the heart of the source-code analysis and involves a lot of detail; reading it alongside the theory above will make the core ReadMsg logic easier to follow.

The logic of the code block above is clear: it reads from each conn, decodes each chunk, and merges chunks with the same Chunk Stream ID back into the corresponding Chunk Stream. A completed Chunk Stream is a Message.

This connects back to the Chunk Stream ID from the theory section: keep in mind that over one conn, multiple messages are transmitted, for example Connect and CreateStream messages. Each Message is a Chunk Stream, that is, multiple chunks with the same CSID. The livego authors therefore use a map keyed by CSID, with the Chunk Stream as the value, to hold all the messages sent to the RTMP server.

The concrete logical implementation of readChunk code is divided into the following sections:

1) Handling of the CSID. This is the processing of the Basic Header, following the logic described in the theory section above.

2) Parsing the Chunk Header according to the format (FMT) value, as described in the previous section. Two technical points deserve attention: the timestamp handling, and the chunk.new(pool) line; the code comments explain both clearly.

3) Reading the Chunk Body. As explained in the theory section, when FMT is 0 the chunk header contains a message length field that controls the size of the Chunk Body. Using this field we can easily read the body; the overall logic is as follows.

At this point we have parsed the Chunk Header and read the Chunk Body. Note that we have only read the body; we have not yet parsed its AMF content. That logic is described in more detail in the source code below. Having parsed a connection's Chunk Stream, we can now return to the main flow.

A client sends an xxxCmd command to the server via the AMF content of the Chunk Body together with the type ID of the Chunk Stream. The RTMP server resolves the xxxCmd command from the type ID and the AMF information, and sends the response for that command.

livego supports both AMF0 and AMF3. The difference between them was described above, and the code below is commented. The AMF chunk body is parsed and the result is stored as a slice.

With the typeId and AMF parsed, it’s a natural step to process the individual commands.

The next step is to process each client command.

3.2.2.2 connection

Processing of the connect command: during connection, the client and the server negotiate and confirm the window size, chunk size, and bandwidth. The RTMP specification describes the connection flow in detail, as shown in the figure below:

In this case, we use WireShark to capture and analyze packets:

As can be seen from packet capture, the connection process is completed with only three packets:

Packet 22: the client tells the server it wants a chunk size of 4096;

Packet 24: the client tells the server it wants to connect to the application named "live";

Packet 26: the server responds to the connection request, settles on the window size, bandwidth, and chunk size, and returns "_result" to indicate success. All of this is done within a single TCP packet.

So how do client and server know what these packets mean? These are the rules laid down by the RTMP specification, which we can read about, but we can also parse them quickly by looking at WireShark. Below is the detailed breakdown of packet 22; we focus on the RTMP protocol fields.

As shown in the figure, the RTMP header contains Format, Chunk Stream ID, Timestamp, Body size, Message Type ID, and Message Stream ID information. The Type ID is 0x01, which means Set Chunk Size, one of the Protocol Control Messages.

Section 5.4 of the RTMP specification states that for protocol control messages, the Chunk Stream ID must be 2, the Message Stream ID must be 0, and the timestamp is ignored. According to the fields parsed by WireShark, packet 22 complies with the specification.

Now let’s look at the detailed parsing of package 24.

Packet 24 is also sent by the client, and you can see it sets the Message Stream ID to 0 and the Message Type ID to 0x14 (20 in decimal), meaning an AMF0 command: an RTMP Command Message. The specification does not mandate a particular Chunk Stream ID for connection, because the Message Type ID is what matters; the server responds based on it. The AMF0 command sent during connection carries Object-typed data, telling the server the application name and the playback address to connect to.

The following code shows how livego handles the client's connect request.

After receiving the client's request to connect to the application, the server needs to respond; that is packet 26 in the WireShark capture, detailed in the figure below. The server does several things within this one packet.

We can learn more about this process with the livego source code.

3.2.2.3 createStream

Once the connection is complete, the stream can be created. Creating a stream is relatively simple and requires only two packets, as shown below:

The following is the livego source code that handles the client's createStream request:

3.2.2.4 Pushing the stream (publish)

Once the stream is created, pushing (or pulling) can begin, as described in section 7.3.1 of the RTMP specification. Connecting and creating streams were covered in detail above, so let's focus on the Publishing Content part of the flow.

Before pushing a stream with livego, you need to obtain the channel key for the push. You can use the following command to obtain the channel key for the room "movie"; the data field of the response content is the channel key needed to push the stream.

$ curl http://localhost:8090/control/get?room=movie
 
StatusCode        : 200
StatusDescription : OK
Content           : {"status":200,"data":"rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk"}
RawContent        : HTTP/1.1 200 OK
                    Content-Length: 72
                    Content-Type: application/json
                    Date: Tue, 09 Feb 2021 09:19:34 GMT
 
                    {"status":200,"data":"rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk"}
Forms             : {}
Headers           : {[Content-Length, 72], [Content-Type, application/json], [Date
                    , Tue, 09 Feb 2021 09:19:34 GMT]}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : mshtml.HTMLDocumentClass
RawContentLength  : 72

Use OBS to push the movie channel to the application called "live" on the livego server. The push address is: rtmp://localhost:1935/live/rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk. Again, let's look at the packets captured in WireShark.

To start pushing, the client sends a publish request; that is the content of packet 36. The request must carry the channel key, which in this packet is "rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk".

The server first checks whether the channel key exists and whether it is already in use; if it does not exist or is in use, the server rejects the client's push request. Since we generated the channel key before pushing, the client can use it legitimately, so the server responds in packet 38 with "NetStream.Publish.Start", which tells the client to start pushing. Before pushing audio and video data, the client must send the audio/video metadata to the server; that is what packet 40 does. Looking at the details of that packet in the figure below, you can see a lot of metadata, including key information such as video resolution, frame rate, audio sample rate, and audio channels.

Once the server knows the audio/video metadata, the client can start sending actual audio and video data. The server keeps receiving data until the client issues the FCUnpublish and deleteStream commands. The main logic of the TransStart() method in stream.go is to receive the audio/video data from the pushing client, cache the latest packet locally, and finally send the data on to each pulling client. The VirReader.Read() method in rtmp.go reads a single audio/video packet from the pushing client; the related code and comments are shown below.

Attached below is the source-code analysis of the media header parsing.

Parsing the audio header

Parsing the video header
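For reference, the first byte of the audio and video payloads packs several fields, which can be unpacked as follows (field layout per the FLV specification; a sketch, not livego's exact code):

```go
package main

import "fmt"

// Audio data first byte: soundFormat(4 bits) | soundRate(2) |
// soundSize(1) | soundType(1).
func parseAudioHeader(b byte) (format, rate, size, stereo byte) {
	return b >> 4, (b >> 2) & 0x3, (b >> 1) & 0x1, b & 0x1
}

// Video data first byte: frameType(4 bits) | codecID(4 bits).
func parseVideoHeader(b byte) (frameType, codecID byte) {
	return b >> 4, b & 0x0f
}

func main() {
	f, r, s, st := parseAudioHeader(0xaf) // a typical AAC byte
	fmt.Println(f, r, s, st)              // 10 3 1 1: AAC, 44 kHz, 16-bit, stereo
	ft, c := parseVideoHeader(0x17)
	fmt.Println(ft, c) // 1 7: keyframe, AVC
}
```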

3.2.2.5 Pulling the stream (play)

As the pushing client keeps pushing, pulling clients can continuously fetch the audio/video data through the server. The pulling process is described in detail in section 7.2.2.1 of the RTMP specification. Handshake, connect, and createStream have already been covered, so let's focus on the play command.

In the same way, we capture and analyze packets with WireShark. Through packet 640 the client tells the server that it wants to play the channel called "movie".

Why is it called "movie" here rather than the "rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk" used when pushing? In fact the two point to the same channel, one used for pushing and the other for pulling; we can see this in the livego source code.

After receiving the pulling client's play request, the server responds with "NetStream.Play.Reset", "NetStream.Play.Start", "NetStream.Play.PublishNotify", and the audio/video metadata. Once that is done, it can continuously send audio/video data to the pulling client. We can understand this process further through the livego source code.

The pushed stream data is read through a chan and sent on to the pulling client.
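The fan-out through channels can be sketched as follows (a toy model; livego's real implementation adds packet caching, timeouts, and connection cleanup, and the type and function names here are assumptions):

```go
package main

import "fmt"

// Packet is a simplified audio/video packet.
type Packet struct {
	IsVideo bool
	Data    []byte
}

// broadcast reads the publisher's packets from src and forwards each
// one to every subscriber's channel, then closes the subscribers when
// the publisher stops.
func broadcast(src <-chan Packet, subs []chan Packet) {
	for pkt := range src {
		for _, s := range subs {
			s <- pkt // forward to each pulling client
		}
	}
	for _, s := range subs {
		close(s)
	}
}

func main() {
	src := make(chan Packet)
	subs := []chan Packet{make(chan Packet, 8), make(chan Packet, 8)}
	go broadcast(src, subs)
	src <- Packet{IsVideo: true, Data: []byte{0x17}}
	close(src)
	p := <-subs[1]
	fmt.Println(p.IsVideo, len(p.Data)) // true 1
}
```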

That is the whole RTMP main flow. It does not cover FLV, HLS, or other transport protocols or format conversions; in other words, the RTMP server distributes the audio/video packets received from the pushing client to the pulling clients unchanged, with no extra processing. However, the pulling ends of the major cloud vendors now also support transport protocols such as HTTP-FLV and HLS, as well as recording and playback-on-demand, features that livego actually supports too.

Due to space limitations we will not expand on that here; if there is an opportunity later, we will study livego's handling of this logic separately.

4. Outlook

At present, RTMP-based live broadcasting is the benchmark protocol for live streaming in China, compatible with all the major cloud vendors; its strengths, such as multiplexing and packetization, are an important reason those vendors chose it. And precisely because it is an application-layer protocol, large cloud vendors such as Tencent, Alibaba, and Agora have also modified protocol details and source code on top of it, implementing features such as mixing multiple audio/video streams and single-stream recording.

However, RTMP also has its shortcomings, and high latency is the biggest one. In practice, even on a relatively healthy network, RTMP latency runs around 3 to 8 seconds, quite different from the theoretical 3 seconds quoted by the various cloud vendors. What problems does this latency cause? Imagine the following scenarios:

In online education, a student asks a question, and the teacher has already moved on to the next topic before seeing it.

In e-commerce streaming, viewers ask about a product and the host seems to look right past the question.

After tipping, you cannot hear the host's thanks until long afterward.

You learn the ball went in from other people's cheers. Are you watching live, or a replay?

Especially now that live streaming has formed a full industrial chain, many hosts stream as a profession, and many stream from the same office network; with the company's limited outbound bandwidth, the latency of RTMP and FLV becomes even worse. High latency hurts the real-time interaction between viewers and hosts, and it also blocks the adoption of latency-sensitive scenarios such as live commerce and education.

Here is a general solution using the RTMP protocol:

Depending on the actual network conditions and push-side settings such as keyframe interval and push bitrate, the latency is generally around 8 seconds, coming mainly from two sources:

CDN link latency, which splits into two parts. One part is network transmission latency: a CDN involves four segments of network transmission; assuming each segment adds about 20 ms, those four segments add up to 80 ms. In addition, using RTMP frames as the transmission unit means every node must receive a complete frame before it starts forwarding downstream; and to improve concurrency, CDNs optimize their packet-sending strategies in ways that add some latency too. Under network jitter, the delay becomes even less controllable: with a reliable transport protocol, once jitter occurs, the whole subsequent transmission pipeline blocks, waiting for retransmission of the earlier packets.

Playback-side buffering, which is the main source of latency. Public-network conditions vary widely, and network jitter in any link, whether pushing, CDN transmission, or playback reception, affects the player. To absorb jitter from the upstream links, players generally keep a media buffer of around 6 seconds.

From this, it is clear that the biggest contribution to live latency is on the pull side (the playback buffer). How to eliminate that latency quickly is the problem the major cloud vendors are racing to solve, hence their newer products that move beyond RTMP's latency, such as Tencent Cloud's "Fast" live streaming and Alibaba Cloud's ultra-low-latency RTS. These products in fact introduce WebRTC technology; we may have the opportunity to study that together in the future.

5. Reference materials

1. RTMP official documentation

2. AMF0 specification

3. AMF3 specification

4. FLV official documentation

5. Analysis of the FLV file format

6. livego source code

7. Hand-rolling the RTMP protocol

Author: Vivo Internet Server Team -Xiong Langyu