This article, originally titled “Baidu Live Broadcast Message Service Architecture Practice”, was first shared on the “Baidu Geek Talk” public account by the Baidu APP Message Center team. To make it more accessible, the layout has been optimized and the content reorganized. A link to the original article is at the end of the article.

1. Introduction

There are two core functions of a complete live broadcast system:

1) Pushing and pulling real-time audio and video streams;
2) Sending and receiving the message stream in the live broadcast room (chat messages, bullet comments, control instructions, etc.).

This article mainly shares the architectural design practice and evolution of Baidu Live’s message system.

**In fact:** Although the interaction in a live broadcast room takes the form of an ordinary IM chat message stream, it carries much more than user chat.

**Besides user chat:** Real-time notifications of interactive behaviors, such as sending gifts, entering the room, liking, purchasing, anchors recommending products, and requests to co-stream (mic linking), are also delivered as messages.

**In addition:** Special scenarios such as closing the live broadcast room or switching the live stream also depend on real-time delivery through the message stream.

Therefore, the message stream can be regarded as the basic capability that supports both real-time interaction between anchors and users and real-time control of the live broadcast room. If real-time audio and video streaming is the soul of a live broadcast system, the message stream is its skeleton.

So how do we build a live broadcast message system, and what challenges must be solved? This article works through these questions.

2. Series of articles

This is the fourth article in a series:

Chat Technology of Live Broadcast Systems (I): The Practical Road of Real-time Push Technology in the Meipai Live Bullet-Comment System with Millions Online

Chat Technology of Live Broadcast Systems (II): Technical Practice of the Alibaba E-commerce IM Messaging Platform in Group Chat and Live Broadcast Scenarios

Chat Technology of Live Broadcast Systems (III): Evolution of the Message Architecture for WeChat Live Chat Rooms with 15 Million Online Users in a Single Room

“Chat Technology of Live Broadcast Systems (IV): Evolution Practice of the Real-time Message System Architecture for Massive Users of Baidu Live” (* this article)

3. Differences from ordinary IM group chat

Live chat messages are often compared to ordinary IM group chat.

Group chat is a familiar instant messaging (IM) scenario. Chat in a live broadcast room resembles group chat in some ways, but the two differ in essential respects.

Comparing their characteristics, live broadcast messages differ from IM group chat as follows:

1) Different numbers of participants:

An IM group chat with thousands of participants is already a large group. However, for highly popular large-scale live broadcasts, such as National Day celebrations, military parades and the Spring Festival Gala, a single live broadcast room can accumulate millions or even tens of millions of users, and the number of simultaneously online users can reach millions.

2) Different organizational relations:

Users join and leave IM groups relatively infrequently, so the member set is fairly stable. Users enter and leave a live broadcast room very frequently: a highly popular room sees tens of thousands of users entering and leaving every second.

3) Different duration:

Once established, an IM group chat can last a long time, from days to months. Most live broadcast rooms last no more than a few hours.

4. Core technical challenges

Based on the analogy between live messaging and IM chat in the previous section, we can extract two core technical challenges for the messaging system in live broadcasting.

Challenge 1:

Maintaining the users in a live broadcast room

1) Tens of thousands of users enter and leave a single live broadcast room every second (in practice, peak entries do not exceed 20,000 QPS, and peak exits do not exceed 20,000 QPS);
2) A single live broadcast room has millions of users online simultaneously;
3) The cumulative users of a single live broadcast room reach the tens of millions.

Supporting two sets, one of a million online users and one of ten million cumulative users, with 40,000 QPS of updates puts some pressure on the system, but storage with high read/write performance, such as Redis, should be able to handle it.
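As a rough illustration of the two sets described above, the following sketch keeps an online set and a cumulative set per room in plain Python. The class and method names are hypothetical, and in-memory sets stand in for a store such as Redis, where the same operations would roughly correspond to set add and remove commands.

```python
class RoomUsers:
    """In-memory stand-in for the two per-room user sets (hypothetical names)."""

    def __init__(self):
        self.online = set()       # users currently in the room
        self.cumulative = set()   # every user who has ever entered

    def enter(self, user_id):
        # One "enter" event updates both sets.
        self.online.add(user_id)
        self.cumulative.add(user_id)

    def leave(self, user_id):
        # Leaving only affects the online set; the cumulative set never shrinks.
        self.online.discard(user_id)


room = RoomUsers()
for uid in ("user-1", "user-2", "user-3"):
    room.enter(uid)
room.leave("user-2")
print(len(room.online), len(room.cumulative))
```

Each enter or leave event touches at most two sets, which is why the 40,000 QPS of updates maps to a bounded number of simple set operations.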

Challenge 2:

Message delivery to millions of online users

With millions of users online, there is a large volume of upstream and downstream messages. From the perspective of a live broadcast user:

1) **Real-time performance of messages:** If the message server only does simple peak shaving, the backlog of peak messages increases overall message latency, and the delay can accumulate significantly. Messages then drift far from the live video stream on the timeline, hurting the real-time feel of interaction while watching;

2) **Client experience and performance:** The client displays all kinds of user chat and system messages, generally no more than 10-20 per screen. If more than 20 messages arrive per second, the display keeps flooding the screen. Considering special effects such as gift messages, processing and rendering a large volume of messages puts a sustained high load on the client. For a client watching a live broadcast for a long time, a continuous stream of heavy message traffic creates significant performance pressure, and the excess messages have a cumulative effect.

Since challenge 1 is not difficult to solve, the following sections focus on challenge 2.

5. Technical design objectives

Considering the technical challenges in the previous section and the live broadcast business scenarios, we can define clear targets for the message system.

Technical design objectives are roughly defined as follows:

1) Real-time: end-to-end message latency should be at the second level;
2) Performance: the message service can deliver to more than one million simultaneously online users in the same live broadcast room;
3) Peak handling: discarding excess messages at peak is a reasonable and appropriate approach;
4) Message rate: based on a reasonable end user experience, the number of messages per second in a single live broadcast room is assumed to be no more than N.

**Now:** The core problem is how to deliver no more than N messages to millions of users in a live broadcast room within S seconds (assuming N<=20, S<=2).

6. Take inspiration from the technical implementation of ordinary IM group chats

6.1 Analysis of Sending and receiving Ordinary IM group Chat Messages

IM group chat data flow and pressure points:

As shown in the figure above, let us first analyze the message sending and receiving process of ordinary group chat in detail:

1) For group group-1, a group public message box group-mbox-1 is assigned;
2) User user-1 in group-1 sends message msg-1 from app-1 on a mobile device;
3) After receiving msg-1, the server checks whether user-1 has permission. If so, it stores msg-1 in the group mailbox group-mbox-1 and generates the corresponding message ID msgID-1;
4) The server queries the user list groupuserlist-1 corresponding to group-1;
5) The server splits groupuserlist-1 into the individual users user-1, user-2, ..., user-n;
6) For each user user-i, the server queries the devices the user is on: device-i-1, device-i-2, ..., device-i-m (because one account may be logged in on multiple devices);
7) For each device device-i-j, the long connection channel establishes an independent long connection connect-j to serve it. Since connect-j is established dynamically when the app on the client connects to the long connection service, the mapping between device-i-j and connect-j must be looked up through a routing service, route;
8) Once connect-j is found, the notification groupmsg-notify-1 can be delivered through connect-j;
9) If user-i is using app-1 on device-i-j, the user immediately receives groupmsg-notify-1 from long connection connect-j;
10) After receiving groupmsg-notify-1, the message SDK in app-1 takes latestMsgID, the ID of the last message latestMsg recorded in the local message history, and sends a fetchMsg pull request to the server for all messages in group-1 from latestMsgID+1 to the latest;
11) After receiving the fetchMsg request, the server retrieves all messages from latestMsgID+1 to the latest from group-mbox-1 and returns them to the client. If there are too many messages, the client may need to pull them page by page;
12) app-1 on the client receives all messages in group-1 from latestMsgID+1 to the latest and can display them. After the user reads them in the session, the read status of all new messages (or the session's read status) must be set.
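The notify-then-pull flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionaries below are hypothetical in-memory stand-ins for the mailbox store, the user list, the device table and the routing service, and permission checks, paging and read status are left out.

```python
# Hypothetical in-memory tables standing in for the real storage and routing services.
group_mbox = {"group-1": []}                                   # group mailbox (ordered messages)
group_users = {"group-1": ["user-1", "user-2"]}                # group -> user list
user_devices = {"user-1": ["device-1-1"],
                "user-2": ["device-2-1", "device-2-2"]}        # user -> devices
device_route = {"device-1-1": "connect-1",
                "device-2-1": "connect-2",
                "device-2-2": "connect-3"}                     # device -> long connection


def send_group_msg(group_id, body):
    """Store the message, then fan the notification out user by user, device by device."""
    mbox = group_mbox[group_id]
    msg_id = len(mbox) + 1                     # monotonically increasing message ID
    mbox.append((msg_id, body))
    notified = []
    for user in group_users[group_id]:         # split the user list
        for device in user_devices[user]:      # query the user's devices
            conn = device_route.get(device)    # dynamic route lookup
            if conn is not None:
                notified.append(conn)          # deliver the notify over the connection
    return msg_id, notified


def fetch_msg(group_id, latest_msg_id):
    """Client pull: everything after the last message ID the client has seen."""
    return [m for m in group_mbox[group_id] if m[0] > latest_msg_id]


msg_id, notified = send_group_msg("group-1", "hello")
print(notified)
print(fetch_msg("group-1", 0))
```

Every send triggers one lookup per user, one per device and one pull per client, which is where the per-second load multiplies as the group grows.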

6.2 The main pressure of ordinary IM group chat

If the whole flow of ordinary group chat, from sending a message to notifying clients, were reused as-is, then delivering a single message msg-1 from user-1 to a group of a million users in real time would involve several million-level challenges per second.

First:

Splitting the user list groupuserlist-1 requires reading a list of millions of users within seconds, which is the first million-level challenge for storage and services.

Second:

For every user user-i split from the group, the corresponding devices device-i-j must be queried, millions of them within seconds. This is the second million-level challenge for storage and services.

Third:

For every device-i-j, the long connection connect-j must be looked up through the dynamic routing service route, again millions within seconds. This is the third million-level challenge for storage and services.

Fourth:

When delivering through the long connections, groupmsg-notify-1 must reach millions of connect-j within seconds, which is a million-level challenge for the long connection service.

Fifth:

Every app-1 that receives the notification sends a fetchMsg pull request, so the server must support a million QPS of pull requests. This is also a million-level challenge for the message mailbox service. Because latestMsgID may differ on each client, possible optimizations are complex and the impact on performance is greater.

Sixth:

If most users are online in the chat session, setting the read status can also put millions of QPS on the server.

**Obviously:** Reusing the group chat message flow as-is puts enormous pressure on both the message service and the long connection service.

6.3 Optimization solution of ordinary IM group Chat

Pressure points after optimization of IM group chat data stream:

As shown in the figure above, let’s now analyze each of the above million-magnitude challenges and see if there is room for optimization:

1) For ① splitting the user list and ② querying each user's devices: if the two stores are combined, that is, the user list of the live broadcast room is extended to carry device information, the million-QPS user->device query can be eliminated;
2) For ④ downstream notification plus ⑤ client fetchMsg pull (the reliable pull mode): since live broadcast messages can tolerate partial loss, messages can simply be delivered one way without a pull. This is acceptable for most users whose connections stay online. The flow can therefore keep only the downstream notification (carrying the message body) and drop the client pull;
3) For ⑥ setting messages as read: in live broadcast scenarios this can be simplified away and dropped.

These optimizations eliminate three of the million-level request loads (②, ⑤ and ⑥), but three million-level steps remain: ① splitting the user list, ③ dynamic route query, and ④ long connection delivery.

For ① splitting the user list:

To support user list queries at the million level, a common approach is batch queries keyed by groupID. For example, if one query returns 100 users, 10,000 QPS of queries covers a million users. User data keyed by groupID can also be distributed across multiple master-slave instances and shards, with shard granularity controlled to avoid hot spots. This is basically achievable, though it may consume considerable storage resources.

For ③ dynamic route query:

On the surface the problem is similar to ①, but it differs. The group's user list is keyed by groupID, in one table or several sharded tables. The device-i-j lookups, by contrast, are completely scattered. They also need batch query capability, but fully scattered device lookups cannot be optimized for specific keys, so the dynamic routing service as a whole must support millions of QPS.

For ④ long connection delivery:

The long connection service does not depend on external storage. If a single long connection instance can deliver 10,000 messages per second, 100 instances can support a million.

**Based on the above analysis:** Supporting million-level message delivery starts to look feasible. It seems to require only optimizing the user list, storing and querying dynamic routes, and expanding long connection capacity, but all of these need a lot of storage and machine resources.

Given the reality of the live broadcast business, however, the picture is not optimistic:

1) On the one hand, when no hot live broadcast is running, the peak number of concurrent online users in a single room may not exceed 10,000, or even 1,000. In the early stage of the business, the total number of online users across all live broadcasts may not exceed 100,000. This means that provisioning for million-level peaks leaves resources tens of times over-provisioned;
2) On the other hand, a very popular live broadcast may need to deliver messages not just to 1 million but to more than 5 million users (for example the National Day parade or the Spring Festival Gala). In that case, the likely peak of online users must be estimated in advance for each major broadcast. If it exceeds the designed capacity, the user list (①), dynamic route query (③) and long connection service (④) each need to be expanded and load tested. Otherwise, where acceptable, the service is degraded or requests are rejected.

In practice, the peak of online users is hard to estimate. The result is low resource utilization, frequent scaling up and down, and high O&M costs, which makes it hard to commit to this approach.

6.4 Common group Chat Multi-group solution

Splitting into multiple groups has also been proposed.

**For example:** If a group supports at most 10,000 users, opening 100 groups supports a million users. Creating one more virtual group to link these 100 groups together seems feasible.

However, careful analysis shows that the high pressure of the problems mentioned above, ① splitting the user list, ③ dynamic route query and ④ long connection delivery, still exists and cannot be avoided.

In addition, multiple groups introduce other problems:

1) Problem 1: Messages in multiple groups are not synchronized. If two users watch the same live broadcast but belong to different groups, they see completely different messages;
2) Problem 2: Users enter and leave live broadcasts dynamically, so group membership is very unstable and the peak of online users fluctuates widely. If new groups are opened dynamically as the online population grows, the first group may have many users while the second starts with few. Or many groups are opened at peak, and as popularity drops and users leave, the remaining users are scattered: some groups end up with few users and little chat interaction, so groups must be shrunk and merged. Balancing users across groups to achieve a good business result is also hard.

Based on the above analysis, we did not choose the multi-group scheme.

7. Message architecture practice based on multicast McAst scheme

Having compared the architecture of ordinary IM group chat in the previous section, this section introduces the proposal and evolution of the multicast McAst scheme, a live broadcast message architecture that supports millions of concurrent online users in real time.

7.1 Think outside the box

Should we adopt the IM group chat optimizations described above, or is there a different approach?

**Setting aside the group send/receive flow for a moment:** If one step is absolutely necessary for message delivery, it is delivery over the long connection. A message that is not delivered over a long connection cannot reach the user.

Polling pull could of course replace long connection delivery, but its performance cost and real-time behavior are clearly much worse, so it is outside the scope of this discussion.

**If we can simplify to this:** When sending a message to the long connection service, specify something like a groupID, and the long connection service fans the message out directly to the long connections connect-j of all users in the group. The million-level queries for splitting the user list and looking up dynamic routes can then be omitted.

**In this case:** The pressure of message delivery is borne mainly by the long connection service, the server side does not need to scale out multiple systems, and optimizing live broadcast messages may become much simpler.

This is equivalent to creating a group concept for connect in the long connection service. Based on the idea of connection groups, we designed a multicast (McAst) mechanism for long connections.

7.2 Basic Concepts of Long-link Multicast McAst

The basic concepts are summarized as follows:

1) Each long connection multicast McAst has a globally unique identifier McAstID;
2) A long connection multicast McAst supports management operations such as create, delete, modify and query;
3) A long connection multicast McAst is a collection of the long connections (connect) of a number of online users;
4) A user user-i on device device-i-j establishes a unique long connection connect-j-k for a specific application app-k (logged-in and non-logged-in users are not distinguished here);
5) The relationship between a multicast McAst and the connections connect-j-k in it needs no separate storage; it is maintained on each long connection service instance.

7.3 Concepts of Long-Link Multicast McAst Routing

The route route-m of multicast McAst-m is a collection of long connection service instances, LcsList, recording the long connection service instance lcs-j of every long connection connect-i that has joined McAst-m.

7.4 Maintaining records of long-Link Multicast McAst routes

Logical flow for joining multicast McAst:

1) The client calls the message SDK to join McAst-m;
2) The message SDK sends an upstream request McAstJoin(McAst-m) over the long connection;
3) The business layer receives the McAstJoin request from connection connect-i on long connection instance lcs-i and verifies that McAst-m is valid;
4) The business layer asks the routing layer to establish the multicast route McAstRoute-m for McAst-m and to add long connection instance lcs-i to McAstRoute-m;
5) The business layer asks the long connection service layer, specifically the instance lcs-i where the McAstJoin request arrived, to add connection connect-i to McAstConnectList-m.

Leaving a multicast McAst is similar to joining: the client calls the message SDK to leave McAst-m, the SDK sends an upstream request McAstLeave(McAst-m), and the long connection server updates the route and McAstConnectList-m.
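The join and leave bookkeeping described above can be sketched as follows. All names are hypothetical, validity checks are omitted, and one detail is an assumption not stated in the article: an instance is removed from the route when its last member of that multicast leaves.

```python
from collections import defaultdict


class LcsInstance:
    """One long connection service instance; it keeps its own per-multicast connection lists."""

    def __init__(self, name):
        self.name = name
        self.connect_list = defaultdict(set)   # mcast_id -> connections held on this instance


# Routing layer: mcast_id -> names of the instances that hold at least one member.
mcast_route = defaultdict(set)


def mcast_join(lcs, mcast_id, conn):
    mcast_route[mcast_id].add(lcs.name)        # add the instance to the multicast route
    lcs.connect_list[mcast_id].add(conn)       # add the connection on that instance


def mcast_leave(lcs, mcast_id, conn):
    members = lcs.connect_list[mcast_id]
    members.discard(conn)
    if not members:
        # Assumption: drop the instance from the route once it holds no members.
        mcast_route[mcast_id].discard(lcs.name)


lcs1, lcs2 = LcsInstance("lcs-1"), LcsInstance("lcs-2")
mcast_join(lcs1, "mcast-m", "connect-1")
mcast_join(lcs1, "mcast-m", "connect-2")
mcast_join(lcs2, "mcast-m", "connect-3")
mcast_leave(lcs2, "mcast-m", "connect-3")
print(sorted(mcast_route["mcast-m"]))
```

Note that the route only records instances, not connections, so its size stays in the tens to hundreds even when the multicast has millions of members.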

7.5 Multicast McAst Message Push

Multicast McAst data streams and pressure points:

Pushing a message over long connections via multicast McAst is a two-level fan-out: 1:M followed by 1:N.

The specific process is described as follows:

1) A message msg-1 is pushed with multicast McAst-m as its destination;
2) Based on the destination McAst-m, the back-end service module uses consistent hashing to select a McAst route distribution module instance McAstRouter-i and sends msg-1 to it;
3) The route distribution module instance McAstRouter-i looks up the multicast route McAstRoute-m of McAst-m to get the list of access instances McAstLcsList-m, splits it into the long connection access instances lcs-1 .. lcs-M that hold members of McAst-m, and sends msg-1 to them concurrently;
4) On receiving msg-1, a long connection service instance lcs-j looks up the multicast connection list McAstConnectList-m for McAst-m, finds all its connections in McAst-m, connect-m-1 .. connect-m-N, and pushes msg-1 concurrently to the message client SDKs sdk-m-1 .. sdk-m-N;
5) On receiving msg-1, the message client SDK sdk-m-o hands it to the upper-level business (such as the live broadcast SDK).
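A minimal sketch of this two-level fan-out follows. The hash ring, the router names and the two lookup tables are hypothetical stand-ins, and the loops run sequentially here for brevity, whereas the article describes both levels as concurrent sends.

```python
import bisect
import hashlib


def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Small consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=50):
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(replicas))
        self._keys = [k for k, _ in self._ring]

    def pick(self, key):
        # First virtual node clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical state: the multicast route and per-instance connection lists.
mcast_route = {"mcast-m": ["lcs-1", "lcs-2"]}
connect_lists = {"lcs-1": {"mcast-m": ["connect-m-1", "connect-m-2"]},
                 "lcs-2": {"mcast-m": ["connect-m-3"]}}


def push(ring, mcast_id, msg):
    router = ring.pick(mcast_id)                              # choose the router instance
    deliveries = []
    for lcs in mcast_route.get(mcast_id, []):                 # level 1: route -> instances (1:M)
        for conn in connect_lists[lcs].get(mcast_id, []):     # level 2: instance -> connections (1:N)
            deliveries.append((lcs, conn, msg))
    return router, deliveries


ring = HashRing(["router-1", "router-2", "router-3"])
router, deliveries = push(ring, "mcast-m", "msg-1")
print(router, len(deliveries))
```

Because the same McAstID always hashes to the same router instance, the route for one live broadcast room is served from one place without any per-user lookup.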

7.6 Performance evaluation of the multicast McAst mechanism

Now let’s analyze the performance pressure of the above multicast McAst mechanism:

1) The main pressure of maintaining multicast McAst routes comes from McAstJoin and McAstLeave. The peak of Join requests hardly exceeds 20,000 QPS, so the access pressure is two orders of magnitude lower than the million level;
2) In the push flow, the first-level split by route McAstRoute into long connection instances generally involves tens to hundreds of instances, so the cost is very low;
3) Within a single long connection instance, the multicast push is sent concurrently over many connections in a single process. After optimization and online measurement, a single instance holding 250,000 long connections can stably deliver 80,000 QPS of McAst pushes; conservatively, its capacity is rated at 50,000 QPS. Multiple long connection instances work fully in parallel and are easy to scale horizontally.

**To sum up:** For 1,000,000 QPS of delivery, 20 long connection instances can carry the full load (20 × 50,000 = 1,000,000) with some margin. Delivering 5,000,000 QPS takes no more than 100 instances. Delivering 10,000,000 QPS, with each instance carrying the higher load of 80,000, takes 125 instances.

It seems that the multicast McAst mechanism gives us an efficient long connection delivery mechanism supporting millions of QPS, one that the current long connection capacity can support basically without expansion. Whether it fully meets the requirements of live broadcast scenarios needs further discussion.
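The instance counts above follow from a simple division, which the short calculation below reproduces. The helper name is hypothetical; the per-instance figures are the 50,000 QPS conservative rating and the 80,000 QPS measured value quoted in the text.

```python
import math


def instances_needed(total_qps, per_instance_qps):
    """Number of long connection instances required, rounded up."""
    return math.ceil(total_qps / per_instance_qps)


print(instances_needed(1_000_000, 50_000))    # 1,000,000 QPS at the conservative rating
print(instances_needed(5_000_000, 50_000))    # 5,000,000 QPS at the conservative rating
print(instances_needed(10_000_000, 80_000))   # 10,000,000 QPS at the measured rating
```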

7.7 Message peak problem

For 1 message per second fanned out to 1,000,000 or even 5,000,000 users, the multicast McAst mechanism above seems able to cope.

**However, the actual message traffic in a live broadcast room looks like this:** A popular live broadcast has many upstream chat messages from users every second. Besides chat, there are many kinds of system messages sent regularly or irregularly, such as viewer counts, room entries, likes and shares.

**Suppose there are 100 messages of various kinds per second at peak:** 1,000,000 × 100 = 100 million deliveries per second. At a simple 50,000 QPS per instance, that takes 2,000 instances. Although much better than the old group chat system, this still means massive resource redundancy or massive scale-out to handle peaks. Is there a better solution?

Here we consider a common optimization: improving system performance through batch aggregation.

If the 100 messages are aggregated and packed into a single delivery once per second, the delivery QPS is still 1,000,000. The QPS of the long connection system stays the same, but the number of messages delivered per second reaches 100 million. This aggregation scheme is practical.

The cost of aggregation is added message latency: aggregating over 1 second adds 500 ms of delay on average. The loss in user experience is small, while the number of messages the system delivers increases a hundredfold. Given real live broadcast scenarios, second-level aggregation and delay are acceptable in most cases.
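To make the trade-off concrete, the sketch below groups timestamped messages into fixed windows and measures the delay added by waiting for the window to close. The function names and the synthetic message stream are assumptions for illustration; a real service would flush on a timer rather than process a prepared list.

```python
from collections import defaultdict


def aggregate(messages, window=1.0):
    """Group (timestamp, body) pairs into one batch per time window."""
    batches = defaultdict(list)
    for ts, body in messages:
        batches[int(ts // window)].append((ts, body))
    return [batches[k] for k in sorted(batches)]


def average_added_delay(messages, window=1.0):
    """Mean wait between a message's arrival and the end of its window."""
    delays = [(int(ts // window) + 1) * window - ts for ts, _ in messages]
    return sum(delays) / len(delays)


# Synthetic stream: 100 messages per second, evenly spaced, for 2 seconds.
msgs = [(i / 100, f"msg-{i}") for i in range(200)]
batches = aggregate(msgs)
print(len(batches), len(batches[0]))
print(round(average_added_delay(msgs), 3))
```

With evenly spaced arrivals the average added delay comes out close to half the window, in line with the roughly 500 ms figure for 1-second aggregation, while 200 messages collapse into 2 deliveries per connection.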

7.8 Message Bandwidth Problems

With the QPS problem of a single long connection instance solved by delayed aggregation, the next problem is the bandwidth pressure of a single instance.

**For example:** Suppose a single long connection instance delivers to 10,000 long connections, with 100 messages per second and an average message size of 2 KB. The bandwidth is 2K × 100 × 10,000 × 8 = 15,625 Mbit/s, which exceeds the capacity of the 10-gigabit NIC on a single physical machine.

**On the other hand:** The global bandwidth is as high as 1.5 Tbps, which puts pressure on data center egress. Such bandwidth is too costly, so bandwidth usage must be reduced or a better alternative found.

Faced with the high bandwidth consumption of delivered data, we adopted data compression, which leaves the business data itself unchanged.

Compression is a CPU-intensive operation. Because live broadcast is real-time, compression ratio cannot be the only consideration. After balancing compression ratio, compression latency and CPU cost, and tuning the compression library, the measured average compression ratio reached 6.7:1, compressing the data to about 15% of its original size. So 15,625 Mbps × 15% ≈ 2,344 Mbps ≈ 2.29 Gbps, and a single 10-gigabit NIC can carry at most about 42,700 long connections. Although this falls short of 50,000, it is basically acceptable.

From the global bandwidth point of view, the peak is also reduced to no more than 230 Gbps, a significant gain.
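The bandwidth figures can be checked with a short calculation. The unit conventions here are assumptions chosen because they reproduce the article's numbers: 2K means 2 × 1024 bytes, one Mbit is 1024 × 1024 bits, the 10-gigabit NIC is taken as 10,000 Mbit/s, and the global figure assumes one million connections.

```python
MSG_BYTES = 2 * 1024        # assumed average message size: 2 KB
MSGS_PER_SEC = 100
CONNECTIONS = 10_000
COMPRESSED_FRACTION = 0.15  # data shrinks to about 15% of the original
NIC_MBPS = 10_000           # 10-gigabit NIC, taken as 10,000 Mbit/s

# Per-instance bandwidth before and after compression, in Mbit/s.
raw_mbps = MSG_BYTES * MSGS_PER_SEC * CONNECTIONS * 8 / (1024 * 1024)
compressed_mbps = raw_mbps * COMPRESSED_FRACTION

# How many connections one NIC can carry at the compressed rate.
max_connections = int(NIC_MBPS / compressed_mbps * CONNECTIONS)

# Global bandwidth for 1,000,000 connections, in Gbit/s.
global_raw_gbps = raw_mbps * 100 / 1024
global_compressed_gbps = global_raw_gbps * COMPRESSED_FRACTION

print(raw_mbps, round(compressed_mbps, 2), max_connections)
print(round(global_raw_gbps), round(global_compressed_gbps))
```

The connection limit comes out slightly below the rounded 42,700 quoted above, and the global figures land near 1.5 Tbps before compression and just under 230 Gbps after it.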

7.9 Client Performance Problems

Furthermore, in live broadcast scenarios the message volume is not only high at peak but also stays high throughout the broadcast. This stresses the server and also challenges the client.

With a sustained high message volume:

1) On the one hand, the client is under obvious pressure to receive and display messages;
2) On the other hand, too many messages refreshing too quickly on the live broadcast interface also hurt the user experience.

**So:** Balancing user experience against client performance, the message server added a tiered rate-limiting and frequency-control mechanism based on message priority. A single client no longer has to absorb 100 messages per second, and with the per-second message volume reduced, a single long connection instance can deliver to 50,000-80,000 long connections per second with stable CPU and bandwidth usage.
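The article does not spell out how the priority-based rate limiting works, so the following is only one possible shape of such a mechanism, under stated assumptions: each message carries a numeric priority where a smaller number means more important, and each second's batch is cut to a fixed limit by dropping the least important messages first.

```python
def throttle(batch, limit):
    """Keep at most `limit` messages from one second's batch, most important first.

    Each message is a (priority, body) pair; a smaller priority number means
    higher priority (an assumption). Surviving messages keep their arrival order.
    """
    ranked = sorted(enumerate(batch), key=lambda item: (item[1][0], item[0]))
    kept = sorted(ranked[:limit], key=lambda item: item[0])   # restore arrival order
    return [msg for _, msg in kept]


batch = [(2, "like"), (0, "room closed"), (1, "gift"), (2, "chat-a"), (2, "chat-b")]
print(throttle(batch, 3))
```

Control messages such as a room closing survive the cut, while surplus low-priority chat is the first to be discarded, which matches the design goal that dropping excess messages at peak is acceptable.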

7.10 Real-time Message Problems

We provide a real-time delivery mechanism based on message priority:

1) High-priority messages trigger aggregated delivery immediately, without adding aggregation delay;
2) Ordinary medium- and low-priority messages are still delivered with delayed aggregation.
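The two rules above can be sketched as a buffer that normally waits for the aggregation timer but flushes at once when a high-priority message arrives. The class and method names are hypothetical, and the choice to send the already buffered messages together with the high-priority one is an assumption.

```python
class PriorityAggregator:
    """Buffers messages for timed delivery, flushing early for high-priority ones."""

    def __init__(self):
        self.buffer = []

    def submit(self, msg, high_priority=False):
        """Buffer a message; return a batch to deliver now, or None to wait for the timer."""
        self.buffer.append(msg)
        if high_priority:
            return self.flush()   # trigger aggregated delivery immediately
        return None

    def flush(self):
        """Called by the aggregation timer, or early by a high-priority message."""
        batch, self.buffer = self.buffer, []
        return batch


agg = PriorityAggregator()
first = agg.submit("chat-1")
second = agg.submit("chat-2")
batch = agg.submit("stream-switch", high_priority=True)
print(first, second, batch)
```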

7.11 Online User Problems

The multicast McAst mechanism starts from guaranteeing message arrival for online users while allowing offline users to miss some messages. It pays a reasonable cost in technical complexity to balance service quality and performance with millions of concurrent online users.

For message arrival to online users, another key problem is how to keep users' long connections online.

In order to improve the access stability and reachability of the long connection service, we have optimized the following aspects.

1) Access point:

The long connection service has access points deployed for the three major domestic operators in North, East and South China. For live broadcasts with some overseas users, an independent access point in a Hong Kong data center was also added.

2) HTTPDNS:

To address the DNS hijacking and resolution errors that affect some users, the message SDK integrates an HTTPDNS service and optimizes the local cache, forming a multi-level domain name resolution safeguard that improves reliability and reduces DNS hijacking and error rates (see “Baidu APP Mobile Network Deep Optimization Practice Sharing (I): DNS Optimization”).

3) Heartbeat optimization:

The long connection heartbeat is an important means of keep-alive and liveness detection. Given the real-time nature of live broadcasts, and to detect broken long connections as early as possible, after McAstJoin the heartbeat is switched to an intelligent heartbeat with a shorter interval, dynamically controlled by the server.

This way, once a connection exception is detected, the message SDK can quickly and proactively re-establish the connection.

4) Recovery from disconnection:

If a connection has joined a multicast McAst and the long connection then breaks, the long connection server actively or passively removes the member from the multicast McAst.

When the long connection is re-established, the live broadcast business layer must listen for the connection recovery signal and rejoin the multicast McAst to restore its message path.

7.12 Summary

To sum up, the multicast McAst mechanism:

1) Effectively solves real-time message delivery to millions of concurrent online users;
2) Allows some messages to be discarded during brief disconnections or when there are too many messages;
3) Meets the design goals for live broadcast messages.

The characteristics of multicast McAst mechanism are:

1) The message service and routing layer are under relatively light pressure; the load is borne mainly by the long connection layer, which is easy to scale horizontally;
2) Delayed aggregation, compression and rate limiting handle the downstream QPS and bandwidth problems well;
3) The overall downstream QPS and bandwidth are fully controllable. The maximum downstream QPS is 1,000,000 for 1,000,000 online users and 5,000,000 for 5,000,000 online users. The delivery capacity of a single instance is stable at 50,000-80,000 QPS, so it is easy to judge the overall system capacity and whether special scenarios require expansion;
4) Although the McAst mechanism was proposed for live broadcast scenarios, the design is general and applies to other message push scenarios where large numbers of online users are grouped in real time.

8. Further expansion of the message architecture based on the multicast mcast scheme

After the multicast mcast mechanism solved real-time message delivery for millions of concurrent online users, live broadcast message scenarios kept expanding, and innovative live broadcast services kept raising new messaging requirements.

Accordingly, the multicast mcast service mechanism also needs to keep pace with the times and keep expanding and optimizing in both depth and breadth. The following focuses on historical messages and gift messages.

8.1 Support for live broadcast historical messages

Users who have just entered a live broadcast room need to see some recent chat records, which enhances the interactive atmosphere and helps them understand how the broadcast is going. Users interested in the chat history can also trace back further. This creates the need for historical chat messages.

To support such historical messages, the extension is to open a multicast public message mailbox service, mcast-mbox, for each multicast mcast that is applied for.

The logic goes like this:

1) User messages and other messages that need to be persisted are all written to this message mailbox;
2) Users can specify a multicast mcastID and pull historical multicast mcast messages by time interval and by the number of messages to pull.

The following supplements the concept and application of the message mailbox.

What is the message mailbox service?
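
A history pull of this kind might look like the sketch below. The real mcast-mbox interface is not shown in the article, so `HistoryMsg`, `pull_history`, and the in-memory `mailboxes` dictionary are hypothetical stand-ins. The sketch returns the newest messages in the requested time interval, oldest first, which is what a user entering a room would want to see.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class HistoryMsg:
    msg_id: int
    sent_at: float   # send time in seconds (assumed field)
    content: str


def pull_history(mailboxes: Dict[str, List[HistoryMsg]], mcast_id: str,
                 start_ts: float, end_ts: float, limit: int) -> List[HistoryMsg]:
    """Return up to `limit` of the newest messages in [start_ts, end_ts], oldest first."""
    if limit <= 0:
        return []
    in_window = [m for m in mailboxes.get(mcast_id, [])
                 if start_ts <= m.sent_at <= end_ts]
    in_window.sort(key=lambda m: m.msg_id)
    return in_window[-limit:]


mailboxes = {"room-1": [HistoryMsg(1, 100.0, "hi"),
                        HistoryMsg(2, 105.0, "hello"),
                        HistoryMsg(3, 110.0, "welcome"),
                        HistoryMsg(4, 200.0, "late")]}
recent = pull_history(mailboxes, "room-1", start_ts=100.0, end_ts=150.0, limit=2)
```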

1) A message msg in the message mailbox has a unique message identifier msgID;
2) A message msg includes sender information, receiver information, message type, message content and other fields, which can be ignored here for now;
3) An expiration time can be set for each message; the message cannot be accessed after it expires;
4) The read status of each message can be set;
5) A message mailbox mbox has a unique mailbox identifier mboxID;
6) A message mailbox mbox is a container that stores an ordered message list msgList, sorted by msgID;
7) The message mailbox service supports writing a single message or a batch of messages to a specified mailbox mbox;
8) The message mailbox service supports looking up a single message or a batch of messages by msgID in a specified mailbox mbox;
9) The message mailbox service supports range lookups in a specified mailbox mbox from msgID-begin to msgID-end.

In practice, the most common operation is pulling messages by msgID range. The message mailbox service here follows the timeline model. Interested readers can refer to the timeline model for further information (see section 4, Timeline Model, in “Discussion on The Synchronization and Storage Scheme of Chat Messages in Modern IM System”).
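
The mailbox concepts listed above map naturally onto a small in-memory data structure. The sketch below is an illustration only, not the production service: the class and method names are assumptions, storage is a plain dictionary plus a sorted ID list, and expiry is checked at read time.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Msg:
    msg_id: int                          # unique message identifier (msgID)
    content: str
    expires_at: Optional[float] = None   # None means the message never expires
    read: bool = False


class Mbox:
    """A mailbox (mbox): an ordered message list sorted by msgID."""

    def __init__(self, mbox_id: str) -> None:
        self.mbox_id = mbox_id
        self._ids: List[int] = []        # msgIDs kept in ascending order
        self._msgs: Dict[int, Msg] = {}

    def write(self, *msgs: Msg) -> None:
        # Single or batch write; rewriting an existing msgID replaces the message.
        for m in msgs:
            if m.msg_id not in self._msgs:
                bisect.insort(self._ids, m.msg_id)
            self._msgs[m.msg_id] = m

    @staticmethod
    def _alive(m: Msg, now: float) -> bool:
        return m.expires_at is None or now < m.expires_at

    def get(self, msg_ids: Iterable[int], now: float) -> List[Msg]:
        # Single or batch lookup by msgID; expired or unknown IDs are skipped.
        return [self._msgs[i] for i in msg_ids
                if i in self._msgs and self._alive(self._msgs[i], now)]

    def range(self, begin_id: int, end_id: int, now: float) -> List[Msg]:
        # Range lookup from msgID-begin to msgID-end, both inclusive.
        lo = bisect.bisect_left(self._ids, begin_id)
        hi = bisect.bisect_right(self._ids, end_id)
        return [self._msgs[i] for i in self._ids[lo:hi]
                if self._alive(self._msgs[i], now)]


box = Mbox("mcast-mbox-room-1")
box.write(Msg(3, "c"), Msg(1, "a"), Msg(2, "b", expires_at=10.0))
```

Keeping the IDs sorted makes the range pull, the most common operation, a pair of binary searches followed by a slice.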

8.2 Support for live broadcast gift messages

Gift message:

Gift message scenario analysis:

1) When a user sends a gift to the anchor, the anchor side needs to receive the gift notification promptly and reliably so that the anchor can respond to the user in time;
2) The user who sends the gift can display the gift effect locally right away and has no strong need for a message notification;
3) Other users in the live broadcast room need to receive the gift message to show the gift effect, which improves the interactive atmosphere and encourages other users to send gifts;
4) Gift messages involve user orders and purchases, so they must be confirmed and sent by the server;
5) Gift messages have a higher priority than other chat messages and system messages.

Based on the above analysis, the live broadcast message service adopts the following technical extension plan:

1) Add an independent, reliable-message multicast mcast channel (multicast mcast-2 in Figure 4) dedicated to sending and receiving high-priority reliable messages; it is isolated from other common messages and system messages at the data-flow level to reduce mutual interference;
2) For the message SDK on the ordinary user side, although the gift message multicast mcast channel is a new independent channel, its message sending and receiving logic is the same as that of the common message multicast mcast channel;
3) On the anchor side, the message SDK needs to combine push and pull on the gift message multicast mcast channel to ensure that all gift messages arrive; even after a brief disconnection, all gift messages must still be obtained;
4) On the anchor side, in extreme cases where the long connection cannot be established, the message SDK can poll through a short-connection interface and pull messages from the gift multicast mcast mailbox as a fallback.

With the independent reliable-message multicast mcast channel described above, and without excluding abnormal scenarios such as the anchor going offline or occasional data loss, the gift message delivery rate has reached more than 99.9%.
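
The push-plus-pull combination on the anchor side can be sketched as below. This is one possible arrangement and is not taken from the actual SDK. `GiftReceiver`, `fetch_after`, and `deliver` are hypothetical names. The sketch also assumes that gift msgIDs increase over time, that a gift is written to the mailbox before it is pushed, and that the mailbox returns messages in ascending msgID order.

```python
from typing import Callable, List, Set, Tuple

GiftMsg = Tuple[int, str]   # (msgID, payload)


class GiftReceiver:
    """Delivers pushed gifts immediately and fills any gaps by pulling from the mailbox."""

    def __init__(self, fetch_after: Callable[[int], List[GiftMsg]],
                 deliver: Callable[[int, str], None]) -> None:
        self._fetch_after = fetch_after   # hypothetical pull: mailbox messages with msgID > cursor
        self._deliver = deliver
        self._cursor = 0                  # every msgID <= cursor has already been handled
        self._seen: Set[int] = set()      # pushed msgIDs above the cursor

    def on_push(self, msg_id: int, payload: str) -> None:
        # Fast path over the long connection.
        self._accept(msg_id, payload)

    def sync(self) -> None:
        # Pull path: call after a reconnect, or periodically over a short connection
        # when the long connection cannot be established.
        for msg_id, payload in self._fetch_after(self._cursor):
            self._accept(msg_id, payload)
            self._cursor = max(self._cursor, msg_id)
        # IDs at or below the cursor are filtered by the cursor check from now on.
        self._seen = {i for i in self._seen if i > self._cursor}

    def _accept(self, msg_id: int, payload: str) -> None:
        if msg_id <= self._cursor or msg_id in self._seen:
            return                        # duplicate from the other path
        self._seen.add(msg_id)
        self._deliver(msg_id, payload)


gift_mailbox = [(1, "rose"), (2, "car"), (3, "rocket")]
shown: List[int] = []
receiver = GiftReceiver(lambda cursor: [m for m in gift_mailbox if m[0] > cursor],
                        lambda msg_id, payload: shown.append(msg_id))
receiver.on_push(1, "rose")
receiver.on_push(3, "rocket")   # the push for gift 2 was lost
receiver.sync()                 # the pull recovers gift 2 without repeating 1 or 3
```

Delivery order is not strictly by msgID in this sketch (gift 2 is shown after gift 3); a receiver that needs strict ordering would have to buffer pushes until the gap is filled.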

8.3 Other developments in live broadcast messaging

As Baidu Live developed, the live broadcast message service also faced many other basic problems, as well as challenges brought by innovative businesses.

These problems now have reasonably good solutions. Some are listed here for reference:

1) How to support a variety of clients: Android, iOS, H5, mini programs, and PC;
2) How to make the messages of the same live broadcast available across the Baidu app matrix, including Baidu APP, Haokan APP, Quanmin APP, and Tieba;
3) How to support users who are not logged in: IM generally supports only logged-in users, but live broadcast scenarios also need to support users who are not logged in;
4) If the long-connection service has a serious problem, whether there is a degraded channel through which clients can still obtain messages;
5) How to carry out machine review and human review of live broadcast messages, and how to support both review-before-publish and publish-before-review;
6) How to support messages across multiple live broadcast rooms;
7) How the live broadcast message service supports innovative businesses, such as live quiz shows, live commerce, and co-hosted (mic-linked) streams.

Due to space limitations, these questions are not discussed in detail here. Interested readers are welcome to discuss them.

9. Review and outlook

In the years since Baidu Live launched, the live broadcast message service has overcome many difficulties to safeguard Baidu Live and provide it with solid technical support.

Going forward, the live broadcast message service will keep working on support for innovative live broadcast businesses, finer-grained message classification services, and the stability and performance of the basic live broadcast message services, consolidating the foundation and continuing to innovate so that the live broadcast business can develop better and faster. (This article is simultaneously published at: www.52im.net/thread-3515…)


Chat technology of live broadcast system (IV) : Evolution practice of massive user real-time message system architecture of Baidu Live