This article was originally shared by the Rongyun technical team. The original title is “Technology practice of message distribution and speed control scheme for ten thousand people chat”. The content has been revised to make the article easier to understand.

1. Introduction

Traditional IM group chats are usually capped at 500 members, as in WeChat, or 2,000 members, as in QQ (QQ also offers 3,000-member groups, but those are charged separately, so they are not available by default and few people can use them).

Since a foreign app billed as “the world’s most secure IM” launched ten-thousand-member group chat, such groups have been rapidly accepted by domestic users. With the development of the mobile Internet, instant messaging services are now widely used across industries and are no longer limited to traditional IM social applications. As these businesses grow quickly, the traditional group limits of a few hundred or a few thousand members can no longer meet the needs of many business scenarios, so large groups of ten thousand or even one hundred thousand members have emerged in step with the trend.

▲ A “Paper Airplane” group with tens of thousands of members (developers tremble…)

Group chat has always been one of the hard, hot topics in IM technology. An ordinary group chat of 500, 1,000, or 2,000 members is already much more complicated than one-to-one chat. A ten-thousand-member (or even hundred-thousand-member) group chat, however, is almost a different technical dimension from a chat of a few hundred or a few thousand members, and is much harder to implement.

Based on the practical experience of the Rongyun technical team, this article summarizes some of our thinking and practice on message delivery for ten-thousand-member group chats, in the hope that it gives you some inspiration.

2. Related articles

For more technical articles on this topic, see:

  1. Netease Cloud Communication Technology Sharing: Practical Summary of The Technical Solution of 10,000 People Chat in IM
  2. Reveal the IM Architecture design of enterprise wechat: Message model, ten thousand people, Read receipt, message Withdrawal, etc.
  3. Ali Dingding Technology Sharing: The King of Enterprise-level IM — The Excellence of DingDing in back-end Architecture

Other articles shared by Rongyun Technical team:

  1. Sharing of Rongyun Technology: Practice of Network Link Preservation Technology of Rongyun Android IM Products
  2. Rongyun Technology Sharing: Fully reveal the Reliable delivery mechanism of 100 million IM Messages
  3. Rongyun Technology Sharing: Real-time Audio and Video First Frame Display Time Optimization Practice based on WebRTC
  4. IM Message ID Technical Topic (3) : Decoding the Chat Message ID Generation Strategy of Rongyun IM Products

3. Technical challenges faced by super-large groups

Compared with groups of hundreds or thousands of members, a large group of ten thousand or even one hundred thousand members greatly increases the number of people a group can reach. For many business scenarios, the benefits are self-evident.

However, with so many members in a group, the traffic impact on the IM system is huge, and the technical difficulty is easy to imagine. Let’s start by analyzing the technical challenges of super-large groups.

Take a ten-thousand-member group as a model:

  • 1) If someone in the group sends a message, that message has to be distributed and delivered at a ratio of 1:9,999. With the conventional message processing flow, the message processing service comes under great pressure;
  • 2) When there are many messages, the speed at which the server pushes messages directly to clients becomes a bottleneck of the system. Once a user’s message delivery queue backs up, normal message distribution is affected and the service’s cache usage surges;
  • 3) In a microservice architecture, QPS and network traffic between services and storage (DB, cache) also increase dramatically;
  • 4) Caching messages for a group of this size has a large memory and storage overhead (storage of the message body is amplified ten-thousand-fold).

Given these technical challenges, truly supporting super-large groups requires targeted technical optimization.

4. Message delivery model of ordinary group chat

Let’s take a look at the message delivery model for a normal group chat.

Our message delivery model for ordinary group chat is shown in the figure below:

As shown in the figure above, when a user sends a message in an ordinary group, the delivery path is:

  • 1) The message is first sent to the group service;
  • 2) The group service uses its cached group relationship to identify the target users to whom the message will be distributed;
  • 3) The message is distributed to the message service according to a certain policy;
  • 4) Based on each user’s online status and message status, the message service decides whether the message is pushed directly, delivered via notification and pull, or sent as a Push notification, and finally delivers it to the target user.

This is what you would expect of ordinary group chat message delivery, and most implementations work in basically the same way. For groups of tens of thousands of members, however, that is not enough.
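
The four steps above can be sketched roughly as follows. This is only an illustration, not Rongyun’s implementation: the names `Mode`, `resolve_targets`, `choose_mode`, and `plan_delivery` are hypothetical, the group-relationship cache is a plain dictionary, and the rule for choosing a delivery mode (offline means Push, online with a backlog means notification and pull, otherwise direct push) is an assumption, since the article does not spell it out. Step 3, the distribution policy, is left out.

```python
from enum import Enum

class Mode(Enum):
    DIRECT = "direct"            # pushed straight to the online client
    NOTIFY_PULL = "notify_pull"  # client is notified and pulls the message
    PUSH = "push"                # handed off to a Push notification

def resolve_targets(group_cache, group_id, sender):
    """Step 2: use the cached group relationship to find the recipients."""
    return [member for member in group_cache[group_id] if member != sender]

def choose_mode(is_online, has_backlog):
    """Step 4, under an ASSUMED rule; the article does not define it."""
    if not is_online:
        return Mode.PUSH
    return Mode.NOTIFY_PULL if has_backlog else Mode.DIRECT

def plan_delivery(group_cache, online_users, backlog_users, group_id, sender):
    """Return the delivery mode chosen for every recipient of one message."""
    return {
        user: choose_mode(user in online_users, user in backlog_users)
        for user in resolve_targets(group_cache, group_id, sender)
    }
```

Note that the plan contains one entry per recipient, which is exactly the fan-out that becomes expensive once a group has ten thousand members.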

Let’s take a look at our technical optimization methods for message delivery.

5. Message delivery optimization for ten-thousand-member groups, method 1: speed control

Speed control is one of our main methods for delivering messages in groups of tens of thousands of members.

As shown above.

First, we set up multiple group message distribution queues according to the number of server cores, and give each queue a different sleep time and a different number of consumer threads.

In layman’s terms, queues can be classified as fast, medium, slow, and so on.

Second, we map every group to a queue according to its number of members.

The rule is:

  • 1) Small groups are mapped to fast queues;
  • 2) Large groups are mapped to corresponding slow queues.

Then: because a small group has few members, its messages put little load on the service, so they are distributed quickly through the fast queue, while large-group messages go through the slow queue, whose relatively high delay keeps the delivery speed under control.

6. Message delivery optimization for ten-thousand-member groups, method 2: merged delivery

As described in section 3, the main technical challenge of a ten-thousand-member group is that each message is cloned into N copies when it is distributed and delivered, so message traffic is amplified instantly.
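
A minimal sketch of this mapping and throttling is shown below. The tier boundaries, sleep times, and thread counts are placeholder assumptions (the article gives no concrete values), and `queue_for` and `consume` are hypothetical names.

```python
import bisect
import queue
import time

# ASSUMED tiers: the article gives no concrete numbers, these are placeholders.
QUEUE_NAMES = ["fast", "medium", "slow"]
MAX_MEMBERS = [200, 2000]  # upper bounds of the "fast" and "medium" tiers
SLEEP_SECONDS = {"fast": 0.0, "medium": 0.01, "slow": 0.05}
CONSUMER_THREADS = {"fast": 8, "medium": 4, "slow": 2}

def queue_for(member_count):
    """Map a group to a queue tier by its member count."""
    return QUEUE_NAMES[bisect.bisect_left(MAX_MEMBERS, member_count)]

def consume(q, sleep_s, handle):
    """Drain one queue, sleeping after each message to throttle delivery."""
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return
        handle(message)
        time.sleep(sleep_s)
```

In a real service, each tier would presumably run `CONSUMER_THREADS[name]` threads, each looping over `consume` with that tier’s sleep time, so a slow queue drains with fewer workers and longer pauses.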

For example, when a group message reaches the IM server, it has to be delivered from the group service to the message service. If the message is delivered once per group member even though its content is identical each time, resources are wasted and the service comes under unnecessary pressure.

In this case, our solution is message merge delivery.

The principle is as follows: we use consistent hashing to compute the service node on which each user lands, so the landing node of each group member is relatively fixed. We can therefore merge the group members that land on the same node into a single delivery request, which greatly improves delivery efficiency and reduces pressure on the service.

The following figure shows the merged message delivery logic shared by the Yunxin team:

▲ The figure above is quoted from “Practical Summary of The Technical Solution of 10,000 People Chat in IM”

As shown in the figure above, the Yunxin team’s merged delivery scheme routes messages grouped by Link: all group members on the same Link need only one routed message.
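
A small sketch of the idea, assuming a simple hash ring with virtual nodes built on MD5. `HashRing`, `node_for`, and `merge_by_node` are hypothetical names, and the ring layout is an assumption rather than the scheme actually used in production.

```python
import bisect
import hashlib
from collections import defaultdict

def _hash(key):
    """Stable integer hash of a string key."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring; each node is placed at `vnodes` points."""

    def __init__(self, nodes, vnodes=100):
        ring = sorted((_hash(f"{node}#{i}"), node)
                      for node in nodes for i in range(vnodes))
        self._points = [point for point, _ in ring]
        self._nodes = [node for _, node in ring]

    def node_for(self, user_id):
        """Return the first node clockwise from the user's hash."""
        idx = bisect.bisect(self._points, _hash(user_id)) % len(self._points)
        return self._nodes[idx]

def merge_by_node(ring, member_ids):
    """Group members by landing node: one delivery request per node."""
    batches = defaultdict(list)
    for user_id in member_ids:
        batches[ring.node_for(user_id)].append(user_id)
    return dict(batches)
```

For a group of 10,000 members spread over five message nodes, `merge_by_node` yields at most five requests from the group service instead of 9,999 individual ones.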

7. Processing scheme for super-large groups of hundreds of thousands or millions of members

In the actual group chat business, another business scenario is the super-large group, which has hundreds of thousands or even millions of people.

If such a group follows the delivery scheme above, it is still bound to put great pressure on the message nodes.

For example, suppose a group has 100,000 members, there are five message nodes, and each message service can process at most 4,000 messages per second. A single group message then puts about 20,000 deliveries on each message node, which is far beyond the node’s processing capacity.

Therefore, to avoid this problem, we classify any group with more than 3,000 members online as a ten-thousand-member or super-large group. This threshold can be adjusted according to the number of servers and their configuration. For such super-large groups, a special queue handles the delivery of group messages.

In this special queue, the number of messages sent to each back-end message service per second is capped at half of the maximum that the message service can process (leaving the remaining capacity for other messages). If the QPS limit of a single message service is 4,000, the group service sends at most 2,000 messages per second to any single message service.
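
The arithmetic behind this example can be checked in a few lines. The sketch assumes that members are spread evenly across nodes and that all deliveries for one message arrive within the same second; `deliveries_per_node` is a hypothetical helper name.

```python
def deliveries_per_node(group_size, node_count):
    """Deliveries each node receives for one group message (even spread assumed)."""
    return group_size // node_count

per_node = deliveries_per_node(100_000, 5)  # 20000 deliveries on each node
overload = per_node / 4000                  # 5.0 times the per-second limit
```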
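
One way to enforce such a cap is a per-second budget for each message service, sketched below. The fixed one-second window and the names `HalfCapacityLimiter`, `try_send`, and `seconds_to_drain` are assumptions for illustration; the article does not say how the limit is implemented.

```python
import math

class HalfCapacityLimiter:
    """Per-node budget: a fraction of the node's QPS limit per one-second window."""

    def __init__(self, node_qps_limit, fraction=0.5):
        self.budget = int(node_qps_limit * fraction)
        self._window = None  # the whole second currently being counted
        self._used = 0

    def try_send(self, now_s, count=1):
        """Reserve `count` sends at time `now_s` (seconds); False if over budget."""
        window = int(now_s)
        if window != self._window:
            self._window, self._used = window, 0
        if self._used + count > self.budget:
            return False
        self._used += count
        return True

def seconds_to_drain(deliveries, budget_per_second):
    """Whole seconds needed to send `deliveries` at the capped rate."""
    return math.ceil(deliveries / budget_per_second)
```

With a limit of 4,000 QPS the budget is 2,000 per second, so the 20,000 deliveries per node from the 100,000-member example would be spread over about `seconds_to_drain(20000, 2000)` seconds, that is, 10 seconds.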

8. Closing remarks

In the future, we also plan to introduce reference delivery for group messages. For a large message sent in a large group, we will distribute and cache only an index of the message, such as the MessageID, for each group member. Only when a group member actually pulls the group message is the full message assembled and delivered to the client. This saves distribution traffic and storage space.

As the Internet develops, the scale of group services and the pressure on them keep growing, and there may be more challenges ahead. We will keep iterating on better ways to deal with them. (This article is simultaneously published at: www.52im.net/thread-3687…)
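
Since this is described as future work, the following is only a guess at its shape: the message body is stored once, each member’s inbox holds just the message ID, and the body is looked up at pull time. `ReferenceStore` and its methods are hypothetical names.

```python
class ReferenceStore:
    """Stores each message body once; member inboxes hold only message IDs."""

    def __init__(self):
        self.bodies = {}   # message_id -> body, one copy per message
        self.inboxes = {}  # user_id -> list of pending message IDs

    def send(self, message_id, body, member_ids):
        """Cache the body once and fan out only the ID to each member."""
        self.bodies[message_id] = body
        for user_id in member_ids:
            self.inboxes.setdefault(user_id, []).append(message_id)

    def pull(self, user_id):
        """Assemble full messages for one member at pull time."""
        pending = self.inboxes.pop(user_id, [])
        return [(message_id, self.bodies[message_id]) for message_id in pending]
```

In this sketch, a large body sent to 10,000 members is stored once plus 10,000 short IDs, rather than 10,000 copies of the body.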