How WebIM Ensures Reliable Message Delivery

In the previous article, we discussed how WebIM keeps messages real-time (see “How WebIM Uses Polling to Ensure Absolute Real-time Messaging”).

Message reliability, meaning that messages are neither lost nor duplicated, is another hard problem in IM systems. In the early days, QQ (then called OICQ) beat ICQ on the technical side for two reasons:

1) QQ’s message delivery was reliable (messages were neither lost nor duplicated)

2) QQ had less spam (its antispam worked well; this is also a hard problem, but not the focus of this article)

Today, this article uses plain language to talk about message reliability in a WebIM system.

IM clients and servers transfer messages by sending packets (network packets). There are three types of packets:

Request packet (R for short)

Acknowledge packet (A for short)

Notify packet (N for short)

The three types of packets are described as follows:



R: a packet that the client proactively sends to the server

A: a packet with which the server passively responds to a client request; each A corresponds to an R

N: a packet that the server proactively sends to the client
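As a minimal sketch, the three packet types can be modelled as a small tagged record. The names `PacketKind`, `Packet`, `topic`, and `body` are illustrative assumptions, not part of any real WebIM API.

```python
from dataclasses import dataclass
from enum import Enum


class PacketKind(Enum):
    R = "request"      # client -> server, sent proactively by the client
    A = "acknowledge"  # server -> client, passive reply to an R
    N = "notify"       # server -> client, sent proactively by the server


@dataclass(frozen=True)
class Packet:
    topic: str         # hypothetical field: "MSG" or "ACK"
    kind: PacketKind
    body: str = ""

    def label(self) -> str:
        # Render in the article's notation, e.g. "MSG:R"
        return f"{self.topic}:{self.kind.name}"


print(Packet("MSG", PacketKind.R, "hello").label())
```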

II. General message delivery process

User A sends a “hello” message to user B as follows:



1) Client-A sends a message request packet to the IM server, namely MSG:R

2) After processing it successfully, the IM server replies to client-A with a message response packet, namely MSG:A

3) If client-B is online, the IM server sends a message notification packet to client-B, namely MSG:N (if client-B is not online, the message is stored offline instead).
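The three steps above can be sketched as a toy in-memory server. This is only an illustration under assumed names (`ImServer`, `connect`, `handle_msg_request`); a real IM server would push MSG:N over a network connection rather than append to a list.

```python
from collections import defaultdict


class ImServer:
    """Toy in-memory IM server; all names here are hypothetical."""

    def __init__(self):
        self.online = {}                  # user -> list acting as that client's inbox
        self.offline = defaultdict(list)  # user -> messages stored while offline

    def connect(self, user):
        self.online[user] = []
        return self.online[user]

    def handle_msg_request(self, sender, receiver, text):
        """Handle MSG:R: push MSG:N or store offline, then return MSG:A."""
        if receiver in self.online:
            # Step 3: the receiver is online, so push a notification packet
            self.online[receiver].append(("MSG:N", sender, text))
        else:
            # The receiver is offline: keep the message for later pulling
            self.offline[receiver].append((sender, text))
        # Step 2: the reply to the sender only says the server has the message
        return ("MSG:A", "ok")


server = ImServer()
inbox_b = server.connect("B")
reply = server.handle_msg_request("A", "B", "hello")
print(reply, inbox_b)
```

Note that `reply` tells client-A nothing about whether client-B actually processed the MSG:N, which is the gap the rest of the article addresses.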

III. Problems with the above message delivery process

The problem is easy to see from the flow above. When the sender client-A receives MSG:A, it only means that the IM server has successfully received the message; it does not mean that client-B has received it. The MSG:N packet may be lost in several scenarios, and the sender client-A is completely unaware of it. For example:

1) the server crashes and the MSG:N packet is never sent

2) the network jitters and the MSG:N packet is dropped by a network device

3) client-B crashes and the MSG:N packet is never received

IV. Application-layer acknowledgement + the six packets of reliable IM message delivery

UDP is an unreliable transport-layer protocol, while TCP is a reliable one. How does TCP achieve reliability? The answer is: timeout, retransmission, and acknowledgement.

To achieve reliable message delivery at the application layer, an application-layer acknowledgement mechanism must be added. That is, for the sender client-A to be sure that the receiver client-B has received the message, client-B must acknowledge it. The application-layer acknowledgement process mirrors the message sending process:



4) Client-B sends an ACK request packet to the IM server, namely ACK:R

5) After processing it successfully, the IM server replies to client-B with an ACK response packet, namely ACK:A

6) The IM server sends an ACK notification packet to client-A, namely ACK:N

Only after receiving the ACK:N packet can client-A be sure that client-B has received the “hello” message.

You will find that sending one message consists of two halves: the three MSG packets (R/A/N) in the first half and the three ACK packets (R/A/N) in the second half. One reliable application-layer IM message delivery therefore involves six packets. This is the core technique of message delivery in an IM system (if an IM system does not have these six packets, it cannot claim to deliver messages reliably).
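To keep the six packets straight, the full exchange can be written down as an ordered trace. The function name `six_packet_trace` and the tuple layout are assumptions made for this sketch.

```python
def six_packet_trace(sender="client-A", receiver="client-B", server="IM-server"):
    """Return the six packets of one reliable delivery as (label, src, dst)."""
    return [
        ("MSG:R", sender, server),    # 1) sender submits the message
        ("MSG:A", server, sender),    # 2) server confirms it has the message
        ("MSG:N", server, receiver),  # 3) server pushes the message
        ("ACK:R", receiver, server),  # 4) receiver confirms receipt
        ("ACK:A", server, receiver),  # 5) server confirms it has the ack
        ("ACK:N", server, sender),    # 6) server tells the sender it arrived
    ]


for label, src, dst in six_packet_trace():
    print(f"{label}: {src} -> {dst}")
```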

V. What problems remain with reliable message delivery?

The six packets are expected to complete a reliable message delivery, but in practice the MSG:N, ACK:R, ACK:A, and ACK:N packets may each be lost (for the reasons described earlier: a server crash, network jitter, or a client crash). In any of these cases, client-A does not receive the expected ACK:N packet, so it cannot tell whether client-B has received “hello”. What can be done?

VI. Timeout and retransmission of messages

After client-A sends MSG:R and receives MSG:A, it tries to resend MSG:R if the expected ACK:N does not arrive within a given time. Client-A may have many messages in flight at once, so it maintains a local “waiting-for-ACK queue” and, together with a timer, uses it to track the messages whose ACK:N has not arrived and to resend them periodically.



Once ACK:N is received, client-B is known to have received the “hello” message, and the corresponding message is removed from the “waiting-for-ACK queue”.
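A sender-side queue of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the class name `AckQueue`, the 3-second timeout, the retry limit, and the use of a per-message id as the queue key. The clock is injectable so the example can run without actually waiting.

```python
import time
import uuid


class AckQueue:
    """Sender-side 'waiting for ACK' queue (hypothetical sketch)."""

    def __init__(self, send, timeout=3.0, max_retries=3, clock=time.monotonic):
        self.send = send              # callable(msgid, text) that emits MSG:R
        self.timeout = timeout        # seconds to wait for ACK:N (assumed value)
        self.max_retries = max_retries
        self.clock = clock
        self.pending = {}             # msgid -> [text, deadline, retries]

    def send_message(self, text):
        msgid = uuid.uuid4().hex
        self.pending[msgid] = [text, self.clock() + self.timeout, 0]
        self.send(msgid, text)
        return msgid

    def on_ack_notify(self, msgid):
        # ACK:N arrived: the receiver has the message, stop tracking it
        self.pending.pop(msgid, None)

    def tick(self):
        """Call periodically from a timer; resend overdue messages."""
        failed = []
        now = self.clock()
        for msgid, entry in list(self.pending.items()):
            text, deadline, retries = entry
            if now < deadline:
                continue
            if retries >= self.max_retries:
                del self.pending[msgid]
                failed.append(msgid)  # the caller can show "sending failed"
                continue
            entry[1] = now + self.timeout
            entry[2] = retries + 1
            self.send(msgid, text)    # resend MSG:R for the same message
        return failed


# Usage with a fake clock
sent = []
now = [0.0]
q = AckQueue(lambda mid, text: sent.append(mid), timeout=3.0, clock=lambda: now[0])
mid = q.send_message("hello")
now[0] = 3.5
q.tick()               # no ACK:N within the timeout, so MSG:R is resent
q.on_ack_notify(mid)   # ACK:N finally arrives
print(len(sent), q.pending)
```

Giving up after a fixed number of retries is one possible policy; it corresponds to the point where a client would tell the user that sending failed.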

As mentioned in Section V, both MSG:N and ACK:N may be lost:

1) If MSG:N is lost, client-B never received the “hello” message, and the timeout and retransmission mechanism does exactly what is needed.

2) If ACK:N is lost, client-B has already received the “hello” message (but client-A does not know it), so timeout and retransmission makes client-B receive a duplicate message. What can be done about that?

An aside: if you use QQ regularly, you may have seen a dialog saying “Sending failed due to network problems”. At that point the other party may not have received the message (the sender’s network was bad and MSG:N was lost), or may in fact have received it (the receiver’s network was bad and ACK:N was still lost after repeated retransmissions). When you see this prompt, it is worth checking with the other party to find out which case it was.

The sender client-A generates a msgid for each message, used for de-duplication, and stores it in the “waiting-for-ACK queue”. Every retransmission of a message carries the same msgid, so client-B can de-duplicate on it without affecting the user experience.
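On the receiving side, de-duplication by msgid can be sketched as follows. The class `Receiver` and its fields are hypothetical, and the unbounded `seen` set is a simplification; a real client would presumably bound or expire it.

```python
class Receiver:
    """Receiver-side de-duplication by msgid (hypothetical sketch)."""

    def __init__(self):
        self.seen = set()     # msgids already shown to the user
        self.displayed = []   # messages the user actually sees

    def on_msg_notify(self, msgid, text):
        """Handle MSG:N; always answer with ACK:R so the sender can stop retrying."""
        if msgid not in self.seen:
            self.seen.add(msgid)
            self.displayed.append(text)
        # A duplicate is acknowledged again but not displayed again
        return ("ACK:R", msgid)


b = Receiver()
b.on_msg_notify("m1", "hello")
b.on_msg_notify("m1", "hello")   # retransmission after a lost ACK:N
print(b.displayed)
```

Acknowledging duplicates matters: if client-B stayed silent on a repeated msgid, client-A would keep retransmitting until it gave up.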

1) In the design above, retransmission is driven by the client, which keeps the server stateless (a basic principle of architecture design)

2) If client-B is not online, the IM server stores the message offline and sends a forged ACK:N to client-A

3) Pulling offline messages also needs an ACK mechanism to be reliable. In practice it is much simpler, because pulling offline messages involves no N packet: the client first sends an offline:R packet to pull the offline messages and, after receiving offline:A, sends offlineack:R to delete them
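The offline pull in point 3 can be sketched as a two-step exchange in which the server deletes nothing until the client confirms. The class `OfflineStore` and its method names are assumptions made for this example.

```python
class OfflineStore:
    """Server-side offline message store (hypothetical sketch)."""

    def __init__(self):
        self.messages = {}   # user -> {msgid: text}

    def save(self, user, msgid, text):
        self.messages.setdefault(user, {})[msgid] = text

    def handle_offline_request(self, user):
        """offline:R -> offline:A; messages are returned but not deleted yet."""
        return dict(self.messages.get(user, {}))

    def handle_offline_ack(self, user, msgids):
        """offlineack:R; delete only the messages the client confirmed."""
        box = self.messages.get(user, {})
        for msgid in msgids:
            box.pop(msgid, None)


store = OfflineStore()
store.save("B", "m1", "hello")
pulled = store.handle_offline_request("B")       # offline:R / offline:A
store.handle_offline_ack("B", list(pulled))      # offlineack:R
print(pulled, store.handle_offline_request("B"))
```

If the offline:A reply is lost, the client simply pulls again; because deletion waits for offlineack:R, nothing is lost in between.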

1) An IM system ensures reliable message delivery through timeout, retransmission, acknowledgement, and de-duplication

2) Remember that one “hello” message involves six packets: MSG:R/A/N in the first half and ACK:R/A/N in the second half

One-to-one messages need only a one-to-one ACK. Group messages are not so simple, because they have a diffusion (fan-out) factor. If you are interested, next time we will discuss reliable delivery of IM group messages.
