Loss, duplication, and backlog of message queues

Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

When we use MQ, we almost always ask the question: how do I ensure that 100% of my messages are not lost when USING MQ?

This question is a common one in practice, both to test a candidate’s knowledge of MQ middleware technology and to distinguish a candidate’s level of competence. Let’s take this question and explore the basics you should know.

Case background

Taking JINGdong system as an example, when users purchase goods, they usually choose to use Jingdou to deduct part of the amount. In this process, the transaction service and Jingdou service communicate with each other through MQ message queue. When placing an order, the transaction service sends a “deduct account X 100 Peking beans” message to the MQ message queue, and Peking Beans service consumes this command on the consumer side for the real deduction operation.

What problems do you encounter in this process?

Case analysis

The most direct purpose of introducing MQ message-oriented middleware is to do system uncoupling flow control, and to solve the problem of high availability and high performance of Internet system.

System decoupling: can use MQ message queue, isolation system environment change of upstream and downstream of the unstable factors, such as the beans service system needs no matter how to change, trading service need not do any change, even when the bean service failure, the main trading process can also be the bean service downgraded, realize decoupling trading service and Beijing beans, do the high availability of the system.
Flow control: In the scenario of sudden increase of traffic, such as SEC, MQ can also be used to “peak load and valley fill” the flow. The flow can be automatically adjusted according to the downstream processing capacity.

But the introduction of MQ, while enabling system uncoupling flow control, also brings other problems.

The introduction of MQ message middleware to achieve system decoupling will affect the consistency of data transmission between systems. In a distributed system, if there is data synchronization between two nodes, there will be data consistency problems. The problem to be solved here is message data consistency between message producers and message consumers (that is, how to ensure that messages are not lost).

The introduction of MQ messaging middleware for traffic control, however, leads to a backlog of messages due to insufficient processing capacity on the consumer side, which is also a problem you need to address.

As you can see, questions are often linked to each other, and the interviewer is looking to see how consistent you are in solving problems and how well you have a body of knowledge.

So how do you solve the problem of “how to ensure messages are not lost when using MQ message queues”? First of all, you should analyze some of the questions.

How do I know if a message is lost
Which links may lose messages
How do I ensure that messages are not lost

Data transmission in the network is not reliable. To solve the problem of how to avoid message loss, we should first know which links are likely to lose messages, and how we know whether the message is lost, and finally the solution (rather than directly say their own solution). “Architecture” represents the architect’s thought process, and “design” is the final solution. Both are indispensable.

Case to answer

Let’s first look at the link of message loss. The process from production to consumption of a message can be divided into three stages, namely message production stage, message storage stage and message consumption stage.

Message production phase: From the time a message is produced and then submitted to MQ, the MESSAGE is successfully sent as long as an ACK response from the MQ Broker is received normally, so there is no message loss at this stage as long as return values and exceptions are handled.
Message storage stage: this stage usually directly to MQ message middleware to guarantee, but you have to know the principle of it, such as the Broker will make a copy of guarantee a message synchronously at least two nodes and back an ack (here involves the data consistency principle, I have said before, in the 04 speak during the interview, you can be flexible extension).
Message consumption phase: The consumer pulls messages from the Broker. If the consumer waits until the business logic has been executed, the message will not be lost as long as it does not immediately send the message to the Broker.

However, in distributed systems, failures are inevitable. As a consumer producer, there is no guarantee that MQ will lose your messages and consumers will consume your messages. Therefore, in line with the Design principle of “Design for Failure”, You still need a mechanism to Check if messages are lost.

Next, how do we detect messages? The overall solution is as follows: On the message production end, assign a global unique ID to each sent message, or attach a continuously increasing version number, and then perform version verification on the consuming end.

How to implement it? You can use the interceptor mechanism. The message version number is injected into the message by the interceptor before the message is sent by the production side (the version number can be generated using either a continuously increasing ID or a distributed globally unique ID). Then, after the consumer receives the message, it detects the continuity or consumption status of the version number through the interceptor. The advantage of this implementation is that the message detection code will not invade the business code, and the lost message can be located through a separate task for further investigation.

Note here: if there are multiple message producers and message consumers at the same time, it is difficult to implement the method of incrementing the version number, because the uniqueness of the version number cannot be guaranteed. In this case, the message detection can only be carried out by using the globally unique ID scheme. The specific implementation principle is the same as the method of incrementing the version number.

Loss, duplication, and backlog of message queues

Case background

Case analysis

Case to answer

Related Posts

The first micro-services you can Understand — a learning experience from Thoughtworks

Object memory layout in Java and how to calculate object size

Overview of Spring Security (1