Hi, I’m Susan, and I’m here again.

Preface

MQ has recently become more and more popular. Many companies and developers use it, and its importance is self-evident. But what if I asked you to answer these questions:

  1. Why do we use MQ?
  2. What new problems does MQ introduce?
  3. How can these problems be solved?

Do you have the answer in mind? This article will answer these seemingly ordinary but meaningful questions one by one.

1. What are the pain points of the traditional model?

1.1 Pain point 1

In some complex business systems, a user request may call N system interfaces synchronously. Only after all interfaces return can the execution result be obtained.

With synchronous interface calls like this, the total time is long, which greatly hurts the user experience. Interface timeouts are especially likely when the network is unstable.

1.2 Pain point 2

Many complex business systems are typically broken down into multiple subsystems. Take a user placing an order as an example: the request goes through the order system first, and then through the payment system, the inventory system, the points system and the logistics system.

The coupling between these systems is too tight. If any of the invoked subsystems throws an exception, the entire request fails, which is very bad for system stability.

1.3 Pain point 3

Sometimes, in order to attract users, we run promotions such as flash sales (seckill). If the number of users is small, this does not affect the stability of the system. However, if the number of users suddenly surges and all the requests hit the database at once, the database may not be able to handle the pressure, and it may respond slowly or go down.

With a sudden request peak like this, the stability of the system cannot be guaranteed.

2. Why use MQ?

All three of the pain points of the traditional model described above can be solved with MQ.

2.1 Asynchronous processing

For pain point 1: long response times due to synchronous interface calls, switching from synchronous to asynchronous with MQ can significantly reduce system response times.

System A, as the message producer, can return the result directly once it has done its own job. It does not wait for the message consumers to return; the consumers complete the remaining business functions independently.

This avoids the overly long total time that hurts the user experience.
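Here is a minimal sketch of the idea, assuming Python's in-process queue.Queue as a stand-in for a real MQ. The names place_order and points_consumer are hypothetical and only serve the illustration.

```python
import queue
import threading

# In-process stand-in for a real MQ broker (an assumption for illustration).
mq = queue.Queue()
points_awarded = []  # order IDs the hypothetical points consumer has handled


def place_order(order_id):
    """Producer: do the local work, publish a message, return immediately."""
    mq.put({"order_id": order_id})
    return "order %s accepted" % order_id


def points_consumer():
    """Consumer: handles messages independently of the producer's response."""
    while True:
        message = mq.get()
        if message is None:  # sentinel used to stop the worker
            mq.task_done()
            break
        points_awarded.append(message["order_id"])
        mq.task_done()


worker = threading.Thread(target=points_consumer, daemon=True)
worker.start()
```

The producer's response time no longer depends on how long the consumer takes, because place_order returns as soon as the message is in the queue.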

2.2 Decoupling

For pain point 2, too much coupling between subsystems: with MQ, each subsystem only needs to depend on MQ, which avoids strong dependencies between subsystems.

As the message producer, the order system only has to make sure it has no exceptions of its own. It is not affected by exceptions in the payment system or other business subsystems, and the consumer subsystems do not affect one another.

This turns the previously complex dependencies between business subsystems into a simple dependency on MQ alone, which significantly reduces the coupling between systems.

2.3 Peak shaving

For pain point 3, system instability caused by sudden spikes in requests: with MQ, peak shaving can be achieved.

After receiving the user request, the order system sends it directly to MQ, and the order consumer then consumes the message from MQ and writes to the database. When a request peak occurs, consumers have limited capacity and consume messages at their own pace. Requests that have not been processed yet simply wait in the MQ queue, without affecting the stability of the system.
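The buffering effect can be sketched as follows. This is a simplified model, not a real broker: a deque stands in for the MQ queue, a plain list stands in for the database, and the function names and the capacity parameter are assumptions made for the example.

```python
from collections import deque

backlog = deque()  # stand-in for the MQ queue


def receive_burst(request_ids):
    """Order system: push every incoming request straight into the queue."""
    for request_id in request_ids:
        backlog.append(request_id)


def consume_tick(capacity, database):
    """Consumer: handle at most `capacity` messages per tick, then stop."""
    handled = 0
    while backlog and handled < capacity:
        database.append(backlog.popleft())
        handled += 1
    return handled
```

However large the burst is, the database only ever sees `capacity` writes per tick; the rest of the requests wait in the queue.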

3. What problems will MQ introduce?

The introduction of MQ has reduced the coupling between subsystems, and the asynchronous processing mechanism has reduced the response time of the system, while effectively dealing with the peak request problem and improving the stability of the system.

However, the introduction of MQ also brings some problems.

3.1 Repeated Messages

The problem of duplicate consumption is fairly common with MQ, no matter what type of MQ you use.

What are the scenarios where duplicate messages occur?

  1. The message producer generates duplicate messages
  2. The Kafka or RocketMQ offset is rolled back
  3. Message consumer confirmation failed
  4. Message consumer confirmation timed out
  5. The service system initiates a retry

If duplicate messages are not handled correctly, the business is seriously affected: duplicate or abnormal data is produced. For example, the membership system might grant a user one extra month of membership.

3.2 Data Consistency problems

Data consistency issues often arise when the MQ consumer's business processing fails. For example, suppose a complete business process awards 100 points after an order is placed successfully. The order is written to the database, but the message consumer fails to award the points. The result is inconsistent data: part of the business process has been persisted and the other part has not.

If placing the order and awarding the points were in the same transaction, succeeding or failing together, there would be no data consistency problem.

However, because the calls cross systems, strong consistency is generally not used for performance reasons. Instead, the goal is eventual consistency.

3.3 Message Loss

Message loss is also a common problem with MQ, no matter what kind of MQ you use.

What are the scenarios where message loss issues occur?

  1. When the message producer sends a message, sending to MQ fails because of network problems.
  2. The disk fails while the MQ server is persisting the message.
  3. The Kafka or RocketMQ offset is reset, skipping a lot of messages.
  4. The message consumer has just read the message and already acked it, but the service is restarted before processing is complete.

There are many causes of message loss, on the producer, the MQ server, and the consumer, and I will not list them all here. The end result is that consumers cannot process the messages properly, which leads to data inconsistency.

3.4 Message sequence problems

Some business data is stateful. An order, for example, has states such as order, pay, complete, and return. If the order data is used as the message body, ordering becomes an issue. If a consumer receives two messages for the same order, and the status of the first is order and the status of the second is pay, that is fine. But if the status of the first is pay and the status of the second is order, there is a problem: the order was paid before it was placed?

Message order is a very tricky problem. For example:

  • Kafka: order is guaranteed within the same partition, but not across different partitions.
  • RabbitMQ: order is guaranteed within the same queue, but if multiple consumers read from the same queue, order problems can still occur.

If a consumer consumes a message using multiple threads, there is no guarantee of order.

If an exception occurs partway through consuming several messages for the same order, the order will be disrupted.

Also, if the routing rules producers use when sending to MQ differ from the ones consumers rely on, order cannot be guaranteed.

3.5 Message Accumulation

The whole MQ mechanism works best when the message consumer can keep up with the pace of the message producer. But often, because of batch processing or other reasons, message consumption is slower than production. This leads directly to message accumulation, which affects business functionality.

Take opening a membership on order as an example. If messages accumulate, a user who has placed an order has to wait a long time before becoming a member, and that will certainly cause a lot of user complaints.

3.6 System Complexity Increases

System complexity is not the same as system coupling. For example, previously there were only three systems: system A, system B and system C. Now, with MQ, you need to pay attention to the MQ service in addition to those three. The more points you need to pay attention to, the more complex the system becomes. An MQ setup involves three roles: the producer, the MQ server, and the consumer.

There is a learning cost, and additional MQ servers need to be deployed. Moreover, some MQs, such as RocketMQ, are very powerful but somewhat complicated to use, and many problems can arise if they are not used well. Some of these problems are not as easy to troubleshoot as interface calls, which adds to system complexity.

4. How can we solve these problems?

MQ is a trend and, on the whole, does our systems more good than harm. Should we stop using it just because it causes some problems?

So how do we solve these problems?

4.1 Message duplication

Whether it is due to duplicate messages produced by the producer or duplicate messages caused by the consumer, we can solve the problem in the consumer.

This requires consumers to make their business processing idempotent. If you are not sure how to design for idempotency, you can refer to “How to ensure the idempotent interface under high concurrency?”, which covers it in detail.

I recommend adding a consumption message table to address this type of problem with MQ. In the consumption message table, messageId is used as the unique index. Before processing the business logic, check by messageId whether the message has already been processed. If it has, return success directly; if not, continue with the business processing.
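A minimal sketch of such a consumption message table, assuming an in-memory SQLite database and a hypothetical membership table. The table names, columns, and the handle_message function are illustrative only. Note one variation from the description above: instead of a separate lookup, this sketch inserts the messageId first and lets the unique index reject duplicates, which avoids a gap between the check and the insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# message_id is the unique key of the consumption message table.
conn.execute("CREATE TABLE consumed_message (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE membership (user_id TEXT PRIMARY KEY, months INTEGER)")


def handle_message(message_id, user_id):
    """Return True if the business logic ran, False if it was a duplicate."""
    try:
        with conn:  # one transaction: record the message and apply the change
            conn.execute(
                "INSERT INTO consumed_message (message_id) VALUES (?)",
                (message_id,),
            )
            updated = conn.execute(
                "UPDATE membership SET months = months + 1 WHERE user_id = ?",
                (user_id,),
            )
            if updated.rowcount == 0:
                conn.execute(
                    "INSERT INTO membership (user_id, months) VALUES (?, 1)",
                    (user_id,),
                )
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected the messageId: already processed.
        return False
```

Because the message record and the business change are committed in the same transaction, a duplicate delivery cannot add a second month of membership.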

4.2 Data Consistency Problems

We all know that data consistency is divided into:

  • Strong consistency
  • Weak consistency
  • Eventual consistency

MQ settles for eventual consistency for performance reasons, so temporary data inconsistency is bound to occur. It is most often caused by the business logic failing after the consumer has read the message, in which case a retry mechanism can be added.

Retry is classified into synchronous retry and asynchronous retry.

For business scenarios with a small message volume, you can use synchronous retry: if processing fails while consuming a message, retry immediately 3-5 times, and if it still fails, write the message to a record table. However, this method is not recommended when the message volume is large. If a network exception occurs, a large number of messages may be retried over and over, slowing down message reading and causing message accumulation.
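Synchronous retry could look roughly like this. It is a sketch under assumptions: handler stands for whatever business function processes the message, and failed_records is a plain list standing in for the record table.

```python
def consume_with_retry(message, handler, failed_records, max_attempts=3):
    """Try the handler up to max_attempts times, then record the failure."""
    last_error = None
    for _ in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception as error:
            last_error = error  # keep the most recent failure for the record
    # Every attempt failed: write the message to the record table.
    failed_records.append({"message": message, "error": str(last_error)})
    return False
```

The retries here block the consumer, which is exactly why this approach only suits small message volumes.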

When the message volume is large, asynchronous retry is recommended. When the consumer fails to process a message, it immediately writes the message to a retry table, and a job retries the messages in that table periodically.
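A rough sketch of asynchronous retry, assuming a plain list as the retry table and a retry_job function that a scheduler would call periodically. The max_attempts cutoff is my own addition for the example, so that a message that never succeeds does not stay in the table forever.

```python
retry_table = []  # stand-in for a database retry table


def consume(message, handler):
    """Consumer: on failure, park the message and move on immediately."""
    try:
        handler(message)
    except Exception:
        retry_table.append({"message": message, "attempts": 0})


def retry_job(handler, max_attempts=5):
    """One run of the periodic job. Returns the rows that were given up on."""
    still_pending = []
    given_up = []
    for row in retry_table:
        try:
            handler(row["message"])  # success: the row is simply dropped
        except Exception:
            row["attempts"] += 1
            if row["attempts"] >= max_attempts:
                given_up.append(row)
            else:
                still_pending.append(row)
    retry_table[:] = still_pending
    return given_up
```

The consumer never waits on a failing message, so reading speed is not affected even when many messages fail at once.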

Another way is to send the message to the same topic again if consumption fails, and consume it again at a later point in time, which has the effect of a retry. This can be used in scenarios where message order is not important.

4.3 Message Loss

Whether you admit it or not, messages do sometimes get lost, and even if the probability is very small, it can have an impact on the business. The producer, the MQ server, and the consumer can all lose messages.

To solve this problem, we can add a message sending table. When the producer finishes sending a message, it writes a row to this table with the status marked as to be confirmed. When the consumer reads the message, it calls the producer's API to update the status to confirmed. A job checks the message sending table at regular intervals. If a message is still to be confirmed after 5 minutes (the time can be set according to the actual situation), it is considered lost and is resent.

The job will then resend the message whether the message is lost due to the producer, mq server, or consumer.
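The message sending table can be sketched like this. It is only an illustration: a dict stands in for the table, publish is a hypothetical function that hands the message to MQ, and the now parameter exists only to make the timing easy to follow.

```python
import time

send_table = {}  # message_id -> {"body", "status", "sent_at"}


def send(message_id, body, publish, now=None):
    """Producer: send the message, then record it as to be confirmed."""
    publish(message_id, body)
    send_table[message_id] = {
        "body": body,
        "status": "to_be_confirmed",
        "sent_at": time.time() if now is None else now,
    }


def confirm(message_id):
    """Called by the consumer (through the producer's API) after reading."""
    send_table[message_id]["status"] = "confirmed"


def resend_job(publish, timeout_seconds=300, now=None):
    """Periodic job: resend anything still unconfirmed after the timeout."""
    now = time.time() if now is None else now
    resent = []
    for message_id, row in send_table.items():
        overdue = now - row["sent_at"] > timeout_seconds
        if row["status"] == "to_be_confirmed" and overdue:
            publish(message_id, row["body"])
            row["sent_at"] = now  # restart the timeout for the resent copy
            resent.append(message_id)
    return resent
```

A resend can deliver the same message twice, so this only works safely together with the idempotent consumer from section 4.1.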

4.4 Message sequence problems

Message ordering is a very common problem. Let's take a Kafka consumer that consumes order messages as an example. Orders have states such as order, pay, complete, and return. These states have a fixed sequence, and if the sequence is wrong, it leads to business exceptions.

Before addressing these issues, let’s make sure consumers really need to know the intermediate state. Is it ok to know only the final state?

In fact, most of the time, what the consumer really needs to know is the final state. In that case, the process can be optimized so that the consumer relies only on the final state.

This approach solves most message order problems.

But if message order really does need to be guaranteed, route messages to partitions by order number: different order numbers may go to different partitions, but messages with the same order number are sent to the same partition every time.
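The routing rule can be sketched as a stable hash of the order number. This is a simplified stand-in rather than Kafka's own partitioner: partition_for is a hypothetical helper, and CRC32 is used only because it gives the same result on every run.

```python
import zlib


def partition_for(order_id, partition_count):
    """Map an order number to a partition; same order, same partition."""
    digest = zlib.crc32(str(order_id).encode("utf-8"))
    return digest % partition_count
```

With a real Kafka client, using the order number as the message key usually has the same effect, since the default partitioner picks the partition from a hash of the key. Keep in mind that changing the number of partitions changes the mapping.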

4.5 Message Accumulation

If the consumer consumes messages more slowly than the producer produces them, messages will accumulate. There are many reasons for this; if you want to learn more, check out my article “Some Unusual Pits I’ve Stepped in in Two Years with Kafka”.

So how do we deal with message accumulation?

This depends on whether the message needs to be ordered.

If you don’t need to guarantee order, you can read the message and use multiple threads to process the business logic.

This increases the speed of business logic processing and solves the problem of message stacking. However, the number of core threads and the maximum number of threads in the thread pool need to be configured properly, otherwise it may waste system resources.
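A minimal sketch of unordered multi-threaded processing, assuming Python's standard ThreadPoolExecutor. Unlike thread pools that take separate core and maximum sizes, this one takes a single max_workers value; process_batch and handler are names made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(messages, handler, max_workers=4):
    """Run the business logic for a batch of messages on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs handlers concurrently; results come back in input
        # order, but the order of execution is not guaranteed.
        return list(pool.map(handler, messages))
```

The max_workers value is the knob to tune: too small and the backlog stays, too large and threads sit idle or compete for the same downstream resources.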

If order is required, messages can be read, distributed to multiple queues according to certain rules, and then processed in a single thread from the queues.
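The ordered variant can be sketched as follows, assuming in-memory queues and one worker thread per queue. The distribution rule here (a CRC32 hash of the order number) and the function name process_in_order are my own choices for the illustration.

```python
import queue
import threading
import zlib


def process_in_order(messages, handler, queue_count=4):
    """messages: iterable of (order_id, payload) pairs in arrival order."""
    queues = [queue.Queue() for _ in range(queue_count)]

    def worker(q):
        # A single thread per queue keeps that queue's messages in order.
        while True:
            item = q.get()
            if item is None:  # sentinel: no more messages
                break
            handler(*item)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for thread in threads:
        thread.start()
    for order_id, payload in messages:
        # Same order number always lands in the same queue.
        index = zlib.crc32(str(order_id).encode("utf-8")) % queue_count
        queues[index].put((order_id, payload))
    for q in queues:
        q.put(None)
    for thread in threads:
        thread.join()
```

Messages for different orders are processed in parallel, while messages for the same order are still handled one after another.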

Well, that’s it for today, and I’ll see you next time. This article is only meant to get the discussion started. There is a lot more to MQ, such as scheduled sending, delayed sending, private message queues, transaction issues and so on. If you are interested, feel free to message me privately.