The interview questions

How do you ensure that messages are not consumed more than once? In other words, how do you make message consumption idempotent?

What the interviewer is thinking

This is a very common question, and the two parts are usually asked together. Since you are consuming messages, you have to consider whether a message can be consumed more than once. Can you avoid repeated consumption? Or, if it does happen, can you make sure it does not break the system? This is a fundamental question in the MQ world. In essence it asks how you guarantee idempotency when using a message queue, which is something your architecture has to account for.

Analysis of interview questions

When you answer, don't come across as someone who has never heard of the problem. Start by roughly describing where repeated consumption can come from.

To begin with, RabbitMQ, RocketMQ, and Kafka can all run into repeated message consumption; that is normal. The MQ itself usually does not guarantee against it, so it is up to us as developers. Let's take Kafka as the example and look at how repeated consumption happens.

Kafka has the concept of an offset. Every message is written with an offset, which is the sequence number of that message. After consuming data, the consumer periodically commits the offset of the messages it has consumed. This means that after a restart, it can continue consuming from the last committed offset.

However, accidents happen. Something we used to run into in production: sometimes you have to restart the system, and how you restart matters. If something is urgent, you just kill the process and restart it. The consumer may then have processed some messages without having had time to commit their offsets, which is awkward. After the restart, a few messages get consumed again.

Here's an example scenario. Data 1/2/3 enters Kafka in sequence, and Kafka assigns each of the three items an offset, which is its sequence number. Assume the offsets are 152/153/154. The consumer reads from Kafka in that same order. Suppose the consumer has just consumed the data at offset=153 and is about to commit the offset to ZooKeeper (where older Kafka versions kept consumer offsets; newer versions store them in an internal Kafka topic) when the consumer process is restarted. At that point the offsets of the consumed data 1/2 have not been committed, so Kafka does not know that you have consumed up to offset=153. After the restart, the consumer asks Kafka to keep sending data from where it last left off. Since the earlier offsets were never committed, data 1/2 is delivered again. If the consumer does not deduplicate at this point, the result is repeated consumption.
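The scenario above can be sketched with a toy model. This is not the Kafka client API: `ToyConsumer`, `log`, `committed`, and `processed` are all hypothetical names, and the "partition" is just a Python list. The point is only to show that restarting from the last committed offset replays whatever was processed but not yet committed.

```python
# Toy model of one Kafka partition; list indexes stand in for offsets.
BASE_OFFSET = 152
log = ["data 1", "data 2", "data 3"]  # offsets 152, 153, 154


class ToyConsumer:
    """Starts at the committed offset and processes one record per poll."""

    def __init__(self, committed_offset, sink):
        self.position = committed_offset
        self.sink = sink

    def poll(self):
        record = log[self.position - BASE_OFFSET]
        self.sink.append(record)  # "process" the record, e.g. insert a row
        self.position += 1
        return record


committed = BASE_OFFSET  # nothing has been committed yet
processed = []           # stands in for the rows written to the database

# First run: process offsets 152 and 153, then get killed before committing.
first_run = ToyConsumer(committed, processed)
first_run.poll()
first_run.poll()
# The process is killed here, so `committed` is never advanced.

# Second run: resume from the last committed offset, which is still 152.
second_run = ToyConsumer(committed, processed)
for _ in range(3):
    second_run.poll()
committed = second_run.position  # this time the commit goes through

print(processed)
```

Data 1 and data 2 each appear twice in `processed`, which is exactly the duplicate the consumer has to tolerate.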

If what the consumer does is take one piece of data and write one row into the database, then data 1/2 may end up inserted into the database twice, and the data is wrong.

Repeated consumption itself is not the scary part. The scary part is not having thought about how to stay idempotent when it happens.

Let me give you an example. Suppose you have a system that inserts one row into the database for each message it consumes. If the same message arrives twice, you insert two rows, and the data is wrong. But if, on the second delivery, the consumer checks for itself whether the message has already been consumed and simply discards it if so, only one row is kept and the data stays correct.

If a message shows up twice and the database still ends up with only one row, the system is idempotent.

Idempotence, in plain English, means that when the same piece of data or the same request is handed to you over and over, the resulting data does not change after the first time, and nothing goes wrong.
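A small illustration of that definition, using made-up helper names (`set_balance`, `add_to_balance`) rather than anything from a real system: overwriting a value is idempotent, while incrementing it is not.

```python
def set_balance(account, value):
    """Idempotent: applying it again leaves the same state."""
    account["balance"] = value


def add_to_balance(account, delta):
    """Not idempotent: every repeat changes the state again."""
    account["balance"] += delta


a = {"balance": 0}
set_balance(a, 100)
set_balance(a, 100)      # duplicate delivery, state is unchanged

b = {"balance": 0}
add_to_balance(b, 100)
add_to_balance(b, 100)   # duplicate delivery, state is now wrong

print(a["balance"], b["balance"])
```

A consumer built only from operations of the first kind can be handed the same message twice without harm; one that uses the second kind needs an explicit duplicate check.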

So the second question is, how can message queue consumption be idempotent?

This really has to be worked out against the business at hand. Here are a few ideas:

  • If you are writing data to a database, look the row up by primary key first. If it already exists, don't insert it; just update it.
  • If you are writing to Redis, there is no problem: every write is a set anyway, which is naturally idempotent.
  • If neither of the above applies, it gets a little more complicated. Have the producer attach a globally unique ID, something like an order ID, to every message it sends. When you consume a message, look that ID up in Redis first. If it has not been consumed, process it and then write the ID to Redis. If it has already been consumed, skip it. That way the same message is never processed twice.
  • Rely on a unique key in the database to make sure duplicate data is not inserted more than once. Because of the unique key constraint, a duplicate insert only raises an error and does not leave dirty data in the database.
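As a minimal sketch of the last two ideas combined, the snippet below uses Python's built-in `sqlite3` as a stand-in for the production database. The table names (`processed_messages`, `orders`), the message fields, and the `handle` function are assumptions made up for illustration. The message ID is recorded in the same transaction as the business write, so the primary key constraint rejects a duplicate before it can leave a second row behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount INTEGER)")


def handle(message):
    """Process a message at most once, keyed on its globally unique ID.

    Returns True if the message was processed, False if it was a duplicate.
    """
    try:
        with conn:  # one transaction: commit on success, roll back on error
            # The primary key rejects a second insert of the same message_id.
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message["message_id"],),
            )
            conn.execute(
                "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                (message["order_id"], message["amount"]),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # unique key violated: already handled, safe to skip


msg = {"message_id": "m-1", "order_id": "o-1001", "amount": 250}
results = [handle(msg), handle(msg)]  # the second call simulates a redelivery
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(results, order_count)
```

With Redis as the ID store instead, the check and the write would need to be a single atomic step (for example `SET` with the `NX` option) so that two consumers cannot both conclude the message is new.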

Of course, ensuring that consumption of MQ is idempotent requires a business-specific perspective.
