This article was first published by Yanglbme in the Doocs organization on GitHub (over 30K stars). Project address: github.com/doocs/advan…
The interview question
How do you guarantee that messages are consumed in order?
What the interviewer is looking for
First, do you understand message ordering at all? Second, do you have a concrete way to guarantee that messages are processed in order? This is a common problem in production systems.
Analysis of the question
For example, we once built a MySQL binlog synchronization system under very heavy load: it synchronized hundreds of millions of records per day, replicating the data of one MySQL database completely into another (mysql -> mysql). A typical use case is a big data team that needs a copy of a MySQL database so it can run various complex computations on the data of the company's business systems.
If you insert, update, and then delete a row in MySQL, three binlog entries are produced. You send these three entries to MQ, and they must be consumed in that sequence. Otherwise the original sequence insert, update, delete may be executed as delete, update, insert, which is clearly wrong.
The row should end up deleted in the synchronized copy; with the order scrambled, the row survives instead, and the synchronization produces wrong data.
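To make the failure concrete, here is a minimal Python sketch that is not part of the original system: it replays the three binlog events for one row against a dict standing in for the target table. The event tuples and the `apply_events` helper are hypothetical, and an update of a missing row is assumed to be a no-op, like an `UPDATE` that matches zero rows.

```python
from typing import Dict, List, Tuple

# Hypothetical binlog event: (operation, row id, value). Value is unused for deletes.
Event = Tuple[str, int, str]

def apply_events(events: List[Event]) -> Dict[int, str]:
    """Replay events against an in-memory 'table' mapping row id -> value."""
    table: Dict[int, str] = {}
    for op, row_id, value in events:
        if op == "insert":
            table[row_id] = value
        elif op == "update":
            # Like UPDATE ... WHERE id = ?: no effect if the row does not exist.
            if row_id in table:
                table[row_id] = value
        elif op == "delete":
            table.pop(row_id, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table

binlog = [("insert", 1, "a"), ("update", 1, "b"), ("delete", 1, "")]

in_order = apply_events(binlog)                      # insert, update, delete
out_of_order = apply_events(list(reversed(binlog)))  # delete, update, insert

print(in_order)      # {}
print(out_of_order)  # {1: 'a'}
```

Consumed in order, the row is gone; consumed in reverse, the delete and update hit nothing and the insert leaves a stale row behind.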
Let's look at two scenarios in which messages get out of order:
- RabbitMQ: one queue, multiple consumers. For example, the producer sends three messages, data1, data2, and data3, to RabbitMQ, where they land in a single queue. Three consumers each take one of the three messages. Consumer 2 happens to finish first and writes data2 to the database, and only afterwards are data1 and data3 written. The order is clearly broken.
- Kafka: suppose we create a topic with three partitions. The producer can specify a key when writing; for example, if the order ID is used as the key, all messages for that order go to the same partition, and messages within a partition are strictly ordered. A consumer reading from that partition also receives them in order. Up to this point, ordering holds. The problem comes next: we may run multiple threads inside the consumer to process messages concurrently. If the consumer processed messages on a single thread and each message took tens of milliseconds, it could handle only a few dozen messages per second, which is far too little throughput. But once several threads process messages from the same partition concurrently, the order can be lost.
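The Kafka case can be sketched in plain Python without a broker. Everything below is a hypothetical simulation: `partition_for` uses `zlib.crc32` purely for illustration (Kafka's default partitioner hashes keys differently), and the per-message processing costs are made up. It shows that keying by order ID keeps one order's messages together, in send order, in a single partition, and that handing those messages to concurrent worker threads lets them finish in order of processing time instead.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3
Message = Tuple[str, str]  # (key, payload)

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Illustrative hash only; Kafka's default partitioner uses its own hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def produce(messages: List[Message]) -> Dict[int, List[Message]]:
    """Append each message to its key's partition, preserving send order."""
    partitions: Dict[int, List[Message]] = defaultdict(list)
    for key, payload in messages:
        partitions[partition_for(key)].append((key, payload))
    return partitions

def concurrent_completion_order(batch: List[Message], cost_ms: Dict[str, int]) -> List[str]:
    """Simulate one worker thread per message, all starting at t=0:
    each finishes after its processing cost, so completion order follows cost."""
    return [payload for _, payload in sorted(batch, key=lambda m: cost_ms[m[1]])]

sent = [("order-1", "insert"), ("order-1", "update"), ("order-1", "delete")]
partitions = produce(sent)
batch = partitions[partition_for("order-1")]
print(batch == sent)  # True: same key, same partition, original order

cost_ms = {"insert": 30, "update": 10, "delete": 20}
print(concurrent_completion_order(batch, cost_ms))  # ['update', 'delete', 'insert']
```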
Solutions
RabbitMQ
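Split the data across multiple queues with one consumer per queue. This only means a few more queues, though it is admittedly more hassle. Alternatively, keep one queue with a single consumer, and have that consumer buffer messages internally in in-memory queues and dispatch them to different underlying workers for processing, with messages that must stay in order (for example, those sharing a key) always going to the same in-memory queue and worker.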
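A minimal sketch of the first option, simulated in Python rather than against a real broker: the producer picks one of several queues from the message key, and each queue has exactly one consumer that drains it in FIFO order. The queue names (`binlog.0`, `binlog.1`, ...), the `route` helper, and the crc32 hash are hypothetical. With an actual RabbitMQ setup, the equivalent would typically be publishing to a direct exchange with a routing key that is bound to exactly one of the queues.

```python
import zlib
from collections import deque
from typing import Deque, Dict, List, Tuple

NUM_QUEUES = 3
Message = Tuple[str, str]  # (key, payload)

def route(key: str, num_queues: int = NUM_QUEUES) -> str:
    # Hypothetical routing: the same key always maps to the same queue name.
    return f"binlog.{zlib.crc32(key.encode('utf-8')) % num_queues}"

def publish(messages: List[Message]) -> Dict[str, Deque[Message]]:
    queues: Dict[str, Deque[Message]] = {f"binlog.{i}": deque() for i in range(NUM_QUEUES)}
    for key, payload in messages:
        queues[route(key)].append((key, payload))
    return queues

def consume_all(queues: Dict[str, Deque[Message]]) -> Dict[str, List[str]]:
    """One consumer per queue, each processing its own queue strictly FIFO."""
    applied: Dict[str, List[str]] = {}
    # Each iteration stands in for one queue's single consumer; real consumers
    # would run concurrently, but each would still handle its queue in order.
    for q in queues.values():
        while q:
            key, payload = q.popleft()
            applied.setdefault(key, []).append(payload)
    return applied

sent = [
    ("order-1", "insert"), ("order-2", "insert"),
    ("order-1", "update"), ("order-2", "delete"),
    ("order-1", "delete"),
]
result = consume_all(publish(sent))
print(result["order-1"])  # ['insert', 'update', 'delete']
print(result["order-2"])  # ['insert', 'delete']
```

The cost of this design is operational: every extra queue needs its own consumer, which is the "more hassle" mentioned above.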
Kafka
- One topic, one partition, one consumer, consuming on a single thread internally. Single-threaded throughput is too low, so this is rarely used in practice.
- Create N in-memory queues and write all data with the same key to the same queue; then run N threads, each consuming exactly one queue, which preserves ordering per key.
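A minimal sketch of the N-queues, N-threads approach using Python's standard `threading` and `queue` modules. The input tuples stand in for records polled from a single partition; `dispatch`, `worker`, `N_WORKERS`, and the crc32 hash are hypothetical choices, not a Kafka client API. Because every key maps to exactly one in-memory queue, and each queue is drained by exactly one thread, messages with the same key are processed in the order they were polled while different keys proceed in parallel.

```python
import queue
import threading
import zlib
from typing import Dict, List, Optional, Tuple

N_WORKERS = 4
Message = Tuple[str, str]  # (key, payload)

def worker(q: "queue.Queue[Optional[Message]]",
           applied: Dict[str, List[str]], lock: threading.Lock) -> None:
    """Drain one in-memory queue sequentially until a None sentinel arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        key, payload = item
        with lock:  # the shared result dict is written by several threads
            applied.setdefault(key, []).append(payload)

def dispatch(messages: List[Message]) -> Dict[str, List[str]]:
    queues = [queue.Queue() for _ in range(N_WORKERS)]
    applied: Dict[str, List[str]] = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, applied, lock)) for q in queues]
    for t in threads:
        t.start()
    for key, payload in messages:
        # Same key -> same queue -> same thread, so per-key order is kept.
        queues[zlib.crc32(key.encode("utf-8")) % N_WORKERS].put((key, payload))
    for q in queues:
        q.put(None)  # tell each worker to stop once its queue is drained
    for t in threads:
        t.join()
    return applied

polled = [(f"order-{i % 5}", f"step-{i // 5}") for i in range(50)]
applied = dispatch(polled)
print(applied["order-0"] == [f"step-{j}" for j in range(10)])  # True
```

The same structure fits the RabbitMQ variant described earlier (one queue, one consumer, several internal workers); only the source of the polled messages changes.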