There are many core points, but to make them more relevant, I’ll start with common interview questions:

  • How do I ensure that messages are not lost?
  • How do I handle duplicate messages?
  • How do I guarantee the orderliness of messages?
  • How do I handle message accumulation?

Of course, before analyzing these problems, we need to briefly introduce what a message queue is, along with some basic terms and concepts commonly used with message queues.

Now, let's get into it.

What is a message queue

Let's take a look at what Wikipedia says:

In computer science, message queues and mailboxes are software-engineering components typically used for inter-process communication (IPC), Or for inter-thread communication within the same process. They use a queue for messaging — the passing of control or of content. Group communication systems provide similar kinds of functionality.

To summarize the above definition, a message queue is a component that uses a queue to communicate.

That definition is correct, but nowadays what we call a message queue usually refers to message-oriented middleware, which exists for more than just communication.

Why message queues

In essence, it is because the rapid development of the Internet and ever-expanding business require the technical architecture to keep evolving.

From the earlier monolithic architecture to today's microservice architecture, hundreds of services call and depend on each other. In the early days of the Internet, having 100 users online on a single server was a big deal; today a service may have a billion daily users. We need "something" to decouple services, smooth out resource usage over time, buffer traffic peaks, and so on.

Thus the message queue was born. It is commonly used for three things: asynchronous processing, service decoupling, and flow control.

Asynchronous processing

As the company grows, you may find that your project's request chain gets longer and longer. For example, an early e-commerce project might do nothing more than deduct inventory and place the order. Then a points service is added, then an SMS service, and so on. With all of these called synchronously, the client may grow impatient waiting, and this is exactly when a message queue should appear.

The call chain is long and the response is slow, yet compared with inventory and ordering, points and SMS do not need to be so "timely". So just drop a message onto the message queue at the end of the critical path and return the response, while the points service and the SMS service consume that message in parallel.

As you can see, a message queue can reduce request latency and lets services process work asynchronously and concurrently, improving the overall performance of the system.
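
To make this concrete, here is a minimal sketch of the idea using Python's standard-library queue as a stand-in for a real message queue (the service and event names are illustrative): the order path publishes an event and returns immediately, while a points worker consumes it at its own pace.

```python
import queue
import threading

# Stand-in for the message queue; `order_events` is an illustrative name.
order_events = queue.Queue()

def place_order(order_id):
    # Do only the critical work synchronously (deduct stock, create order),
    # then publish an event and return immediately.
    order_events.put({"order_id": order_id})
    return "order accepted"

def points_service(stop):
    # Consumes order events off the request path, at its own pace.
    while not stop.is_set() or not order_events.empty():
        try:
            event = order_events.get(timeout=0.1)
        except queue.Empty:
            continue
        # ... award points for event["order_id"] here ...
        order_events.task_done()

stop = threading.Event()
worker = threading.Thread(target=points_service, args=(stop,))
worker.start()
print(place_order(1))  # returns right away; points are handled asynchronously
order_events.join()    # wait until the event has been consumed
stop.set(); worker.join()
```

The SMS service would simply be another consumer of the same event, running in parallel with the points worker.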

Service decoupling

As mentioned above, we added a points service and an SMS service. Next we may need a marketing service. Later the boss wants to do big data, so a data-analysis service arrives too.

You can see that the order service's downstream systems keep expanding. To cater to them, the order service has to be modified constantly; any change in a downstream system's interface may affect the order service. The order team could go crazy: truly the "core" project team.

Therefore, message queues are commonly chosen to solve coupling between systems. The order service puts order-related messages onto the message queue, and whichever downstream system wants them subscribes to that topic. The order service is liberated!

Flow control

Everyone has heard of "peak shaving". Back-end services are relatively "weak" because the business is heavy and takes longer to process, so bursts of traffic such as a flash-sale event may overwhelm them. Therefore you need to introduce a middleware layer as a buffer, and message queues are perfect for that.

Requests from the gateway are first placed in the message queue, and the back-end service consumes from the queue as fast as it can. Requests that time out can return an error.

Of course, some services, especially background tasks, do not need to respond immediately, and their business processes are long and complex. Putting incoming requests into the message queue and letting the back-end service process them at its own pace works nicely too.

These two situations correspond, respectively, to producers producing too fast and consumers consuming too slowly; in both, the message queue acts as a good buffer.
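
A tiny sketch of the buffering idea, using a bounded in-process queue as a stand-in for the message queue (the capacity of 3 is an arbitrary assumption): the gateway fails fast when the buffer is full, and the back end drains it at its own pace.

```python
import queue

# Hypothetical buffer between the gateway and a slow back-end service.
request_buffer = queue.Queue(maxsize=3)  # capacity is an assumption

def gateway_accept(req):
    """Enqueue a request; reject immediately when the buffer is full."""
    try:
        request_buffer.put_nowait(req)
        return "accepted"
    except queue.Full:
        return "rejected: system busy"  # fail fast instead of crushing the back end

def backend_drain():
    """The back end consumes at its own pace."""
    handled = []
    while not request_buffer.empty():
        handled.append(request_buffer.get_nowait())
    return handled

# A burst of 5 requests against capacity 3:
results = [gateway_accept(i) for i in range(5)]
print(results)          # first 3 accepted, last 2 rejected
print(backend_drain())  # [0, 1, 2]
```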

A word of caution

Introducing a message queue certainly brings the benefits above, but every additional middleware system is one more layer that can reduce stability and one more layer to operate and maintain. It's a trade-off; systems evolve step by step.

Basic concepts of message queues

Message queues have two models: a queue model and a publish/subscribe model.

Queue model

Producers send messages to a queue. A queue can store messages from multiple producers and can have multiple consumers. However, the consumers are in competition: each message can be consumed by only one consumer.
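
The competing-consumer behavior can be sketched with a standard-library queue (the consumer names are illustrative): two consumers pull from one queue, and each message goes to exactly one of them.

```python
import queue
import threading

q = queue.Queue()
for i in range(6):       # six messages from producers
    q.put(i)

consumed = {"A": [], "B": []}
lock = threading.Lock()

def consumer(name):
    # Each consumer competes for messages until the queue is empty.
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return
        with lock:
            consumed[name].append(msg)

threads = [threading.Thread(target=consumer, args=(n,)) for n in ("A", "B")]
for t in threads: t.start()
for t in threads: t.join()

# Every message was delivered to exactly one of the two competing consumers.
print(sorted(consumed["A"] + consumed["B"]))  # [0, 1, 2, 3, 4, 5]
```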

Publish/subscribe model

To solve the problem that a single message cannot be consumed by multiple consumers, the publish/subscribe model was developed. In this model, a message is published to a Topic, and every subscriber to that Topic can consume the message.

Think of it this way: the publish/subscribe model is a group chat, where I send one message and everyone in the group receives it. The queue model is a one-on-one chat: the message I send pops up only in your chat window, not in anyone else's.

At this point someone may say: "Well, if I chat one-on-one and send the same message to everyone, then one message is still consumed by multiple people."

Yes. By fully storing the same message in multiple queues, that is, by data redundancy, one message can be consumed by multiple consumers. RabbitMQ uses the queue model and fans messages out to multiple queues through its Exchange module, which solves the problem of one message needing to be consumed by multiple consumers.
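
A minimal sketch of that fan-out idea (not RabbitMQ's actual API, just the concept): the "exchange" copies each message into every bound queue, so each downstream consumer gets its own copy.

```python
import queue

# A minimal fan-out "exchange": one message is copied into every bound queue,
# so each downstream service sees its own copy. Queue names are illustrative.
bound_queues = {"points": queue.Queue(), "sms": queue.Queue()}

def fanout_publish(message):
    for q in bound_queues.values():
        q.put(message)  # data redundancy: the same message stored in each queue

fanout_publish({"order_id": 42})
print(bound_queues["points"].get())  # {'order_id': 42}
print(bound_queues["sms"].get())     # {'order_id': 42}
```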

You can also see that if there’s only one person in the group chat besides me, then the publish/subscribe model and the queue model are essentially the same.

Summary

In the queue model, each message can only be consumed by one consumer, while the publish/subscribe model is designed to allow a message to be consumed by multiple consumers. Of course, the queue model can also solve the problem of a message being consumed by multiple consumers by fully storing messages in multiple queues, but there will be redundancy of data.

The publish/subscribe model is compatible with the queue model: with only one consumer, it behaves essentially the same as the queue model.

RabbitMQ uses the queue model, RocketMQ and Kafka use the publish/subscribe model.

The rest of this article is based on the publish/subscribe model.

Common terms

Generally, we call the sender the Producer, the receiver the Consumer, and the message queue server the Broker.

Messages are sent from the Producer to the Broker, which stores the message locally. The Consumer then pulls the message from the Broker, or the Broker pushes the message to the Consumer, which consumes it.

To improve concurrency, the publish/subscribe model usually introduces the concept of queues or partitions: messages are sent to a queue or partition under a topic. RocketMQ calls it a queue; Kafka calls it a partition.

For example, if a topic has five queues, its concurrency rises to 5, and five consumers can consume the topic's messages in parallel. Messages for the same topic can be allocated to different queues by strategies such as round-robin or key hashing (hash modulo the queue count).
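
The two allocation strategies can be sketched in a few lines (crc32 is just one stable example hash, not what any particular broker uses):

```python
import zlib

NUM_QUEUES = 5  # five queues under the topic, as in the example above

def pick_queue_round_robin(seq_no):
    """Spread keyless messages evenly: the Nth message goes to queue N mod 5."""
    return seq_no % NUM_QUEUES

def pick_queue_by_key(key):
    """Hash a business key (e.g. an order ID) so the same key always
    lands in the same queue."""
    return zlib.crc32(str(key).encode()) % NUM_QUEUES

print([pick_queue_round_robin(i) for i in range(7)])  # [0, 1, 2, 3, 4, 0, 1]
# The same key deterministically maps to the same queue:
assert pick_queue_by_key("order-1") == pick_queue_by_key("order-1")
```

The key-hash strategy is also what makes per-key ordering possible, which comes up again in the section on message order.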

On the consumer side there is usually the concept of a Consumer Group: each consumer belongs to a group, and a message is delivered to every consumer group that subscribes to the topic.

Suppose we have two consumer groups, Group 1 and Group 2, both subscribed to Topic-A. When a message is sent to Topic-A, both consumer groups receive it.

The message is actually written to one queue under the Topic, and one consumer in each group consumes that queue's messages.

Physically, the Broker stores only one copy of each message. Each consumer group has its own offset, or consumption position: messages before the offset have already been consumed. This offset is at the queue level; each consumer group maintains an offset for every queue under the subscribed Topic.
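
A minimal sketch of per-group, per-queue offsets (all names are illustrative): the log stores one copy of each message, and each group advances its own position independently.

```python
# One copy of the messages per (topic, queue); each group has its own offset.
messages = {("topic-A", 0): ["m0", "m1", "m2"]}          # one queue's log
offsets = {"group-1": {("topic-A", 0): 0},
           "group-2": {("topic-A", 0): 0}}

def poll(group, topic, queue_id):
    """Return the next unconsumed message for this group and advance its offset."""
    key = (topic, queue_id)
    pos = offsets[group][key]
    if pos >= len(messages[key]):
        return None          # nothing new for this group
    offsets[group][key] = pos + 1
    return messages[key][pos]

print(poll("group-1", "topic-A", 0))  # 'm0'
print(poll("group-1", "topic-A", 0))  # 'm1'
print(poll("group-2", "topic-A", 0))  # 'm0'  (group-2 has its own offset)
```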

A picture should make this clear.

Now that you are basically familiar with the common terms and concepts of message queues, let’s take a look at the common core aspects of message queues.

How do I ensure that messages are not lost

As long as the common message queues on the market are configured properly, our messages will not be lost.

Let's take a look at this picture:

You can see three phases: producing messages, storing messages, and consuming messages. Let's examine each phase to see how to ensure that messages are not lost.

Producing messages

The producer sends a message to the Broker and needs to handle the Broker's response. Whether the message is sent synchronously or asynchronously, wrap the send (or the async callback) in a try-catch and handle the response properly. If the Broker returns an error, such as a write failure, the send needs to be retried; after multiple failed attempts, raise an alert, write logs, and so on.
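
The retry-and-alert pattern might be sketched like this, with broker_send standing in for a real client's send call and MAX_RETRIES as an assumed policy:

```python
import logging

MAX_RETRIES = 3  # assumed retry policy

def send_with_retry(broker_send, message):
    """Send a message, retry on failure, and alert after exhausting retries.
    `broker_send` stands in for a real MQ client's send call."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            ack = broker_send(message)
            if ack == "ok":
                return True
        except Exception as e:
            logging.warning("send attempt %d failed: %s", attempt, e)
    logging.error("message %r failed after %d attempts, alerting", message, MAX_RETRIES)
    return False  # caller can page on-call or persist the message for later replay

# A flaky broker that fails twice, then succeeds:
attempts = {"n": 0}
def flaky_send(msg):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("network glitch")
    return "ok"

print(send_with_retry(flaky_send, "order-created"))  # True, on the 3rd attempt
```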

This ensures that messages are not lost during the production message phase.

Storing messages

In the storage phase, the Broker should respond to the producer only after the message has been flushed to disk. If it responds as soon as the message is written to cache and the machine suddenly loses power, the producer will wrongly believe the send succeeded.

If the Broker is clustered with a multi-replica mechanism, the message needs to be written not only to the current Broker but also to the replica machines: configure it so the response is given only after the message is written to at least two machines. That basically guarantees reliable storage. If one replica goes down, another still has the message (and if you're worried about two failing at once, write to more).

What if an earthquake takes out the whole data center? emmmmmm... Big companies basically deploy across multiple regions.

What if there's an earthquake in all those regions at once? emmmmmm... Then it's people, not messages, we should be worrying about.

Consuming messages

This is where mistakes often happen. Some developers have the consumer fetch the message, put it straight into an in-memory queue, and immediately return success to the Broker.

Think about what happens if the consumer goes down after the message is fetched into memory: the message is lost. So we should send the Broker the consumption acknowledgment only after the consumer has actually executed the business logic.

So as long as we respond to the Broker only after the message's business logic has been processed, messages will not be lost in the consumption phase.
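
The ack-last rule can be sketched as follows (handle_business and ack are placeholders for real implementations): if the business logic throws, no acknowledgment is sent and the Broker can redeliver.

```python
def consume(message, handle_business, ack):
    """Acknowledge to the Broker only after the business logic really ran.
    `handle_business` and `ack` stand in for real implementations."""
    handle_business(message)  # e.g. write the order to the DB, commit the transaction
    ack(message)              # ack last: a crash before this line means redelivery, not loss

processed, acked = [], []
consume("m1", processed.append, acked.append)
print(processed, acked)  # ['m1'] ['m1']

# If the business logic throws, no ack is sent and the Broker will redeliver:
def failing(msg):
    raise RuntimeError("db down")
try:
    consume("m2", failing, acked.append)
except RuntimeError:
    pass
print(acked)  # still ['m1']: 'm2' was never acknowledged
```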

Summary

As you can see, guaranteeing message reliability requires the cooperation of all three parties.

The producer needs to handle the Broker’s response, using retries and alarms in case of an error.

The Broker needs to control the timing of its response: on a single machine, respond only after the message is flushed to disk; in a multi-replica cluster, respond only after the message reaches two or more replicas.

The consumer needs to return the response to the Broker after the actual business logic has been executed.

However, note that as message reliability increases, performance degrades: waiting for flushes and multi-replica synchronization before responding costs throughput. So it depends on the business. For log transmission, for example, losing a message or two may not matter, so there is no need to wait for a flush before responding.

How do I handle duplicate messages

Let’s first see if we can avoid message duplication.

Could we? If the producer sent each message only once and ignored the Broker's response, we would indeed never send the same message to the Broker twice. But then lost messages would go unnoticed.

The basic requirement is that a message must reach the Broker at least once, which means waiting for the Broker's response. Maybe the Broker has actually written the message but the producer never receives the response because of the network, so the producer sends it again. At that point the message is duplicated.

Now look at the Consumer side. Suppose our Consumer fetches a message, finishes the business logic, and commits the transaction, but crashes before it updates the consumer offset. Another Consumer takes over; since the offset was never updated, it fetches the same message, and the business logic runs again. The message has been consumed twice.

As you can see, message duplication is unavoidable in normal operation, so we have to approach the problem of duplicate messages from another angle.

The key word is idempotence. Since we cannot prevent duplicate messages from occurring, we can only neutralize their impact in the business logic.

Idempotent processing of duplicate messages

Idempotence is a mathematical concept; for our purposes it means that calling the same interface many times with the same parameters always yields the same result.

For example:

update t1 set money = 150 where id = 1 and money = 100;

No matter how many times this statement executes, money ends up at 150. That is idempotence.

The business processing logic needs to be adapted so that repeated messages do not affect the final result.

You can add a precondition check, like the money = 100 guard in the SQL above, before modifying. More generally, you can use version control: compare the version number carried in the message with the version number in the database.

Or rely on database constraints such as a unique key, for example INSERT ... ON DUPLICATE KEY UPDATE.

Or record the key business identifier. In order processing, for example, record the order ID; when a message arrives, first check whether that ID has already been processed, and proceed only if it hasn't. Globally unique IDs work well here.
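
The record-the-key routine might look like this sketch, which keeps processed order IDs in an in-memory set (in production you would use a unique database key, Redis, or similar so the check survives restarts):

```python
processed_order_ids = set()  # in production: unique DB key, Redis, etc.

def handle_order_message(order_id, effects):
    """Process a message only if its business ID hasn't been seen before."""
    if order_id in processed_order_ids:
        return "duplicate, skipped"
    processed_order_ids.add(order_id)
    effects.append(order_id)  # the actual business side effect
    return "processed"

effects = []
print(handle_order_message(1001, effects))  # 'processed'
print(handle_order_message(1001, effects))  # 'duplicate, skipped'
print(effects)  # [1001]: the duplicate message had no effect
```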

Those are basically the standard routines; how to apply them in practice depends on the specific business details.

How do I guarantee the orderliness of messages

Order comes in two flavors: global order and partial order.

Global order

To make messages globally ordered, there can be only one producer sending messages to the Topic, only one queue (partition) within the Topic, and the consumer must consume that queue single-threaded. Then the messages are globally ordered!

In general, we don't need global order. Even when synchronizing a MySQL Binlog, we usually only need per-table messages to stay in order.

Partial order

Most ordering requirements are for partial order. For partial order, we can split the Topic into as many queues as we need, send messages to a fixed queue using a specific routing policy, and give each queue a single-threaded consumer. This satisfies the partial-order requirement while improving message-processing efficiency through queue-level concurrency.
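
A sketch of the routing side of partial order (crc32 and the queue count of 3 are illustrative choices): because every message for the same key lands in the same queue, a single-threaded consumer of that queue sees them in send order.

```python
import zlib
from collections import defaultdict

NUM_QUEUES = 3
queues = defaultdict(list)  # queue_id -> list of (key, payload), in arrival order

def send_ordered(key, payload):
    """Route all messages for the same key to one queue, so a single-threaded
    consumer of that queue sees them in the order they were sent."""
    queues[zlib.crc32(str(key).encode()) % NUM_QUEUES].append((key, payload))

for step in ["created", "paid", "shipped"]:
    send_ordered("order-7", step)

# The one queue holding order-7 has its events in the original order:
q_id = zlib.crc32(b"order-7") % NUM_QUEUES
print([p for k, p in queues[q_id] if k == "order-7"])  # ['created', 'paid', 'shipped']
```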

In the diagram I drew multiple producers; a single producer is fine too, as long as messages of the same type are sent to the same designated queue.

How do I handle message accumulation

Message accumulation is usually caused by a mismatch between the producers' production rate and the consumers' consumption rate. Perhaps consumption keeps failing and retrying, or the consumers are simply too weak, and messages pile up over time.

So first locate the cause of the slow consumption. If it's a bug, fix the bug. If the consumers' capacity is inherently weak, optimize the consumption logic: for example, instead of fetching and processing one message at a time, process in batches. Batch inserts into a database are far more efficient than row-by-row inserts.

If the logic is already optimized and it's still slow, consider horizontal scaling: increase both the number of queues under the Topic and the number of consumers. Note that the queue count must increase too, otherwise the newly added consumers will have nothing to consume, since within a consumer group each queue is assigned to only one consumer.

Of course, whether each consumer is single-threaded or multi-threaded depends on the situation: if a consumer writes messages into an in-memory queue, responds to the Broker, and then has multiple threads consume from that in-memory queue, any unconsumed messages in memory are lost when the consumer goes down.

Finally

These questions come up constantly when using message queues, and they are the core of message-queue interview questions. I won't dig into any specific message queue today, but the patterns are the same across products, and understanding the big picture matters most. A Kafka source-code analysis article is coming; interested readers, please wait patiently.