This is the first installment of the Inside XXX technology series. The idea of the series is to get to the bottom of each technology stack and then use that as a starting point to work through the rest of the core knowledge.

The whole series therefore focuses on training the ability to think: it not only explains the What clearly, but pays more attention to the Why and the How, to help readers build a solid knowledge system.

Back to the topic: this is the first article in Understand MQ. It focuses on the general basics of MQ, so that you can answer: if you were to design an MQ, how would you do it? What issues need to be considered? What are the technical challenges?

With this foundation in place, I’m sure you’ll be able to quickly grasp the main thread and tell apart the characteristics of Kafka and RocketMQ, two specific message-oriented middleware products, in the next few articles.

For MQ, whether it’s RocketMQ, Kafka, or any other message queue, the essence is: send once, store once, consume once. With this essence as the root, let’s talk about MQ from the beginning.

01 Start with the nature of MQ

Broken down to its smallest pieces, MQ is just “send, store, consume”. Put more bluntly, it is a “forwarder”. The producer posts a message to a container called a “queue”; the message is then taken out of the container and forwarded to the consumer, and that’s it.

The diagram above is the original message queue model with two key words: message and queue.

1. Messages: The data to be transferred, which can be the simplest text string or a complex custom format (as long as it can be parsed in a predefined format).

2. Queue: You should be familiar with it: a FIFO (first in, first out) data structure. It is the container that stores messages. Messages enter at the tail of the queue and leave from the head. Enqueuing is the process of sending a message, and dequeuing is the process of receiving one.
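
The two keywords can be sketched in a few lines of Python. This is only a minimal illustration of the original model; the class name MessageQueue and its send/receive methods are hypothetical names chosen for this example.

```python
from collections import deque


class MessageQueue:
    """Original model: messages enter at the tail and leave from the head."""

    def __init__(self):
        self._items = deque()

    def send(self, message):
        # Enqueue: append the message at the tail of the queue.
        self._items.append(message)

    def receive(self):
        # Dequeue: remove and return the head message, or None if empty.
        return self._items.popleft() if self._items else None


q = MessageQueue()
for text in ["m1", "m2", "m3"]:
    q.send(text)

# Messages come out in the order they went in.
received = [q.receive() for _ in range(3)]
```

Here the message is a plain string, but any value that both sides know how to parse would do.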

02 Evolution of the original model

If you look at the most commonly used message queue products today (RocketMQ, Kafka, etc.), you’ll see that they all extend the original message model and introduce new terms such as topic, partition, queue, etc.

To fully understand this plethora of new concepts, let’s start with the evolution of the messaging model (as the saying goes, architecture is never designed up front; it evolves).

2.1 Queue Model

The original message queue is the model described in the previous section, and it is strictly a queue: messages are read in the order in which they were written. However, the queue has no separate “read” operation: reading a message means “deleting” it from the head of the queue.

This is the queue model. It allows multiple producers to send messages to the same queue. However, if there are multiple consumers, they actually compete with each other: a message can be received by only one of the consumers, and it is deleted once it is read.
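
The competition between consumers can be illustrated with a small single-threaded simulation. It is a sketch under simplifying assumptions: the producer and consumer names are made up, and the consumers simply take turns instead of running concurrently.

```python
from collections import deque

queue = deque()

# Several producers may write to the same queue.
for producer in ("p1", "p2"):
    for i in range(3):
        queue.append(f"{producer}-msg{i}")

all_messages = list(queue)

# Two consumers take turns polling; every read removes the message,
# so no message can ever reach both consumers.
received = {"c1": [], "c2": []}
consumers = ["c1", "c2"]
turn = 0
while queue:
    consumer = consumers[turn % len(consumers)]
    received[consumer].append(queue.popleft())
    turn += 1
```

After the loop, each of the six messages sits in exactly one consumer's list, which is the "unicast" behaviour of the queue model.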

2.2 Publish-subscribe model

If one piece of message data needs to be distributed to multiple consumers, and each consumer wants to receive all of the messages, the queue model clearly does not meet the requirement.

One possible solution is to create a separate queue for each consumer and let producers send multiple copies. This is clumsy: the same data is duplicated many times, which wastes space.

To solve this problem, another messaging model has evolved: the publish-subscribe model.

In the publish-subscribe model, the container that holds messages becomes a “topic,” and subscribers need to “subscribe” to a topic before receiving messages. In the end, each subscriber receives every message published to the same topic.

Compare this with the queue model: producers are publishers, queues are topics, and consumers are subscribers. The only difference is whether a single piece of message data can be consumed multiple times.
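
One way to let a single stored copy be consumed several times is to keep the messages in a shared log and give every subscriber its own read position. The sketch below assumes this offset-based approach; it is an illustration, not the only way to implement publish-subscribe, and the Topic class and its method names are hypothetical.

```python
class Topic:
    """Stores each message once; every subscriber tracks its own position."""

    def __init__(self):
        self._log = []       # the single shared copy of the messages
        self._offsets = {}   # subscriber name -> index of next message to read

    def subscribe(self, subscriber):
        # A new subscriber starts with messages published after subscribing.
        self._offsets[subscriber] = len(self._log)

    def publish(self, message):
        self._log.append(message)

    def poll(self, subscriber):
        # Return all unread messages for this subscriber, then advance it.
        start = self._offsets[subscriber]
        self._offsets[subscriber] = len(self._log)
        return self._log[start:]


topic = Topic()
topic.subscribe("a")
topic.subscribe("b")
topic.publish("x")
topic.publish("y")

first_a = topic.poll("a")
first_b = topic.poll("b")
second_a = topic.poll("a")
```

Reading no longer deletes anything: both subscribers see "x" and "y", and a second poll by "a" returns nothing new.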

2.3 Summary

To summarize, the difference between the two models is simply unicast versus broadcast. Moreover, when the publish-subscribe model has only one subscriber, it behaves the same as the queue model, so it is functionally fully compatible with the queue model.

This explains why RocketMQ and Kafka, the modern mainstream products, are built directly on the publish-subscribe model. It also explains why RabbitMQ has an Exchange module: it solves the problem of message routing, which makes the publish-subscribe model possible.

The concepts of “consumer group”, “cluster consumption”, and “broadcast consumption” all relate to these two models. So does the most common arrangement at the application level: broadcast between groups and unicast within a group.
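
Broadcast between groups and unicast within a group can be sketched by tracking one offset per group and handing each message to a single member of that group. The GroupedTopic class is hypothetical, and the round-robin assignment inside a group is an assumed policy chosen only to keep the example deterministic.

```python
from collections import defaultdict


class GroupedTopic:
    def __init__(self):
        self._log = []
        self._group_offsets = {}            # group -> next offset to deliver
        self._members = defaultdict(list)   # group -> consumer names

    def join(self, group, consumer):
        self._members[group].append(consumer)
        self._group_offsets.setdefault(group, 0)

    def publish(self, message):
        self._log.append(message)

    def dispatch(self):
        """Every group sees every message; one member receives each message."""
        deliveries = defaultdict(list)      # consumer -> delivered messages
        for group, members in self._members.items():
            start = self._group_offsets[group]
            for offset in range(start, len(self._log)):
                # Round-robin inside the group (an assumed policy).
                consumer = members[offset % len(members)]
                deliveries[consumer].append(self._log[offset])
            self._group_offsets[group] = len(self._log)
        return dict(deliveries)


topic = GroupedTopic()
topic.join("billing", "b1")
topic.join("billing", "b2")
topic.join("audit", "a1")
for i in range(4):
    topic.publish(f"m{i}")

deliveries = topic.dispatch()
```

The "audit" group has one member, so it behaves like a plain subscriber, while the two "billing" members split the same four messages between them.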

Therefore, mastering this common theory first makes it easier to grasp the essence and tell the concepts apart when you later study how each specific message middleware is implemented.

03 Application scenarios of MQ based on the model

MQ has many application scenarios, and the common ones are so familiar that you could recite them backwards: system decoupling, asynchronous communication, and peak clipping. In addition, there are delayed notification, eventual consistency guarantees, ordered messaging, stream processing, and so on.

Does the message model come first, or does the application scenario come first? The answer must be: the application scenario (that is, the problem) comes first, and then the message model, because the message model is just an abstraction of the solution.

MQ has evolved over 30 years from the original queue model into the many messaging middleware products (platform-level solutions) of today. I think this is largely because the message model adapts to such a wide range of scenarios.

Let’s try to re-understand the message queue model. It actually solves the problem of communication between producers and consumers. How does it compare to RPC?

By comparison, two differences can be clearly seen:

1. With the introduction of MQ, there are now two RPCs instead of one, and the producer is coupled only to the queue and does not need to know that the consumer exists.

2. The added intermediate node, the “queue”, holds messages in transit, which effectively turns a synchronous call into an asynchronous one.

If you think back to all of the application scenarios for MQ, it’s easy to see why MQ fits them: these scenarios all take advantage of the two characteristics above.

Take a practical example: the most common “order payment” scenario in e-commerce. After an order is paid successfully, the order status needs to be updated, user points need to be updated, the merchant needs to be notified of the new order, the user profile in the recommendation system needs to be updated, and so on.

With the introduction of MQ, order payment now only needs to focus on its most important step: updating the order status. Everything else is notified through MQ. This is the core problem that MQ addresses: system decoupling.

Before the change, the order system depended on three external systems. After the change, it depends only on MQ, and later business expansion (for example, the marketing system wanting to reward paying users with coupons) requires no change to the order system. This keeps the core flow stable and reduces maintenance costs.
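
The decoupling can be sketched with a tiny in-process stand-in for MQ. Everything here is hypothetical (the EventBus class, the "order_paid" topic name, the handlers), and the stand-in calls subscribers synchronously for simplicity, whereas a real MQ would store the message and deliver it asynchronously.

```python
from collections import defaultdict


class EventBus:
    """Stand-in for MQ: the order system only knows the topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
log = []


def pay_order(order_id):
    # Core flow: update the order status, then announce the event.
    log.append(f"order {order_id}: status updated to PAID")
    bus.publish("order_paid", {"order_id": order_id})


bus.subscribe("order_paid", lambda e: log.append(f"points: credit user for order {e['order_id']}"))
bus.subscribe("order_paid", lambda e: log.append(f"merchant: notify new order {e['order_id']}"))
bus.subscribe("order_paid", lambda e: log.append(f"recommendation: refresh profile for order {e['order_id']}"))

pay_order(1001)

# Later the marketing system subscribes; pay_order itself is untouched.
bus.subscribe("order_paid", lambda e: log.append(f"marketing: issue coupon for order {e['order_id']}"))
pay_order(1002)
```

The point of the sketch is that pay_order never references the points, merchant, recommendation, or marketing code.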

Another benefit of this change is that, with MQ in place, the steps of updating user points, notifying merchants, and updating user profiles all execute asynchronously. This reduces the overall elapsed time of order payment and increases the throughput of the order system. This is another typical use of MQ: asynchronous communication.

In addition, since queues can buffer messages, MQ can act as a “funnel” that protects the system by limiting traffic in scenarios that exceed its capacity. This is known as peak clipping. We can also take advantage of the ordering of the queue itself to satisfy scenarios in which messages must be delivered in order, or combine a queue with a scheduled task to implement delayed consumption, and so on.
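
Peak clipping can be shown with a small tick-based simulation. It is a sketch with made-up numbers: a burst of 10 requests arrives at once, while the downstream system is assumed to handle at most 3 requests per tick.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick):
    """Return (processed per tick, backlog per tick) with a queue as buffer."""
    buffer = deque()
    processed, backlog = [], []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # The burst is absorbed by the queue instead of hitting downstream.
        for _ in range(arrivals):
            buffer.append(next_id)
            next_id += 1
        # Downstream drains at most `capacity_per_tick` requests per tick.
        done = 0
        while buffer and done < capacity_per_tick:
            buffer.popleft()
            done += 1
        processed.append(done)
        backlog.append(len(buffer))
    return processed, backlog


processed, backlog = simulate([10, 0, 0, 0, 0], capacity_per_tick=3)
```

The downstream load never exceeds its capacity; the cost is that the last requests of the burst wait a few ticks in the queue.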

Other MQ application scenarios are similar: you can go back to the characteristics of the message model and find out why MQ is a good fit, so they are not covered here. In short, I suggest stepping back from complex and changing practical scenarios to the theoretical level and abstracting from there, so that your understanding is more thorough.

04 How to design an MQ Server?

With these theories and application scenarios in mind, let’s take a look: how do you design an MQ?

4.1 Prototype of MQ

Let’s start with a simple version of MQ. What if we just implemented a crude version of MQ without considering the requirements of a production environment?

As stated at the beginning of this article, any MQ comes down to three things: sending, storing, and consuming. These are the core functional requirements of MQ. In addition, from a technical perspective, MQ’s communication model can be understood as two RPCs plus message relay storage.

With this understanding in mind, I believe that with some programming background, it would take less than an hour to write a prototype of MQ:

1. Directly use mature RPC framework (Dubbo or Thrift) to implement two interfaces: message sending and message reading.

2. The messages can be stored in local memory, and the data structure can be used with the JDK’s ArrayBlockingQueue.
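
The two steps above can be sketched as follows. To keep the example self-contained, it is written in Python rather than Java: queue.Queue with a maxsize plays the role of the JDK’s ArrayBlockingQueue, and plain method calls stand in for the two RPC interfaces that Dubbo or Thrift would expose. The PrototypeMQ class and its method names are hypothetical.

```python
import queue
import threading


class PrototypeMQ:
    def __init__(self, capacity=1024):
        # Bounded, thread-safe, in-memory FIFO storage.
        self._queue = queue.Queue(maxsize=capacity)

    def send(self, message, timeout=None):
        """Interface 1: message sending. Blocks while the queue is full."""
        self._queue.put(message, timeout=timeout)

    def receive(self, timeout=None):
        """Interface 2: message reading. Blocks while the queue is empty."""
        return self._queue.get(timeout=timeout)


mq = PrototypeMQ(capacity=2)


def producer():
    for i in range(5):
        mq.send(f"msg-{i}")


t = threading.Thread(target=producer)
t.start()
received = [mq.receive(timeout=5) for _ in range(5)]
t.join()
```

With a capacity of 2, the producer thread blocks whenever it gets ahead of the consumer, which is the back-pressure a bounded queue gives for free. Everything is lost on restart, which is one reason this is only a prototype.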

4.2 Write an MQ for production

Of course, our goal is not just a prototype of MQ, but a message-oriented middleware that can be used in a production environment, which is an order of magnitude more difficult. How do we start?

1. Grasp the key points of the problem first

Let’s say we stick to the basic functions: sending, storing, and consuming messages (with publish-subscribe support).

What are the challenges of these basic functions in a production environment? We can quickly think of the following:

1. How to ensure the performance of sending and receiving messages in high concurrency scenarios?

2. How to ensure high availability and reliability of messaging services?

3. How to ensure that the service is scalable horizontally and arbitrarily?

4. How to ensure that message storage is also horizontally scalable?

5. How to manage various metadata (such as nodes, topics, consumption relationships, etc.) in the cluster, and whether to consider the consistency of data?

As you can see, the “three highs” of high-concurrency systems (high performance, high availability, and high scalability) are all problems you will run into when designing an MQ. The key is how to meet these non-functional requirements.

2. Overall design idea

To look at the overall architecture, there are three types of roles:

In addition, after further refining the core process of “send, store, consume”, the relatively complete data flow is as follows:

Based on the above two figures, we can quickly identify the responsibilities of the three types of roles as follows:

1. Broker: The MQ server. It provides RPC interfaces for producers and consumers, and it is responsible for storing, backing up, and deleting messages, as well as maintaining consumption relationships.

2. Producer: One of the MQ clients that calls the RPC interface provided by the Broker to send messages.

3. Consumer: The other MQ client, which calls the RPC interface provided by the Broker to receive messages and to acknowledge that they have been consumed.
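
The three roles and the acknowledgement step can be sketched with an in-memory Broker. This is a minimal illustration under assumptions: the send/pull/ack method names are hypothetical, progress is tracked per consumer group, and the RPC layer is replaced by direct method calls.

```python
class Broker:
    """Stores messages per topic and tracks each consumer group's progress."""

    def __init__(self):
        self._topics = {}      # topic -> list of messages
        self._committed = {}   # (topic, group) -> offset of next message

    def send(self, topic, message):
        # Called by producers. The returned offset acts as a send ack.
        log = self._topics.setdefault(topic, [])
        log.append(message)
        return len(log) - 1

    def pull(self, topic, group):
        # Called by consumers. Returns (offset, message), or None if caught up.
        log = self._topics.get(topic, [])
        offset = self._committed.get((topic, group), 0)
        if offset >= len(log):
            return None
        return offset, log[offset]

    def ack(self, topic, group, offset):
        # Consumption confirmation: only now does the group's progress advance.
        if offset == self._committed.get((topic, group), 0):
            self._committed[(topic, group)] = offset + 1


broker = Broker()
broker.send("orders", "a")
broker.send("orders", "b")

first = broker.pull("orders", "g1")
again = broker.pull("orders", "g1")   # not acked yet, so it is redelivered
broker.ack("orders", "g1", first[0])
second = broker.pull("orders", "g1")
```

Because progress moves only on ack, a consumer that crashes before confirming will see the same message again, which is the usual trade-off of at-least-once delivery.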

3. Detailed design

Next, we will discuss some specific technical difficulties and feasible solutions.

Difficulty 1: RPC communication

This difficulty is about communication between the Broker, producers, and consumers. If you do not want to reinvent the wheel, you can directly use a mature RPC framework such as Dubbo or Thrift. Then there is no need to deal with a series of problems such as service registration and discovery, load balancing, communication protocol, and serialization.

Of course, you can also use Netty for the underlying communication, use Zookeeper, Eureka, etc. as the registry, and then define a custom communication protocol (like Kafka), or build on a standardized MQ protocol such as AMQP (like RabbitMQ). Compared with an RPC framework, this approach offers more room for customization and optimization.
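
To give a feel for what defining a custom protocol involves, here is a sketch of a length-prefixed binary frame. The layout (4-byte body length, 1-byte message type, 2-byte topic length, topic, payload) and the type codes are invented for this example and are not the wire format of Kafka or any other product.

```python
import struct

HEADER = struct.Struct(">IB")   # 4-byte body length, 1-byte message type

MSG_SEND = 1   # hypothetical type codes
MSG_PULL = 2


def encode_frame(msg_type, topic, payload):
    topic_bytes = topic.encode("utf-8")
    # Body layout: 2-byte topic length, topic bytes, then the payload.
    body = struct.pack(">H", len(topic_bytes)) + topic_bytes + payload
    return HEADER.pack(len(body), msg_type) + body


def decode_frame(data):
    """Decode one complete frame; return (type, topic, payload, rest)."""
    body_len, msg_type = HEADER.unpack_from(data, 0)
    body = data[HEADER.size:HEADER.size + body_len]
    (topic_len,) = struct.unpack_from(">H", body, 0)
    topic = body[2:2 + topic_len].decode("utf-8")
    payload = body[2 + topic_len:]
    return msg_type, topic, payload, data[HEADER.size + body_len:]


# Two frames back to back, as they might arrive on one TCP connection.
stream = encode_frame(MSG_SEND, "orders", b"paid:1001") + encode_frame(MSG_PULL, "orders", b"")
frame1 = decode_frame(stream)
frame2 = decode_frame(frame1[3])
```

The length prefix is what lets the receiver split a byte stream back into messages. A real decoder would also have to handle frames that arrive incomplete, which this sketch skips.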

Difficulty 2: High availability design

High availability involves two main aspects: high availability of Broker services and high availability of storage solutions. We can break it down.

High availability of the Broker service mainly requires that the Broker can scale horizontally and be deployed as a cluster. It is further ensured through automatic service registration and discovery, load balancing, timeout and retry mechanisms, and ack mechanisms for sending and consuming messages.

There are two approaches to high availability of storage: 1) Follow Kafka’s partition plus multi-replica pattern, in which case you must consider data replication and consistency protocols for distributed scenarios (Zab, Raft, etc.) and implement automatic failover. 2) Use a mainstream database, a distributed file system, or a KV store with persistence, all of which come with their own high availability solutions.

Difficulty 3: Storage design

The message storage scheme is a core part of MQ. Reliability has already been discussed under high availability design; if reliability requirements are not high, messages can be kept directly in memory or in a distributed cache. How, then, do we ensure high storage performance? The deciding factor is the design of the storage structure.

The current mainstream scheme is append-only log files (data) plus index files, and many open source MQs work this way. For the index design, you can consider a dense index or a sparse index. Message lookup can use skip lists, binary search, and so on. You can also use the operating system’s page cache and zero-copy techniques to improve disk read and write performance.
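
The combination of an append-only log, a sparse index, and binary search can be sketched in memory. This is an illustration only: a bytearray stands in for the data file, two parallel lists stand in for the index file, and the record format (4-byte length prefix) and the index interval are assumptions made for the example.

```python
import bisect
import struct


class AppendOnlyLog:
    def __init__(self, index_interval=4):
        self._data = bytearray()     # stands in for the append-only data file
        self._index_offsets = []     # message offsets that have an index entry
        self._index_positions = []   # byte position of each indexed message
        self._interval = index_interval
        self._count = 0

    def append(self, payload):
        offset = self._count
        if offset % self._interval == 0:
            # Sparse index: record only every N-th message.
            self._index_offsets.append(offset)
            self._index_positions.append(len(self._data))
        # Writes only ever go to the end of the log.
        self._data += struct.pack(">I", len(payload)) + payload
        self._count += 1
        return offset

    def read(self, offset):
        if not 0 <= offset < self._count:
            raise IndexError(offset)
        # Binary search for the last index entry at or before `offset`.
        i = bisect.bisect_right(self._index_offsets, offset) - 1
        current, pos = self._index_offsets[i], self._index_positions[i]
        # Scan forward from the indexed position to the requested message.
        while True:
            (length,) = struct.unpack_from(">I", self._data, pos)
            if current == offset:
                return bytes(self._data[pos + 4:pos + 4 + length])
            pos += 4 + length
            current += 1


log = AppendOnlyLog(index_interval=4)
for i in range(10):
    log.append(f"msg-{i}".encode())

seventh = log.read(7)
```

A sparse index trades a short forward scan for a much smaller index; a dense index would record every message and skip the scan.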

If you’re not looking for high performance, consider an off-the-shelf distributed file system, KV storage, or database solution.

Difficulty 4: consumption relationship management

To support publish-subscribe broadcasting, the Broker needs to know which consumers are subscribed to each topic and deliver messages based on this relationship. Because Brokers are clustered, consumption relationships are usually kept in shared storage; they can be managed, with change notifications, through configuration centers such as Zookeeper and Apollo.

Difficulty 5: High-performance design

High-performance storage has already been covered, but there are other ways to further optimize performance: for example, the Reactor network IO model, the design of business thread pools, batch sending on the producer side, asynchronous disk flushing on the Broker side, and batch pulling on the consumer side.
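
Of these optimizations, batch sending on the producer side is the easiest to sketch. The BatchingProducer class below is hypothetical, and send_batch stands for whatever function performs one network call to the Broker; here it is replaced by a list that records each call.

```python
class BatchingProducer:
    def __init__(self, send_batch, batch_size=3):
        self._send_batch = send_batch   # callable performing one network call
        self._batch_size = batch_size
        self._pending = []

    def send(self, message):
        # Buffer locally; go to the network only when the batch is full.
        self._pending.append(message)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._pending:
            self._send_batch(list(self._pending))
            self._pending.clear()


calls = []   # each entry represents one simulated network call
producer = BatchingProducer(calls.append, batch_size=3)
for i in range(7):
    producer.send(i)
producer.flush()   # send whatever is left over
```

Seven messages go out in three calls instead of seven. The price is latency for messages waiting in a partial batch, so a production client would normally also flush on a timer; that part is omitted here.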

4.3 Summary

To sum up, let’s answer the question: how do you design an MQ?

1. Start with functional requirements (sending and receiving messages) and non-functional requirements (high performance, high availability, high scalability, etc.).

2. Functional requirements are not the point, just the ability to cover the basics of MQ, with advanced features such as delayed messages, transaction messages, and retry queues just icing on the cake.

3. The core is to combine the functional requirements, get the overall data flow clear, and then follow that line of thinking to work out how to meet the non-functional requirements. This is where the technical difficulty lies.

05 Final words

Starting from the essence of MQ, this article explains the evolution of the messaging model, which is the core theoretical foundation of MQ. This makes it easier to understand the various new terms and application scenarios of MQ.

Finally, it answers the question: how do you design an MQ? The goal is to give you a clear picture of the core components and technical challenges of MQ. With the answer to this question in mind, learning about specific messaging middleware such as Kafka and RocketMQ will also become more focused.

I hope you can learn from it. If you have any comments or suggestions, please leave comments in the comments section. Kafka is next in the Understand MQ series, and we’ll see you next time!


About the author: master’s degree from a Project 985 university, former Amazon engineer, now a technical director at a large tech company

You are welcome to follow my personal public account, Wuge Ramble IT, for a steady stream of original articles!