Preface

Many developers who interview at big tech companies run into open-ended questions. These questions have no fixed answer, but they reveal a candidate's real system design ability and technical foundation. If you can give a strong answer, an open-ended question like this can help you stand out from the crowd. Today we will discuss a common open-ended question in big tech interviews: if you were asked to design a high-concurrency messaging middleware, how would you do it?

Key topics in messaging middleware design

To design a high-concurrency messaging middleware, we first need to understand the knowledge areas it involves. In general, a good message-oriented middleware must meet at least the following requirements:

  • Producer-consumer model.
  • Support for a distributed architecture.
  • High availability of data.
  • Message data is not lost.

Next, let's look at each of these points in turn.

Producer-consumer model

Many of you are already familiar with the producer-consumer model. In short, the message-oriented middleware sits between applications: producers send messages to it, and consumers receive messages from it.
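To make the model concrete, here is a minimal in-memory sketch. The `Broker` class and its `publish`/`poll` methods are hypothetical names chosen for illustration, not the API of any real product; a real middleware would add persistence, networking, and the other concerns discussed below.

```python
import queue
import threading


class Broker:
    """Toy in-memory broker with one FIFO queue per topic (hypothetical design)."""

    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def _queue(self, topic):
        # Create the topic's queue on first use; the lock avoids a creation race
        with self._lock:
            if topic not in self._topics:
                self._topics[topic] = queue.Queue()
            return self._topics[topic]

    def publish(self, topic, message):
        # Producer side: hand the message to the broker and return immediately
        self._queue(topic).put(message)

    def poll(self, topic, timeout=0.5):
        # Consumer side: wait up to `timeout` seconds for the next message
        try:
            return self._queue(topic).get(timeout=timeout)
        except queue.Empty:
            return None


if __name__ == "__main__":
    broker = Broker()
    received = []

    def consume():
        # Keep consuming until the topic stays empty for one poll timeout
        while (msg := broker.poll("orders")) is not None:
            received.append(msg)

    consumer = threading.Thread(target=consume)
    consumer.start()
    for i in range(3):
        broker.publish("orders", f"order-{i}")
    consumer.join()
    print(received)  # ['order-0', 'order-1', 'order-2']
```

The producer never waits for the consumer to process anything; that decoupling is the main point of putting middleware between the two.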

Even this simple model raises a number of design questions. Let's work through them step by step.

First, consider this question: once a producer sends a message, how should the middleware store it? In memory? On disk? Or in both?

If messages are kept both in memory and on disk, when do they reach the disk? Do we write each message to disk as soon as the producer delivers it, or do we keep messages in memory and flush them to disk at regular intervals? Writing immediately is safer but slower; flushing periodically is faster, but any messages still in memory are lost if the process crashes before the next flush. With periodic flushing we also have to think about splitting the data across disk files: we cannot put all messages in a single file, so how many files do we split them into, and by what rule?
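The following sketch shows one possible answer under simplifying assumptions: messages are buffered in memory and written to disk in batches by `flush()` (which a real system would call on a timer), and the log is split into segment files that roll over once the active file would exceed a size limit. The `SegmentedLog` class, the 4-byte length prefix, and the file naming scheme (each file named after the offset of its first message) are hypothetical choices for illustration.

```python
import os
import tempfile


class SegmentedLog:
    """Append-only message log split into segment files (illustrative sketch).

    append() only buffers in memory; flush() writes buffered messages to disk.
    Each segment file is named after the offset of its first message, and a new
    segment is started once the active one would exceed segment_bytes.
    """

    def __init__(self, directory, segment_bytes=1 << 20):
        self.directory = directory
        self.segment_bytes = segment_bytes
        self.buffer = []          # payloads not yet written to disk
        self.flushed_offset = 0   # offset of the next message to write to disk
        self.segments = [0]       # base offsets of segment files, in order
        self.active_size = 0      # bytes already in the active segment

    def _path(self, base_offset):
        return os.path.join(self.directory, f"{base_offset:020d}.log")

    def append(self, payload: bytes) -> int:
        # Buffer the message and return the offset assigned to it
        self.buffer.append(payload)
        return self.flushed_offset + len(self.buffer) - 1

    def flush(self):
        # A real broker would call this on a timer; a crash before it runs
        # loses whatever is still in self.buffer.
        f = open(self._path(self.segments[-1]), "ab")
        try:
            for payload in self.buffer:
                record = len(payload).to_bytes(4, "big") + payload  # length-prefixed
                if self.active_size and self.active_size + len(record) > self.segment_bytes:
                    # Active segment is full: seal it and roll to a new file
                    f.flush()
                    os.fsync(f.fileno())
                    f.close()
                    self.segments.append(self.flushed_offset)
                    self.active_size = 0
                    f = open(self._path(self.segments[-1]), "ab")
                f.write(record)
                self.active_size += len(record)
                self.flushed_offset += 1
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the disk
        finally:
            f.close()
        self.buffer.clear()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = SegmentedLog(d, segment_bytes=32)
        for i in range(5):
            log.append(f"message-{i:02d}".encode())  # 10-byte payload, 14-byte record
        log.flush()
        print(log.segments)  # [0, 2, 4]
```

Calling `flush()` after every `append()` would give the "write immediately" behavior at the cost of many more disk syncs; a longer flush interval trades durability for throughput.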

Every message-oriented middleware design has to answer these questions, but they are only a small part of the problem. To stand out in an interview, you should also address the following points.

If the data is split across multiple disk files according to some rule, do we need to maintain metadata that tells us where a given message is stored? Compare Hadoop HDFS, where the NameNode keeps the file system metadata, including which DataNodes hold each block, and uses it to manage the DataNodes. For a messaging middleware, this metadata can include each message's offset or a unique message ID.
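Offset-based metadata can be as simple as a sorted list of segment base offsets plus, for each segment, the byte position of every message. The hypothetical `OffsetIndex` below uses binary search to find the segment that holds a given offset. It assumes offsets are assigned contiguously and in order, and that each segment file is named after the offset of its first message.

```python
import bisect


class OffsetIndex:
    """Maps a message offset to (segment file name, byte position). Hypothetical.

    Assumes offsets are assigned contiguously and in order, and that each
    segment file is named after the offset of its first message (base offset).
    """

    def __init__(self):
        self.base_offsets = []  # sorted base offsets, one per segment
        self.positions = {}     # base offset -> byte position of each message

    def add(self, offset, base_offset, position):
        if not self.base_offsets or self.base_offsets[-1] != base_offset:
            self.base_offsets.append(base_offset)
            self.positions[base_offset] = []
        entries = self.positions[base_offset]
        if offset != base_offset + len(entries):
            raise ValueError("offsets must be added contiguously")
        entries.append(position)

    def locate(self, offset):
        # Binary search for the last segment whose base offset is <= offset
        i = bisect.bisect_right(self.base_offsets, offset) - 1
        if i < 0:
            raise KeyError(offset)
        base = self.base_offsets[i]
        rel = offset - base
        if rel >= len(self.positions[base]):
            raise KeyError(offset)
        return f"{base:020d}.log", self.positions[base][rel]


if __name__ == "__main__":
    idx = OffsetIndex()
    # (offset, segment base offset, byte position): two 14-byte records per segment
    for entry in [(0, 0, 0), (1, 0, 14), (2, 2, 0), (3, 2, 14), (4, 4, 0)]:
        idx.add(*entry)
    print(idx.locate(3))  # segment with base offset 2, byte position 14
```

Lookup cost grows only logarithmically with the number of segments, which is why keeping base offsets sorted is worthwhile.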

Once storage is settled, the next question is: how does the middleware deliver messages to the right consumers?

A closely related question is which consumption pattern the middleware supports. Are messages distributed evenly across consumers, or delivered to consumers according to some other rule?
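Two common answers are sketched below as plain functions with hypothetical names: round-robin assignment spreads messages evenly across consumers, while key-based routing sends every message with the same key to the same consumer, which keeps the messages for one key in order. The sketch assigns a whole batch at once for clarity; a real broker would make the decision per message or per partition.

```python
import itertools
import zlib


def round_robin(messages, consumers):
    """Even distribution: consumers take turns receiving the next message."""
    assignment = {c: [] for c in consumers}
    for consumer, msg in zip(itertools.cycle(consumers), messages):
        assignment[consumer].append(msg)
    return assignment


def by_key(messages, consumers):
    """Rule-based routing: hash each message key to pick a consumer.

    All messages with the same key reach the same consumer in publish order.
    """
    assignment = {c: [] for c in consumers}
    for key, value in messages:
        # crc32 is stable across runs, unlike Python's per-process salted hash() for str
        idx = zlib.crc32(key.encode()) % len(consumers)
        assignment[consumers[idx]].append((key, value))
    return assignment


if __name__ == "__main__":
    print(round_robin([1, 2, 3, 4, 5], ["c1", "c2"]))  # {'c1': [1, 3, 5], 'c2': [2, 4]}
    events = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]
    print(by_key(events, ["c1", "c2", "c3"]))
```

Round-robin balances load best but gives no ordering guarantee across consumers; key-based routing preserves per-key order at the risk of uneven load when a few keys are very hot.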

Support for a distributed architecture

A messaging middleware of this kind may have to carry terabytes of data per day, with highly concurrent, high-throughput writes. That is typically more than a single machine can sustain, so we need to design the middleware as a distributed system.

In a distributed design, large volumes of data have to be split into shards that are stored on different nodes.

Another core issue is automatic scaling: the middleware should be able to add or remove nodes without manual intervention.

Related to this is how to add shards as the data grows, and how to migrate data automatically so that load stays balanced across nodes.
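One widely used technique for sharding with cheap rebalancing is consistent hashing, sketched below. It is only one option (some systems instead assign a fixed set of partitions to nodes and move whole partitions), and the `HashRing` class and its `vnodes` parameter are hypothetical. Each node is placed on the ring at many points (virtual nodes) to even out the load. When a node is added, only the keys that now fall on the new node move, so migration touches roughly 1/N of the keys (for N nodes after the addition) instead of reshuffling everything.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used only to spread values evenly, not for security
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (one possible sharding scheme)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place `vnodes` points for this node around the ring
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First point clockwise from the key's hash, wrapping past the end
        i = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]


if __name__ == "__main__":
    keys = [f"partition-{i}" for i in range(1000)]
    ring = HashRing(["broker-1", "broker-2", "broker-3"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("broker-4")  # scale out by one node
    after = {k: ring.node_for(k) for k in keys}
    moved = [k for k in keys if before[k] != after[k]]
    # Only keys now owned by the new node move; all others stay where they were
    assert all(after[k] == "broker-4" for k in moved)
    print(f"{len(moved)} of {len(keys)} keys migrated")
```

With naive `hash(key) % N` placement, changing N would move most keys; the ring limits migration to the keys the new node takes over.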

High availability of data

Typical Internet applications achieve high availability through local in-process (heap) caches, distributed caches, and copies of the data on different servers, so that the failure of any single storage node does not bring down the system as a whole. We can apply the same idea to message-oriented middleware by keeping copies of the message data on several servers.
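Applied to a messaging middleware, this means each partition's messages are copied to several servers, and a write counts as successful only once enough copies exist. The sketch below assumes a simple quorum rule (`min_acks` out of N replicas); the `Replica` and `ReplicatedPartition` classes are hypothetical, and a real system would also need leader election and a way for a recovered replica to catch up on writes it missed.

```python
class Replica:
    """One storage server holding a copy of a partition's log (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.alive = True

    def append(self, message):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.log.append(message)


class ReplicatedPartition:
    """Writes every message to all replicas; succeeds if min_acks of them store it."""

    def __init__(self, replicas, min_acks):
        self.replicas = replicas
        self.min_acks = min_acks

    def write(self, message) -> bool:
        acks = 0
        for replica in self.replicas:
            try:
                replica.append(message)
                acks += 1
            except ConnectionError:
                pass  # a down replica simply contributes no ack
        # Only report success to the producer if enough copies exist.
        # (A failed write may still have reached some replicas; real systems
        # must reconcile such partial writes, which this sketch does not.)
        return acks >= self.min_acks

    def read(self):
        # Serve reads from the first live replica
        for replica in self.replicas:
            if replica.alive:
                return list(replica.log)
        raise ConnectionError("no live replica")


if __name__ == "__main__":
    servers = [Replica("a"), Replica("b"), Replica("c")]
    partition = ReplicatedPartition(servers, min_acks=2)
    print(partition.write("m1"))  # True: 3 of 3 copies
    servers[0].alive = False      # one server crashes
    print(partition.write("m2"))  # True: 2 of 3 copies still meet min_acks
    print(partition.read())       # ['m1', 'm2'], served by replica b
    servers[1].alive = False
    print(partition.write("m3"))  # False: only 1 copy
```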

Message data is not lost

To guarantee that messages are not lost, the middleware needs a manual ACK mechanism: only after the consumer has actually finished processing a message does it send a "processing completed" acknowledgment back to the middleware, and only then does the middleware delete the processed message.
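A minimal sketch of consumer-side manual ACK, assuming an ack timeout: a delivered message moves to an in-flight set instead of being deleted, `ack()` deletes it, and if no ack arrives before the deadline the message becomes deliverable again so another consumer can take over. `AckQueue`, `ack_timeout`, and the injectable `clock` (used so the example can simulate time passing) are hypothetical.

```python
import itertools
import time


class AckQueue:
    """Deletes a message only after an explicit ack (hypothetical sketch).

    A delivered but unacknowledged message becomes deliverable again after
    ack_timeout seconds, so another consumer can pick it up.
    """

    def __init__(self, ack_timeout=30.0, clock=time.monotonic):
        self.ack_timeout = ack_timeout
        self.clock = clock
        self._ids = itertools.count()
        self.ready = {}      # id -> message waiting to be delivered
        self.in_flight = {}  # id -> (message, ack deadline)

    def publish(self, message):
        msg_id = next(self._ids)
        self.ready[msg_id] = message
        return msg_id

    def _requeue_expired(self):
        # Messages whose consumer never acked go back to the ready set
        now = self.clock()
        for msg_id, (msg, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[msg_id]
                self.ready[msg_id] = msg

    def receive(self):
        self._requeue_expired()
        if not self.ready:
            return None
        msg_id = min(self.ready)  # oldest message first
        msg = self.ready.pop(msg_id)
        self.in_flight[msg_id] = (msg, self.clock() + self.ack_timeout)
        return msg_id, msg

    def ack(self, msg_id):
        # Processing finished: only now is it safe to delete the message
        self.in_flight.pop(msg_id, None)


if __name__ == "__main__":
    now = [0.0]
    q = AckQueue(ack_timeout=10, clock=lambda: now[0])
    q.publish("pay order 42")
    first = q.receive()   # consumer A takes it, then crashes without acking
    print(q.receive())    # None: the message is in flight
    now[0] = 11           # the ack deadline passes
    second = q.receive()  # redelivered, e.g. to consumer B
    q.ack(second[0])      # B finishes and acks; the message is deleted
```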

More precisely, we need two ACK mechanisms:

  • Producer side: the middleware acknowledges each message after storing it. If the producer receives no ACK, it must resend the message, so that the message is reliably written.
  • Consumer side: once a consumer has processed a message successfully, it must return an ACK to the middleware before the message can be deleted. If the consumer goes down without acknowledging, the middleware must redeliver the message to another consumer instance so that it is still processed successfully.
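The producer-side ACK can be sketched as a retry loop. One consequence worth raising in an interview: if the broker stored the message but the ACK was lost, the retry creates a duplicate, so retries alone give at-least-once delivery. The sketch below assumes one common remedy: the producer tags each message with its producer ID and a sequence number, and the broker ignores sequence numbers it has already stored. `DedupBroker`, `Producer`, and the simulated `ack_loss_rate` are hypothetical.

```python
import random


class DedupBroker:
    """Stores messages and acks the producer; ignores duplicate retries."""

    def __init__(self, ack_loss_rate=0.0, rng=None):
        self.messages = []
        self.seen = set()  # (producer_id, seq) pairs already stored
        self.ack_loss_rate = ack_loss_rate
        self.rng = rng or random.Random()

    def send(self, producer_id, seq, payload):
        key = (producer_id, seq)
        if key not in self.seen:  # a retry of a stored message is not stored twice
            self.seen.add(key)
            self.messages.append(payload)
        # Simulate the ACK being lost on the network after the message was stored
        if self.rng.random() < self.ack_loss_rate:
            raise TimeoutError("ack lost")
        return "ACK"


class Producer:
    def __init__(self, producer_id, broker, max_retries=5):
        self.producer_id = producer_id
        self.broker = broker
        self.max_retries = max_retries
        self.seq = 0

    def send(self, payload) -> bool:
        seq = self.seq
        self.seq += 1
        for _ in range(1 + self.max_retries):
            try:
                if self.broker.send(self.producer_id, seq, payload) == "ACK":
                    return True
            except TimeoutError:
                continue  # no ACK: resend with the same (producer_id, seq)
        return False  # give up and surface the failure to the caller


if __name__ == "__main__":
    broker = DedupBroker(ack_loss_rate=0.5, rng=random.Random(7))
    producer = Producer("p1", broker)
    for i in range(5):
        producer.send(f"m{i}")
    print(broker.messages)  # ['m0', 'm1', 'm2', 'm3', 'm4'], no duplicates
```

Without the `seen` check, every lost ACK would leave an extra copy of the message in the log.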

Today we did not focus on specific business scenarios; instead we looked at the overall picture: the knowledge and expertise you need if you set out to implement a message middleware. If you have any questions, please leave a comment below, or add me on WeChat: SUN_shine_LYz. I will add you to our group so we can exchange ideas and improve together.