Background

The author's business line was initially split into three services. Because the business was relatively simple in the early stage, each of the three services could deliver its functions independently.

As the product iterated, more and more business functions ran into problems such as high concurrency, service decoupling, and distributed transactions. After internal discussion, the team decided to introduce RocketMQ messaging middleware to handle them.

Because each business line in the company is deployed independently, and our business line urgently needed RocketMQ, we planned to build a highly available RocketMQ cluster ourselves. The self-built cluster needs the following features:

  • High availability
  • High concurrency
  • Scalability
  • Massive message volume

Name service (NameServer)

The first step is to make the NameServer highly available. The plan is to deploy NameServer on three machines. Even if two of the machines go down, the remaining one keeps the cluster usable, so RocketMQ stays stable.

NameServers are designed to be mutually independent: any NameServer can run on its own, without communicating with the others. Each NameServer holds the complete cluster routing information, including information about all Broker nodes, topic routing data, and so on. As long as any one NameServer survives, RocketMQ messaging keeps running without failure.
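The independence of the NameServers can be illustrated with a small simulation. This is a minimal sketch, not RocketMQ client code: the `NameServer` class, the `lookup_route` helper, the topic name, and the addresses are all hypothetical names made up for illustration. Each simulated instance holds a full copy of the routing table, so a client needs only one reachable instance.

```python
from dataclasses import dataclass, field


@dataclass
class NameServer:
    """Hypothetical stand-in for one NameServer holding a full route table."""
    address: str
    routes: dict = field(default_factory=dict)
    alive: bool = True

    def query(self, topic):
        if not self.alive:
            raise ConnectionError(f"{self.address} is down")
        return self.routes[topic]


def lookup_route(name_servers, topic):
    """Try each NameServer in turn; any single survivor can answer."""
    for ns in name_servers:
        try:
            return ns.query(topic)
        except ConnectionError:
            continue  # this instance is down, try the next one
    raise RuntimeError("no NameServer reachable")


# Every instance stores the same complete routing information.
full_routes = {"OrderTopic": ["broker-a", "broker-b"]}
cluster = [NameServer(f"10.0.0.{i}:9876", dict(full_routes)) for i in (1, 2, 3)]

# Take two of the three NameServers offline.
cluster[0].alive = False
cluster[1].alive = False

print(lookup_route(cluster, "OrderTopic"))
```

Only when all three instances are marked as down does `lookup_route` fail, which matches the claim that one surviving NameServer is enough.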

Broker cluster deployment architecture

Before deploying RocketMQ, we researched the clustering solutions it currently supports. There are four:

  • Multi-Master mode: The cluster has no Slaves, only Masters, for example two or three Masters.
  • Multi-Master multi-Slave mode – asynchronous replication: Each Master is paired with a Slave, and multiple Master-Slave pairs are deployed. HA uses asynchronous replication, so the Slave lags the Master by a short time (milliseconds).
  • Multi-Master multi-Slave mode – synchronous dual-write: Each Master is paired with a Slave, and multiple Master-Slave pairs are deployed. HA uses synchronous dual-write, that is, success is returned to the application only after both the Master and the Slave have been written successfully.
  • Dledger deployment: Each Master is configured with two Slaves to form a Dledger Group. There can be multiple Dledger Groups. Dledger implements Master election.

Multi-Master mode

In this mode, every node in the RocketMQ cluster is a Master, and no Master has a Slave.

The advantages and disadvantages of this model are as follows:

  • Advantages: Configuration is simple, and a single Master going down or being restarted for maintenance has no impact on applications. With disks configured as RAID10, storage is reliable enough that messages are not lost even if the machine breaks down and cannot be recovered (a small number of messages are lost with asynchronous disk flushing, none with synchronous disk flushing).
  • Disadvantages: While a machine is down, the unconsumed messages on that machine cannot be subscribed to until the machine is restored, so message timeliness is affected.
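The disadvantage above can be made concrete with a toy model. This is a sketch under stated assumptions: `MasterBroker` and `consume` are hypothetical names, and the in-memory list stands in for the messages persisted on a Master's disk. It does not use any RocketMQ API.

```python
class MasterBroker:
    """Hypothetical model of a Master that has no Slave."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.pending = []  # messages stored but not yet consumed

    def send(self, msg):
        if not self.alive:
            raise ConnectionError(self.name)
        self.pending.append(msg)

    def pull_all(self):
        if not self.alive:
            raise ConnectionError(self.name)
        msgs, self.pending = self.pending, []
        return msgs


def consume(brokers):
    """Pull from every reachable Master; skip the ones that are down."""
    received = []
    for broker in brokers:
        try:
            received.extend(broker.pull_all())
        except ConnectionError:
            pass
    return received


masters = [MasterBroker("broker-a"), MasterBroker("broker-b")]
masters[0].send("m1")
masters[1].send("m2")

masters[0].alive = False          # broker-a goes down
during_outage = consume(masters)  # only broker-b's messages arrive

masters[0].alive = True           # broker-a is restored with its data intact
after_recovery = consume(masters)

print(during_outage, after_recovery)
```

The message held by `broker-a` is not lost, but it is delivered only after the machine comes back, which is the timeliness cost described above.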

Multi-Master multi-Slave – asynchronous replication mode

Each Master is paired with a Slave, and there are multiple Master-Slave pairs. HA uses asynchronous replication, so the Slave lags the Master by a short time (milliseconds).

The advantages and disadvantages of this model are as follows:

  • Advantages: Even if a disk is damaged, very few messages are lost, and message timeliness is not affected. After a Master goes down, consumers can still consume from the Slave. This is transparent to the application and needs no manual intervention, and performance is almost the same as in multi-Master mode.
  • Disadvantages: A small number of messages are lost if the Master machine breaks down and its disk is damaged.

Multi-Master multi-Slave – synchronous dual-write mode

Each Master is paired with a Slave, and there are multiple Master-Slave pairs. HA uses synchronous dual-write: success is returned to the application only after both the Master and the Slave have been written.

The advantages and disadvantages of this model are as follows:

  • Advantages: Neither data nor service has a single point of failure. When the Master goes down there is no message delay, and both service availability and data availability are very high.
  • Disadvantages: Performance is slightly lower than in asynchronous replication mode (about 10% lower), and the RT for sending a single message is slightly higher. In addition, in the current version the Slave cannot be automatically promoted to Master when the Master goes down.
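The difference between the two replication modes comes down to when the send is acknowledged. The following is a simplified sketch, not Broker code: `ReplicatedPair`, `replicate`, and `lose_master` are hypothetical names, and the lists stand in for the commit logs of a Master and its Slave.

```python
class ReplicatedPair:
    """Hypothetical Master/Slave pair; `sync` selects dual-write behavior."""

    def __init__(self, sync):
        self.sync = sync
        self.master_log = []
        self.slave_log = []
        self.backlog = []  # written on the Master, not yet copied to the Slave

    def send(self, msg):
        self.master_log.append(msg)
        if self.sync:
            self.slave_log.append(msg)  # wait for the Slave before acking
        else:
            self.backlog.append(msg)    # ack now, replicate later
        return "SEND_OK"

    def replicate(self):
        """Background copy used by asynchronous replication."""
        self.slave_log.extend(self.backlog)
        self.backlog.clear()

    def lose_master(self):
        """Master and its disk are lost; only the Slave copy survives."""
        return list(self.slave_log)


def run(sync):
    pair = ReplicatedPair(sync)
    pair.send("m1")
    pair.replicate()
    pair.send("m2")  # the Master fails before the next replicate()
    return pair.lose_master()


print("async survivors:", run(sync=False))
print("sync survivors:", run(sync=True))
```

In the asynchronous run the second message was acknowledged but never reached the Slave, which is the "small number of messages lost" case. The synchronous run keeps both messages, at the cost of waiting for the Slave on every send.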

Dledger mode

Before version 4.5, RocketMQ was deployed in a Master-Slave architecture, which guards against data loss and provides availability to a certain extent.

However, this approach has an obvious drawback. When the Master Broker dies, the Slave Broker cannot automatically take over as the new Master Broker. Someone has to change the configuration by hand to turn the Slave Broker into a Master Broker and then restart it, which is cumbersome.

During this manual operation and maintenance, the system may be unavailable.

Dledger requires at least three Brokers, one Master and two Slaves, which together form a Dledger Group and run as a group. Once the Master goes down, Dledger elects one of the two remaining Brokers as the new Master so that service continues.
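To show what a three-node Dledger Group looks like in configuration, here is a small helper that renders a broker config fragment per node. It is a sketch only: the helper name `dledger_broker_conf`, the cluster name, the group name, and all addresses are assumptions for illustration. The property names (`enableDLegerCommitLog`, `dLegerGroup`, `dLegerPeers`, `dLegerSelfId`) follow the RocketMQ Dledger deployment guide, and should be checked against the RocketMQ version actually in use.

```python
def dledger_broker_conf(group, self_id, peers, namesrv_addr):
    """Render a broker config fragment for one node of a Dledger Group.

    `peers` maps a node id (e.g. "n0") to its "host:port" Dledger address.
    """
    peer_list = ";".join(f"{nid}-{addr}" for nid, addr in peers.items())
    lines = [
        "brokerClusterName=RaftCluster",  # placeholder cluster name
        f"brokerName={group}",
        f"namesrvAddr={namesrv_addr}",
        "enableDLegerCommitLog=true",
        f"dLegerGroup={group}",
        f"dLegerPeers={peer_list}",
        f"dLegerSelfId={self_id}",  # must be unique within the group
    ]
    return "\n".join(lines)


# Hypothetical addresses for the three Brokers of one group.
peers = {
    "n0": "10.0.1.1:40911",
    "n1": "10.0.1.2:40911",
    "n2": "10.0.1.3:40911",
}

for node_id in peers:
    print(f"# --- {node_id} ---")
    print(dledger_broker_conf("RaftNode00", node_id, peers, "10.0.0.1:9876"))
```

All three nodes share the same group name and peer list and differ only in `dLegerSelfId`. None of them is statically marked as Master, because that role is decided by election.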

Overall architecture: high availability, high concurrency, scalability, massive messages

After comparing the four clustering schemes above, we settled on Dledger mode. The final logical deployment diagram is as follows:

The dotted box in the figure above represents a Dledger Group.

High availability

With three NameServers, the cluster stays available even in the extreme case: the failure of any two NameServers does not affect messaging overall.

In the figure above, each Master Broker has two Slave Brokers to ensure availability. For example, if the Master Broker in a Dledger Group goes down, Dledger holds an election and promotes one of the remaining nodes to Master Broker.
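Dledger is built on the Raft protocol, in which a new leader needs votes from a majority of the group. The sketch below captures only that majority rule. It is not the Raft election algorithm: `majority` and `elect_master` are hypothetical helpers, and picking the lowest surviving node id is a placeholder for Raft's term and log-freshness checks.

```python
def majority(group_size):
    """Smallest number of nodes that forms a majority of the group."""
    return group_size // 2 + 1


def elect_master(nodes, alive):
    """Return a new Master if a majority survives, otherwise None."""
    survivors = [n for n in nodes if n in alive]
    if len(survivors) < majority(len(nodes)):
        return None  # no quorum, so no Master can be elected
    # Placeholder tie-break; real Raft compares terms and log positions.
    return min(survivors)


group = ["n0", "n1", "n2"]

print(majority(len(group)))                 # 2
print(elect_master(group, {"n1", "n2"}))    # n1 (the old Master n0 is down)
print(elect_master(group, {"n2"}))          # None (two of three are down)
```

With three Brokers per group, the group keeps electing a Master after one failure but not after two, which is why three is the minimum group size.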

High concurrency

If a Topic receives 100,000 messages per second, you can add Master Brokers so that the 100,000 messages are spread across them. With 5 Master Brokers, each Broker carries about 20,000 messages per second.
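The arithmetic can be checked with a short round-robin simulation. This is a sketch assuming messages are rotated evenly across Masters, which in turn assumes the Topic's queues are spread evenly over them. The `distribute` helper and the Broker names are made up for illustration.

```python
from collections import Counter
from itertools import cycle


def distribute(message_count, brokers):
    """Assign messages to brokers in round-robin order and count them."""
    counts = Counter()
    targets = cycle(brokers)
    for _ in range(message_count):
        counts[next(targets)] += 1
    return counts


brokers = [f"master-{i}" for i in range(5)]
load = distribute(100_000, brokers)

for name in brokers:
    print(name, load[name])
```

Each of the five Masters ends up with 20,000 of the 100,000 messages. In practice the split is only approximately even, since it depends on queue placement and producer behavior.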

Scalability

If message volume grows and the cluster needs to store more data and sustain higher concurrency, you can add Brokers to scale the cluster roughly linearly.
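Under the assumption of linear scaling, sizing the cluster is a simple division. The sketch below uses a hypothetical `masters_needed` helper. The per-Master figure of 20,000 messages per second is taken from the high-concurrency example in this article and is not a benchmark result.

```python
import math


def masters_needed(target_tps, per_master_tps):
    """Number of Master Brokers needed, assuming load spreads evenly."""
    if per_master_tps <= 0:
        raise ValueError("per_master_tps must be positive")
    return math.ceil(target_tps / per_master_tps)


print(masters_needed(100_000, 20_000))  # 5
print(masters_needed(250_000, 20_000))  # 13
```

Real clusters rarely scale perfectly linearly, so a result like this is better treated as a lower bound and confirmed with load testing.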

Massive message volume

Data is distributed: the data of each Topic is spread across different Brokers. If more data needs to be stored, you only need to add Master Brokers.
