This post was first published by Yanglbme in GitHub’s Doocs open source community (over 30k stars). Project address: github.com/doocs/advan…
The interview questions
How can message queues be highly available?
Interviewer’s psychology
If someone asks you about MQ, high availability is a must-ask topic. As mentioned in the previous section, introducing MQ can reduce system availability, so as long as you use MQ, some of the follow-up questions are bound to revolve around how to address MQ’s shortcomings.
If you naively use MQ without ever thinking about any of these questions, you will be in a bad situation, and the interviewer’s impression will be that you simply use technology without thinking about it. Hiring such a candidate as an ordinary junior engineer at a salary under 20k is fine, but at a high salary of 20k+ it would be miserable: ask them to design a system and there will certainly be a pile of pitfalls, and when accidents happen the company suffers the losses and the whole team takes the blame together.
Analysis of interview questions
Asking the question this way is actually quite good, because the interviewer can’t very well ask “How is Kafka’s high availability guaranteed?” or “How is ActiveMQ’s high availability guaranteed?” That would be a bad question from the interviewer: if the candidate happens to use RabbitMQ and has never used Kafka, it would only be embarrassing for both sides.
So a skilled interviewer asks: how can the high availability of MQ be guaranteed? Then, for whichever MQ you have actually used, you can talk about your understanding of that MQ’s high availability.
1. High availability of RabbitMQ
RabbitMQ is quite typical in using a master-slave (non-distributed) approach to high availability, so we will use RabbitMQ as an example to explain how this first kind of MQ high availability can be implemented.
RabbitMQ has three modes: single-machine mode, common cluster mode, and mirrored cluster mode.
Stand-alone mode
Stand-alone mode is demo level. It is generally just something you start locally to play around with; nobody uses stand-alone mode in production.
Common Cluster mode (no high availability)
Normal cluster mode means starting multiple RabbitMQ instances on multiple machines, one per machine. A queue you create is placed on only one RabbitMQ instance, but every instance synchronizes the queue’s metadata (metadata can be thought of as the queue’s configuration information; through the metadata you can find the instance where the queue lives). When you consume, if you are actually connected to another instance, that instance will pull the data from the instance where the queue lives.
This approach is really troublesome and not very good. It is not distributed; it is just a plain cluster. It means consumers either connect randomly to an instance each time and pull data across the cluster, or always connect to the instance where the queue lives. The former has the overhead of internal data pulling; the latter leads to a single-instance performance bottleneck.
And if the instance hosting the queue goes down, the other instances can no longer pull data from it. If you have enabled message persistence so that RabbitMQ stores messages to disk, the messages are not necessarily lost, but you have to wait for that instance to recover before they can be consumed again.
So this situation is rather awkward. There is no so-called high availability here; this approach mainly improves throughput, that is, it lets multiple nodes in the cluster serve the reads and writes of queues.
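To make the routing concrete, here is a minimal sketch in Python (the classes and names are invented for illustration; this is not the real RabbitMQ API) of how a normal cluster serves a consume request: every node knows the queue’s metadata, but only the owner node holds the messages, so consuming through any other node costs an extra internal pull.

```python
# Toy model of a RabbitMQ "normal cluster": metadata on every node,
# message data only on the node that owns the queue.

class Node:
    def __init__(self, name):
        self.name = name
        self.metadata = {}   # queue name -> owner node (synced to all nodes)
        self.queues = {}     # queue name -> messages (only on the owner)

class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def declare_queue(self, queue, owner):
        owner.queues[queue] = []
        for node in self.nodes:          # metadata is replicated everywhere
            node.metadata[queue] = owner

    def publish(self, queue, message):
        owner = self.nodes[0].metadata[queue]
        owner.queues[queue].append(message)   # data lives on one node only

    def consume(self, queue, connected_node):
        owner = connected_node.metadata[queue]
        if owner is connected_node:           # lucky: connected to the owner
            return owner.queues[queue].pop(0), False
        # otherwise the connected node pulls from the owner internally,
        # which is the extra network hop described above
        return owner.queues[queue].pop(0), True

a, b = Node("rabbit@a"), Node("rabbit@b")
cluster = Cluster([a, b])
cluster.declare_queue("orders", owner=a)
cluster.publish("orders", "msg-1")
msg, forwarded = cluster.consume("orders", connected_node=b)
print(msg, forwarded)  # the message arrives, but via an internal pull from node a
```

Note also what this sketch shows about the failure case: if node `a` dies, node `b` still knows the queue exists (metadata), but has none of its messages.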
Mirroring Cluster mode (High Availability)
This is what is called RabbitMQ’s high-availability mode. Unlike normal cluster mode, in mirrored cluster mode the queue you create, both its metadata and its messages, exists on multiple instances. In other words, every RabbitMQ node has a complete mirror of the queue, containing all of the queue’s data. Then every time you write a message to the queue, the message is automatically synchronized to the queue’s mirrors on the other instances.
So how do you enable this mirrored cluster mode? RabbitMQ has a nice management console: you create a policy in the console. This policy is the mirrored-cluster-mode policy, and it can specify that data be synchronized to all nodes or to a given number of nodes. When you then create a queue, apply this policy, and data will automatically be synchronized to the other nodes.
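The same policy can also be set from the command line with `rabbitmqctl`. As a sketch (the policy names and queue-name patterns here are examples, not anything mandated by RabbitMQ):

```shell
# Mirror every queue whose name starts with "ha." to ALL nodes in the cluster;
# "ha-all" is just a policy name we chose for this example.
rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all","ha-sync-mode":"automatic"}'

# Or mirror to an exact number of nodes (here: 2) instead of all of them.
rabbitmqctl set_policy ha-two "^two\." '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}'
```

Note that these classic mirrored queues are deprecated in newer RabbitMQ releases in favor of quorum queues, but the idea of replicating a queue’s full data to multiple nodes is the same.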
The advantage here is that if any of your machines goes down, it doesn’t matter: the other machines (nodes) still contain this queue’s complete data, and consumers can consume from the other nodes. The downsides are, first, that the performance overhead is too high, since messages must be synchronized to all machines, which puts heavy pressure on network bandwidth; and second, that it is not distributed, so there is no scalability at all. If a queue is heavily loaded and you add machines, the new machines also contain all of that queue’s data; there is no way to linearly scale your queue. What would you do if the queue’s data grew so large that a single machine could no longer hold it?
2. High availability of Kafka
A basic architectural understanding of Kafka is that it consists of multiple brokers, each of which is a node. You create a topic that can be divided into multiple partitions, each of which can reside on a different broker, and each of which holds a portion of the data.
This is a natural distributed message queue, meaning that the data for a topic is distributed across multiple machines, with each machine hosting a portion of the data.
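As an illustration of that layout (broker names and the partitioning function here are invented for the sketch, not the real Kafka client API), messages for one topic spread across brokers by partition:

```python
import zlib

# Sketch: one topic split into three partitions, each hosted on a different
# broker (illustrative only; real Kafka assigns partitions itself).
topic = "orders"
partition_to_broker = {0: "broker-0", 1: "broker-1", 2: "broker-2"}

def partition_for(key: str, n_partitions: int = 3) -> int:
    # a deterministic hash of the message key picks the partition,
    # similar in spirit to Kafka's default key-based partitioner
    return zlib.crc32(key.encode("utf-8")) % n_partitions

# messages with the same key always land in the same partition,
# and therefore on the same broker
p = partition_for("user-42")
print(p, partition_to_broker[p])
```

The point is simply that no single machine holds the whole topic; each broker holds only the partitions assigned to it.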
In fact, something like RabbitMQ is not really a distributed message queue; it is just a traditional message queue that provides some clustering and high-availability (HA) mechanisms. Because no matter how you play it, a RabbitMQ queue’s data is stored on a single node, and in a mirrored cluster every node stores that queue’s complete data.
Prior to Kafka 0.8, there was no HA mechanism. When any broker went down, the partitions on that broker became unusable, neither writable nor readable, so there was no high availability to speak of.
For example, suppose we create a topic and specify three partitions, placed on three machines. If the second machine goes down, 1/3 of this topic’s data is lost, so it cannot be highly available.
After Kafka 0.8, an HA mechanism was provided: the replica mechanism. Each partition’s data is synchronized to other machines, forming multiple replicas. All replicas elect a leader, so production and consumption deal only with the leader, and the other replicas are followers. On writes, the leader synchronizes the data to all followers; on reads, data is read directly from the leader. Why can you only read and write the leader? Simple: if you could read and write any follower at will, you would have to worry about data consistency, and the system complexity would be too high, making problems very likely. Kafka evenly distributes all replicas of a partition across different machines to improve fault tolerance.
In this way, there is so-called high availability: if a broker goes down, it doesn’t matter, because the partitions on that broker have replicas on other machines. If the broken broker hosted some partition’s leader, a new leader will be elected from the followers, and everyone continues reading and writing through the new leader. This is what is called high availability.
When writing data, the producer writes to the leader, the leader writes the data to its local disk, and the followers actively pull the data from the leader themselves. Once all the followers have synchronized the data, they send an ack to the leader, and after the leader has received acks from all the followers it returns a write-success message to the producer. (Of course, this is just one mode; you can tune this behavior.)
When consuming, messages are also read only from the leader, but a message is only visible to consumers after it has been successfully synchronized and acked by all the followers.
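The write path and the failover described above can be sketched as a toy simulation (all class and method names here are invented for illustration; this is not the real Kafka API): the leader appends a message, live followers pull and ack it, and when the leader’s broker dies a follower is elected as the new leader with no data lost.

```python
# Toy simulation of Kafka's replica mechanism for a single partition:
# write to the leader, followers pull and ack, elect a new leader on failure.

class Replica:
    def __init__(self, broker):
        self.broker = broker
        self.log = []        # this replica's copy of the partition data
        self.alive = True

class Partition:
    def __init__(self, brokers):
        self.replicas = [Replica(b) for b in brokers]
        self.leader = self.replicas[0]

    def produce(self, message):
        # like a fully-acknowledged write: the leader writes locally, then
        # every live follower pulls the message and acks it
        self.leader.log.append(message)
        acks = 0
        for follower in self.replicas:
            if follower is not self.leader and follower.alive:
                follower.log.append(message)  # follower pulls from leader
                acks += 1
        return acks  # leader reports success to the producer after these acks

    def fail(self, broker):
        for r in self.replicas:
            if r.broker == broker:
                r.alive = False
        if not self.leader.alive:  # elect a new leader from live followers
            self.leader = next(r for r in self.replicas if r.alive)

    def consume(self):
        return list(self.leader.log)  # reads always go through the leader

p = Partition(["broker-0", "broker-1", "broker-2"])
p.produce("order-created")
p.fail("broker-0")          # the leader's broker goes down
print(p.leader.broker)      # a former follower has taken over as leader
print(p.consume())          # the message is still readable: nothing was lost
```

This is the essence of the mechanism: because every follower already held a fully acked copy, electing a new leader loses nothing, which is exactly why the broker failure in the sketch is harmless.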
By now you should have a sense of how Kafka ensures high availability, right? You can even draw diagrams for the interviewer as you explain. If the interviewer really is a Kafka expert and digs deeper, you can only say sorry, you haven’t researched it that deeply.
Welcome to follow my WeChat public account “Doocs Open Source Community”, where original technical articles are pushed as soon as they are published.