Message-oriented middleware selection analysis: Kafka vs. RabbitMQ

Author | Zhu Zhonghua

Editor | Emily

1. Foreword

Message queue middleware (message middleware for short) uses an efficient and reliable messaging mechanism to carry out platform-independent data communication, and integrates distributed systems on top of that communication. By providing message passing and message queuing models, it offers application decoupling, elastic scaling, redundant storage, traffic peak clipping, asynchronous communication, data synchronization, and other functions in a distributed environment. It is a pivotal component of distributed system architecture.

At present there is a wide variety of open source message middleware, much of it familiar to us, such as ActiveMQ, RabbitMQ, Kafka, RocketMQ, and ZeroMQ. Whichever one you choose, it will fit awkwardly in some places; after all, it was not tailored for you. Some large vendors have accumulated experience through long-term use, and their message queue usage scenarios have become relatively stable. Others find that the message middleware on the market cannot meet their needs, and they have enough energy and manpower to develop customized message middleware for themselves. Most companies, however, do not choose to reinvent the wheel, so it is important to select a message middleware that suits you. Even the companies that build their own go through such a selection process before they develop stable and reliable products.

Introducing message-oriented middleware into the overall architecture requires weighing many factors, such as cost and benefit. How do you get the best value for the cost? Although there are many kinds of message-oriented middleware, each has its own focus. The best approach is to choose the one that fits, play to its strengths, and avoid its weaknesses. If you do not know where to start, this article offers a guide.

Follow the WeChat official account “AI Front” (ID: AI-front)

2. Brief description of various message queues

ActiveMQ is a message-oriented middleware from Apache, written in Java and fully based on the JMS 1.1 specification. It provides efficient, extensible, stable, and secure enterprise-level messaging for applications. For historical reasons, however, its market share is not as large as that of the three products described next. Its latest architecture is named Apollo, billed as the next generation of ActiveMQ; interested readers can look into it.

RabbitMQ is a message middleware implemented in Erlang and based on the AMQP protocol. It originated in financial systems and is used to store and forward messages in distributed systems. RabbitMQ has become increasingly popular thanks to its strong showing in reliability, availability, extensibility, and feature richness.

Kafka was originally developed by LinkedIn in Scala as a distributed, multi-partition, multi-replica messaging system coordinated by ZooKeeper, and has since been donated to the Apache Foundation. It is a distributed publish-subscribe messaging system widely used for its horizontal scalability and high throughput. More and more open source distributed processing systems, such as Cloudera, Apache Storm, Spark, and Flink, support Kafka integration.

RocketMQ is Alibaba’s open source messaging middleware, which has been donated to the Apache Foundation. It is written in Java, features high throughput and high availability, and is suitable for large-scale distributed systems. It has been battle-tested by Double 11, and its strength cannot be ignored.

ZeroMQ claims to be the fastest message queue in history and is based on the C language. ZeroMQ is a messaging library that scales elastically across threads, cores, and hosts. Although we usually count it as a member of the message queue family, it differs in essence from the products above: ZeroMQ itself is not a message queue server. It is more like a low-level network communication library that adds a layer of encapsulation on top of the existing socket API.

There are many other message-oriented middleware products on the market, such as Tencent’s PhxQueue, CMQ, and CKafka, as well as NSQ, which is written in Go. Sometimes people also treat products like Redis as a kind of message-oriented middleware. They are all excellent, but this article cannot be exhaustive. It selects RabbitMQ and Kafka as two typical examples for analysis, and tries to explain the key points of message-oriented middleware selection from a fair and impartial standpoint.

3. Overview of key points of selection

Whether a message-oriented middleware meets your requirements has to be examined along several dimensions. The first is the functional dimension, which directly determines how much you can use out of the box, and therefore how much you can shorten the project cycle and reduce cost. If a message-oriented middleware lacks a function you need, secondary development is required, which increases the technical difficulty, complexity, and duration of the project.

Functional dimension

The functional dimension can be divided into multiple sub-dimensions, roughly as follows:

Priority queue

A priority queue differs from a first-in, first-out queue. Messages with a higher priority are consumed first, which lets the downstream treat messages at different levels differently. There is a prerequisite, however: if consumers consume faster than producers produce and no messages accumulate in the message middleware server (commonly called the Broker), prioritizing messages makes little sense, because each message is consumed as soon as it is sent. This is equivalent to the Broker holding at most one message at a time, and priority is meaningless for a single message.
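The behaviour described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how any particular Broker implements priorities; the class name `PriorityBacklog` and the message bodies are made up for the example.

```python
import heapq
import itertools


class PriorityBacklog:
    """Toy Broker-side backlog that releases higher-priority messages first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter: keeps FIFO order among messages of equal priority.
        self._seq = itertools.count()

    def publish(self, body, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), body))

    def consume(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


# Case 1: messages pile up before the consumer runs, so priority matters.
backlog = PriorityBacklog()
backlog.publish("low-1", priority=1)
backlog.publish("high", priority=9)
backlog.publish("low-2", priority=1)
drained = [backlog.consume() for _ in range(3)]
print(drained)  # ['high', 'low-1', 'low-2']

# Case 2: the consumer keeps up, so each message is consumed as soon as it
# arrives and the priority has no effect on the order.
no_backlog = []
for body, priority in [("low-1", 1), ("high", 9)]:
    backlog.publish(body, priority)
    no_backlog.append(backlog.consume())
print(no_backlog)  # ['low-1', 'high']
```

The second case is the prerequisite from the text: with at most one message in the backlog, the priority field never gets a chance to reorder anything.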

Delay queue

When you shop online, have you ever seen the prompt “order automatically cancelled if payment is not made within 30 minutes”? This is a typical application scenario for a delay queue. A delay queue stores delayed messages. A “delayed message” is one that consumers should not receive immediately after it is sent; it becomes available for consumption only after a specific time has passed. Delay queues generally come in two types: message-based delay and queue-based delay. Message-based delay means that each message carries its own delay time, so every time a new message enters the queue, the queue has to be re-sorted by delay time, which has a significant impact on performance. In practice, queue-based delay is used more often. Queues are set up for different delay levels, such as 5s, 10s, 30s, 1min, 5min, and 10min. All messages in a given queue have the same delay time, which avoids the performance cost of sorting by delay. Expired messages can then be delivered by some scanning policy, such as a timer.
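A minimal sketch of the queue-based approach, assuming three hypothetical delay levels and a caller-supplied clock (`now`) instead of real timers. Because every message in one level has the same delay, arrival order equals expiry order and only the head of each queue needs to be checked.

```python
from collections import deque

DELAY_LEVELS = (5, 10, 30)  # seconds; illustrative levels, not a product default


class LeveledDelayQueue:
    """One FIFO queue per delay level; no per-message sorting is needed."""

    def __init__(self, levels=DELAY_LEVELS):
        self._queues = {level: deque() for level in levels}

    def publish(self, body, delay, now):
        if delay not in self._queues:
            raise ValueError(f"unsupported delay level: {delay}")
        # Store the time at which the message becomes visible to consumers.
        self._queues[delay].append((now + delay, body))

    def poll(self, now):
        """Scan the head of each level and release messages that are due."""
        due = []
        for queue in self._queues.values():
            while queue and queue[0][0] <= now:
                due.append(queue.popleft()[1])
        return due


dq = LeveledDelayQueue()
dq.publish("cancel-order-1", delay=10, now=0)
dq.publish("remind-user-2", delay=5, now=1)
print(dq.poll(now=5))   # []
print(dq.poll(now=6))   # ['remind-user-2']
print(dq.poll(now=10))  # ['cancel-order-1']
```

The `poll` call plays the role of the periodic scan mentioned above; a real Broker would drive it from a timer.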

Dead-letter queue

When a message cannot be delivered correctly for some reason, it is usually placed in a special-purpose queue so that it is not simply discarded. This queue is usually called a dead-letter queue. A related concept is the “rollback queue”. Imagine that a consumer hits an exception while consuming a message, so it does not acknowledge (Ack) the message. After the rollback, the message stays at the head of the queue, where it is processed and rolled back again and again, trapping the queue in an infinite loop. To solve this problem, you can set up a rollback queue for each queue, which, along with the dead-letter queue, provides a mechanism for handling exceptions. In practice, the role of the rollback queue is played by dead-letter queues and retry queues.

Retry queue

A retry queue can be seen as a kind of rollback queue. When a consumer fails to consume a message, the message is rolled back to the Broker so that it is not lost. Unlike a plain rollback queue, a retry queue is generally divided into multiple retry levels, and each level has its own redelivery delay. The more retries, the longer the delay. For example, a message fails to be consumed the first time and is put into retry queue Q1, whose redelivery delay is 5s, so the message is redelivered after 5s. If consumption fails again, the message is put into Q2, whose redelivery delay is 10s, so it is redelivered after 10s. In this way, each further retry takes longer to redeliver, so an upper limit on retries is needed. If the number of redeliveries exceeds that limit, the message goes to the dead-letter queue. Retry queues and delay queues have something in common: both need delay levels. The difference is that a delay queue is triggered internally, while a retry queue is triggered externally by a consumer failure. A delay applies once, while a message in a retry queue moves on through successive levels.
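The escalation from Q1 to Q2 and finally to the dead-letter queue can be sketched as follows. The 5s and 10s delays come from the example above; the third level (30s), the function name `deliver_with_retries`, and the flaky handler are assumptions made for illustration. The delays are only recorded, not actually slept.

```python
RETRY_DELAYS = (5, 10, 30)  # redelivery delay in seconds for Q1, Q2, Q3 (Q3 is assumed)


def deliver_with_retries(message, handler, retry_delays=RETRY_DELAYS):
    """Simulate delivery with escalating retries.

    Returns (outcome, waits): the final outcome and the delays that were
    applied before each redelivery.
    """
    waits = []
    for attempt in range(len(retry_delays) + 1):
        try:
            handler(message)
            return "consumed", waits
        except Exception:
            if attempt == len(retry_delays):
                # Retry limit exceeded: the message becomes a dead letter.
                return "dead-lettered", waits
            # Move the message to the next retry level and wait its delay.
            waits.append(retry_delays[attempt])


def make_flaky_handler(failures):
    """Hypothetical consumer that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def handler(message):
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("consumer failed")

    return handler


print(deliver_with_retries("order-42", make_flaky_handler(2)))
# ('consumed', [5, 10])
print(deliver_with_retries("order-43", make_flaky_handler(99)))
# ('dead-lettered', [5, 10, 30])
```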

Consumption patterns

Consumption comes in two modes: push and pull. In push mode, the Broker actively pushes messages to the consumer. This gives good real-time performance, but it requires a flow-control mechanism so that the messages pushed by the server do not overwhelm the consumer. In pull mode, the consumer actively asks the Broker for messages, usually on a timer or in fixed-size batches. Real-time performance is worse than in push mode, but the consumer can control how many messages it pulls according to its own processing capacity.

Broadcast consumption

Messages generally have two delivery modes: point-to-point (P2P) and publish/subscribe (Pub/Sub). In the point-to-point model, a message is no longer stored in the queue once it has been consumed, so a consumer cannot consume a message that has already been consumed. Although a queue can support multiple consumers, each message is consumed by only one of them. The publish/subscribe model defines how to publish and subscribe to messages on a content node called a topic. A topic can be thought of as an intermediary for message delivery: publishers publish messages to a topic, and subscribers subscribe to messages from it. The topic keeps publishers and subscribers independent of each other; they do not need to contact each other for messages to be delivered. Publish/subscribe is the mode used for one-to-many broadcast of messages. RabbitMQ is a typical point-to-point model, whereas Kafka is a typical publish/subscribe model. However, RabbitMQ can achieve broadcast consumption by choosing a publish/subscribe exchange type, and Kafka can also be consumed point-to-point: you can think of a consumer group as a queue. By comparison, Kafka supports broadcast consumption much better than RabbitMQ because it also offers message backtracking.
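The idea that a consumer group behaves like a queue can be illustrated with a small model. This is a deliberate simplification: real Kafka assigns partitions to group members rather than rotating individual messages, and the `Topic` class and group names below are hypothetical.

```python
from itertools import cycle


class Topic:
    """Toy topic: every group sees each message; one member per group handles it."""

    def __init__(self):
        self._groups = {}

    def subscribe(self, group, consumers):
        # Round-robin within a group stands in for partition assignment.
        self._groups[group] = cycle(consumers)

    def publish(self, message):
        """Return {group: consumer that received this message}."""
        return {group: next(members) for group, members in self._groups.items()}


topic = Topic()
topic.subscribe("billing", ["billing-1", "billing-2"])  # point-to-point inside the group
topic.subscribe("audit", ["audit-1"])                   # a second group: broadcast across groups

print(topic.publish("order-created"))
# {'billing': 'billing-1', 'audit': 'audit-1'}
print(topic.publish("order-paid"))
# {'billing': 'billing-2', 'audit': 'audit-1'}
```

Within one group each message goes to exactly one consumer (queue semantics), while across groups every message is delivered once per group (broadcast semantics).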

Message backtracking

Typically, a message is considered done once it has been consumed and cannot be consumed again. Message backtracking means that a message can still be consumed after it has already been consumed. A problem often faced with messages is that they get “lost”, and it is generally hard to tell whether the loss was really caused by a defect in the message middleware or by misuse on the user’s side. If the message middleware supports backtracking, the “lost” message can be reproduced by consuming it again, which helps locate the source of the problem. Message backtracking is useful for more than this: it also supports index recovery, local cache rebuilding, and some business compensation schemes.
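One common way to support backtracking is to keep messages in an append-only log and let each consumer track its own offset, so that rewinding is just a matter of resetting the offset. The sketch below models that idea in memory; the class and method names (`MessageLog`, `Consumer.seek`) are illustrative and do not correspond to a specific client API.

```python
class MessageLog:
    """Append-only log: consuming a message does not delete it."""

    def __init__(self):
        self._records = []

    def append(self, body):
        self._records.append(body)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    def __init__(self, log):
        self._log = log
        self.offset = 0  # position of the next record to read

    def poll(self, max_records=10):
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Backtracking: move the position back to re-consume old records.
        self.offset = offset


log = MessageLog()
for body in ("a", "b", "c"):
    log.append(body)

consumer = Consumer(log)
print(consumer.poll())  # ['a', 'b', 'c']
print(consumer.poll())  # []
consumer.seek(1)
print(consumer.poll())  # ['b', 'c']
```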

Message accumulation + persistence

Traffic peak clipping is a very important function of message-oriented middleware, and it depends on the ability to accumulate messages. In a sense, a message-oriented middleware that cannot accumulate messages is not a qualified one. Accumulation comes in two forms: in memory and on disk. RabbitMQ typically accumulates in memory, but not always: it can page messages to disk (which affects throughput) or persist messages directly to disk using lazy queues. Kafka is a typical case of on-disk accumulation, where all messages are stored on disk. In general, disk capacity is much larger than memory capacity, and on-disk accumulation is bounded by the size of the entire disk. From another point of view, message accumulation also gives the message middleware a redundant storage role. The New York Times, for example, uses Kafka directly as a storage system.

Message tracking

Link tracing is familiar to anyone who works with distributed architectures. For message middleware, message link tracing (hereinafter message tracking) is just as important. The simplest way to understand message tracking is that it tells you where a message came from, where it currently is, and where it is going. With this capability, we can trace the path of messages that have been sent or consumed, and then quickly locate and troubleshoot problems.

Message filtering

Message filtering means providing downstream users with specified categories of messages according to predefined filtering rules. In the case of Kafka, it is entirely possible to send different categories of messages to different topics, which achieves a form of message filtering, or to separate messages within the same topic by partition. In a stricter sense, however, message filtering should apply defined rules to an existing stream of messages. Taking Kafka as an example again, messages can be filtered using the ConsumerInterceptor interface provided by the client or the filter function of Kafka Streams.

Multi-tenancy

Multi-tenancy is a software architecture technique that lets multiple users share the same system or program components while still ensuring data isolation between them. RabbitMQ supports multi-tenancy: each tenant is represented as a vhost, which is essentially a small independent RabbitMQ server with its own queues, exchanges, bindings, and so on, and its own permissions. A vhost is like a virtual machine on a physical machine. It provides logical separation between instances, lets different programs keep their data safely apart, separates the many clients within RabbitMQ from one another, and avoids naming conflicts between queues and exchanges.

Multi-protocol support

Messages are carriers of information. For producers and consumers to understand the information they carry (producers need to know how to construct messages, and consumers need to know how to parse them), messages must be described in a unified format, which is called a message protocol. A valid message must have some format; a message without one is meaningless. Common message-level protocols include AMQP, MQTT, STOMP, and XMPP (JMS is more of a specification than a protocol). The more protocols a middleware supports, the wider its range of application and the more general it is. RabbitMQ’s support for the MQTT protocol, for example, gives it a place in IoT applications. Some message-oriented middleware runs on its own proprietary protocol, such as Kafka.

Cross-language support

Many companies have multiple programming languages in their technology stack, such as C/C++, Java, Go, and PHP. Message middleware already decouples applications, and if it also supports clients in multiple languages, that decoupling reaches further. The degree of cross-language support is also an indirect indicator of how popular a message-oriented middleware is.

Flow control

Flow control addresses the speed mismatch between sender and receiver. It throttles the sending rate so that the receiving application can keep up. Common flow control methods include stop-and-wait, sliding windows, and token buckets.
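Of the methods listed, the token bucket is the easiest to show in a few lines. The sketch below is a generic token bucket, not the throttling code of any particular Broker; the `rate` and `capacity` values are arbitrary, and the clock is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Allow a request only if a token is available; refill at a fixed rate."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self._tokens = float(capacity)
        self._last = now

    def allow(self, now, cost=1):
        # Refill according to the time elapsed since the last call,
        # never exceeding the bucket capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=2, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]
print(burst)                  # [True, True, True, False]
print(bucket.allow(now=0.5))  # True  (0.5 s * 2 tokens/s = 1 token refilled)
print(bucket.allow(now=0.5))  # False (that token was just spent)
```

A sender that calls `allow` before each publish is limited to bursts of `capacity` messages and a sustained rate of `rate` messages per second.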

Message ordering

As the name implies, message ordering means guaranteeing that messages are delivered in order. A very common application scenario is CDC (Change Data Capture). Take the binlog transmitted by MySQL as an example: if a value is originally incremented by 1 and then multiplied by 2, but the two changes arrive in the wrong order, the value is multiplied by 2 and then incremented by 1, and the data becomes inconsistent.
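A tiny worked example shows why reordering breaks replicated data. It assumes two operations that do not commute, "add 1" and "multiply by 2", applied to a starting value of 10; the function names are made up for the illustration.

```python
def apply_in_order(value, operations):
    """Apply a sequence of change events to a value, one after another."""
    for operation in operations:
        value = operation(value)
    return value


def add_one(x):
    return x + 1


def double(x):
    return x * 2


source = apply_in_order(10, [add_one, double])   # order written at the source
replica = apply_in_order(10, [double, add_one])  # same events, delivered out of order

print(source)   # 22
print(replica)  # 21
```

The replica received exactly the same events as the source, yet ends up with a different value, which is the inconsistency described above.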

Security mechanism

Two security mechanisms, authentication and permission control, have been added since Kafka 0.9. Identity authentication refers to the authentication between clients and servers, including the authentication between clients and brokers, between brokers, and between brokers and ZooKeeper. SSL and SASL authentication mechanisms are supported. Permission control refers to the permission control on read and write operations of clients, including the permission control on messages or Kafka cluster operations. Permission control is pluggable and supports integration with external authorization services. RabbitMQ also provides security mechanisms for authentication (TLS/SSL, SASL) and permission control (read and write operations).

Message idempotency

There are generally three delivery guarantees for messages passed between producer and consumer. At most once: a message may be lost but is never repeated. At least once: a message is never lost but may be repeated. Exactly once: every message is delivered once and only once. Most message-oriented middleware provides only the at-most-once and at-least-once guarantees. The third is hard to provide, because it requires message idempotency, which is difficult to guarantee.

Kafka introduced idempotence and transactions in version 0.11. Kafka’s idempotence is scoped to a single producer, a single partition, and a single session. Transactions allow atomic writes to multiple partitions: messages written to multiple partitions either all succeed or are all rolled back. Together, these two features give Kafka EOS (Exactly-Once Semantics) capability.

However, global idempotence also requires looking at the upstream and downstream sides together, that is, at the business level. Idempotent processing is itself an important concern at the business level. Take the downstream consumer as an example: a consumer may finish consuming a message, fail before acknowledging it, and then have to consume the same message again after recovery. The message middleware layer cannot guarantee idempotence in this case. To guarantee global idempotence, additional external resources are needed, such as using the order number as a unique identifier and keeping a de-duplication table downstream.
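The de-duplication idea can be sketched as a consumer that records every order number it has processed and skips repeats. In this sketch an in-memory set stands in for the de-duplication table, and `IdempotentConsumer` and its `balance` field are hypothetical; a real system would normally write the business change and the de-duplication record in the same database transaction.

```python
class IdempotentConsumer:
    """Apply each order at most once, even if the Broker redelivers it."""

    def __init__(self):
        self._processed = set()  # stand-in for a downstream de-duplication table
        self.balance = 0

    def handle(self, order_no, amount):
        if order_no in self._processed:
            # Duplicate delivery (for example after a lost Ack): no side effects.
            return False
        self.balance += amount
        self._processed.add(order_no)
        return True


consumer = IdempotentConsumer()
first = consumer.handle("ORDER-1001", 100)
redelivered = consumer.handle("ORDER-1001", 100)  # same message delivered again
print(first, redelivered, consumer.balance)  # True False 100
```

With at-least-once delivery from the Broker plus this kind of check downstream, the business effect of each message is applied exactly once.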

Transactional message

“Transaction” is a familiar term. A transaction consists of all operations performed between a Begin Transaction and an End Transaction. Quite a few message-oriented middleware products support transactions, Kafka and RabbitMQ among them, but in both cases a transaction refers to the producer sending messages: the send either succeeds or fails as a whole. Message-oriented middleware can be used as one building block for implementing distributed transactions, but it does not itself provide global distributed transactions.

The following table provides a summary comparison of the features of Kafka and RabbitMQ with additional explanations.

Performance

The functional dimension is an important reference in selecting message-oriented middleware, but it is not the only one. Sometimes performance matters more than functionality, and the two often pull against each other. Kafka’s performance drops when idempotence or transactions are enabled, and RabbitMQ’s performance drops when the rabbitmq_tracing plugin is enabled. The performance of message-oriented middleware generally refers to its throughput. Although RabbitMQ has an advantage over Kafka in functionality, Kafka’s throughput is one to two orders of magnitude higher than RabbitMQ’s. Kafka’s single-machine QPS can be sustained at the hundred-thousand level and can even reach the million level.

The throughput of message-oriented middleware is always limited by hardware. Take network adapter bandwidth as an example. If a single network adapter has a bandwidth of 1Gbps and the target is one million messages per second, the message body size cannot exceed (1Gb/8)/1,000,000, which is about 134B. In other words, if the message body is larger than 134B, million-level throughput is impossible on that link. The same calculation can be applied to memory and disk.
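The arithmetic behind the 134B figure can be checked directly. The 134 result assumes that 1Gb is counted as 2^30 bits; with a decimal gigabit (10^9 bits) the limit comes out at 125 bytes. The helper name below is made up for the example.

```python
def max_message_size_bytes(bandwidth_bits_per_s, messages_per_s):
    """Largest average message body that fits the link at the given rate."""
    bytes_per_s = bandwidth_bits_per_s / 8
    return bytes_per_s / messages_per_s


BINARY_GIGABIT = 2 ** 30    # 1,073,741,824 bits
DECIMAL_GIGABIT = 10 ** 9   # 1,000,000,000 bits

binary_limit = max_message_size_bytes(BINARY_GIGABIT, 1_000_000)
decimal_limit = max_message_size_bytes(DECIMAL_GIGABIT, 1_000_000)

print(round(binary_limit))   # 134
print(round(decimal_limit))  # 125
```

Either way, the calculation ignores protocol overhead, so the practical limit is lower still.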

Latency is an important performance indicator, but it is often ignored in the field of message-oriented middleware, because the scenarios that use message-oriented middleware generally do not have strict timeliness requirements. If timeliness is required, RPC can be used instead. Message-oriented middleware can accumulate messages, and the larger the backlog, the longer the end-to-end latency. Delay queues are even a feature of some message-oriented middleware. So why care about latency at all? Message middleware decouples systems. With a low-latency message middleware, upstream producers return quickly after sending, and consumers receive messages sooner; when there is no backlog, the upstream and downstream of the whole application work together more efficiently. Although message-oriented middleware is not recommended for time-sensitive scenarios, a middleware with good latency improves the performance of the overall system considerably.

Reliability + Availability

Message loss is a problem that everyone using message middleware has to face, and the message reliability behind it is a key measure of a message middleware’s quality. In the field of financial payments in particular, message reliability is especially important. Any discussion of reliability must also mention availability, and the two should be distinguished. The reliability of message-oriented middleware refers to how well it protects against message loss. The availability of message-oriented middleware is the percentage of time it runs without failure, usually measured in nines.
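To make "measured in nines" concrete, the following sketch converts a number of nines into the allowed downtime per year. It assumes a 365-day year and treats availability as 1 minus 10 to the power of minus the number of nines; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(nines):
    """Allowed downtime for an availability of `nines` nines (3 -> 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for nines in (2, 3, 4, 5):
    availability = 100 * (1 - 10 ** -nines)
    print(f"{availability:.3f}% -> {downtime_minutes_per_year(nines):.1f} min/year")
```

Three nines (99.9%) allow roughly 525.6 minutes of downtime per year, about 8.8 hours, while five nines allow only about 5.3 minutes.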

In a narrow sense, distributed system architecture is the applied implementation of consistency protocol theory, and the reliability and availability of message middleware can likewise be traced back to the consistency protocol behind it. Kafka uses a PacificA-like consistency protocol. It uses In-Sync Replicas (ISR) to keep multiple replicas synchronized and supports strong consistency semantics (implemented through acks). RabbitMQ implements multiple replicas and strong consistency semantics through mirrored queues. With multiple replicas, if the master node goes down or becomes abnormal, a slave can be promoted to be the new master and continue to serve, which ensures availability. Kafka was originally designed for log processing, which left the impression that it did not take data reliability seriously. As Kafka has evolved, however, its reliability has improved greatly; see KIP-101 for details. Today RabbitMQ is used mostly in financial payment scenarios, while Kafka is used mostly for log processing, big data, and similar workloads. As RabbitMQ’s performance improves and Kafka’s reliability increases, each can be expected to take a share of the other’s territory.

Synchronous flushing to disk is an effective way to enhance the reliability of a component, and message middleware is no exception. Kafka and RabbitMQ both support synchronous flushing, but I have reservations about it: in the vast majority of cases, a component’s reliability should be guaranteed not by synchronous flushing, which is extremely costly in performance, but by a multi-replica mechanism.

Another aspect worth mentioning here is extensibility, which I fold into the availability dimension. The extensibility of message middleware increases its availability and its range of application. For example, RabbitMQ supports multiple messaging protocols, which is an extension implemented through its plug-in mechanism. In terms of cluster deployment, Kafka’s horizontal scaling capability lets it increase capacity almost linearly. LinkedIn’s account of its own practice mentions Kafka clusters with more than a thousand machines.

Operations management

When using message-oriented middleware, abnormal situations inevitably arise on both the client side and the server side, so how do you monitor and repair them in a timely and effective way? Business traffic has peaks and valleys, especially in e-commerce, so how do you assess capacity effectively, especially before and during big sales events? Power cords get kicked out and network cables get dug up again and again, so how do you achieve multi-site active-active deployment? All of these questions lead to a derivative of message-oriented middleware: operations and maintenance management.

Operations and maintenance management can be further divided into application, audit, monitoring, alarm, management, disaster recovery, and deployment.

Application and audit are easy to understand. Controlling resources at the source not only keeps the application side’s usage within the rules, but also supports traffic statistics and traffic assessment together with resource allocation and monitoring. Application and audit are generally tightly integrated with a company’s internal systems, so open source products are not a good fit for them.

Monitoring and alarm are also easy to understand. Comprehensive monitoring of message-oriented middleware usage provides baseline data for the system and, combined with alarms, lets operations staff and developers step in quickly when abnormal conditions are detected. In addition to general monitoring items (such as hardware and GC), message-oriented middleware also needs to watch end-to-end latency, message auditing, message backlog, and so on. For RabbitMQ, the official monitoring and management tool is the rabbitmq_management plugin, and there are also AppDynamics, Collectd, DataDog, Ganglia, Munin, Nagios, New Relic, Prometheus, Zenoss, and more. Kafka is no slouch in this regard either, with products such as Kafka Manager, Kafka Monitor, Kafka Offset Monitor, Burrow, Chaperone, and Confluent Control Center. Cruise Control, in particular, also provides automated operations and maintenance.

Management tools are indispensable for scaling out, scaling in, version upgrades, cluster node deployment, and troubleshooting. A complete set of management tools lets you achieve twice the result with half the effort when changes are needed. Faults can be large or small. Single-machine faults, such as application exceptions, machine power failures, network exceptions, and disk damage, can be handled by multiple replicas within a single machine room. Larger failures require replicating data to another cluster, and the key is to replicate data efficiently. For Kafka, MirrorMaker and uReplicator can be used; for RabbitMQ, Federation and Shovel.

Community strength and ecological development

With popular programming languages such as Java and Python, if you run into an exception you can usually solve it with the help of a search engine, because the more people use a product, the more pitfalls have already been hit and the more solutions are available. The same applies to message-oriented middleware. If you choose a less popular message-oriented middleware, it may serve you well in some respects, but version updates may be slow, problems you run into may be hard to solve, and community support may be hard to come by. If you choose a popular message-oriented middleware, it will be updated quickly, both to fix past shortcomings and to add new features that keep up with the rapid development of technology, which lets you “stand on the shoulders of giants”. As mentioned in the operations management section, both Kafka and RabbitMQ have a number of open source monitoring and management products, thanks to the rapid growth of their communities and ecosystems.

4. Common pitfalls in message-oriented middleware selection

Before choosing a message-oriented middleware, ask yourself one question: do you really need one? Once that is settled, ask the next question: do you need to maintain a message middleware deployment yourself? To save costs, many startups choose to buy cloud services for message middleware, focus only on sending and receiving messages, and outsource everything else.

Many people are tempted to build their own message-oriented middleware. You can write a simple wrapper around ArrayBlockingQueue in Java, or build a message-oriented middleware on top of low-level storage such as files, databases, or Redis. But message-oriented middleware, as a basic component, is not as simple as it looks, and it also needs a set of supporting products to manage and operate the entire ecosystem. Handover is another problem: if the documentation is incomplete and operations are not standardized, newcomers will have a nightmarish experience. Is in-house development really necessary? Unless you are under KPI pressure, first consider these two questions: 1. Is all the message middleware on the market really unable to meet your current business needs? 2. Does the team have sufficient ability, people, money, and energy to support in-house development?

Many people consult comparison articles on the web when choosing message-oriented middleware, but the professionalism, rigor, and impartiality of those articles are unproven, so read them with a skeptical eye. Articles that declare one message-oriented middleware the best without stating any constraints or scenarios, and articles that compare functionality and performance without specifying the versions and test environment, can be dismissed.

Selecting message-oriented middleware is like the fable of the pony crossing the river: you have to judge by your own situation. The most important thing is to select the one that fits your own business needs, because technology serves the business. A first screening can follow the six dimensions, starting with functionality and performance, described in the previous section. The deeper question is whether you can master the product. I would point out that RabbitMQ’s strength lies in routing and Kafka’s lies in streaming, so understanding their fundamentals is crucial to choosing the right message-oriented middleware.

Do not blindly pursue performance or functionality when selecting message middleware: performance can be optimized, and functionality can be added through secondary development. If you have to choose between functionality and performance, go with performance, because overall there is less room for performance optimization than for functional extension. For long-term development, however, the ecosystem matters more than either performance or functionality.

Reliability is another area where it is easy to go wrong: people want to find a product that guarantees messages are absolutely reliable, but unfortunately nothing is perfect. Making messages as reliable as possible depends not only on the message middleware itself but also on the upstream and downstream. Message reliability has to be ensured on the producer side, the server side, and the consumer side. This article analyzes RabbitMQ message reliability from these three dimensions.

Another consideration in selecting message-oriented middleware is to fit the team’s technology stack as closely as possible. Although there is no bad message-oriented middleware, only bad programmers, it is much easier for a team with a C/C++ stack to dig deep into PhxQueue than into Kafka, which is written in Scala.

The guiding principle for message-oriented middleware selection is simple: there is no best message-oriented middleware, only the most suitable one.

About the author

Zhu Zhonghua, author of RabbitMQ Field Guide, works mainly on message middleware development.
