How do humans make decisions? In everyday life, emotion is often the circuit-breaker factor in pulling the trigger on a complex or overwhelming decision. But complex decisions with long-term consequences can’t be made on pure impulse. High performers typically use the circuit breaker of “instinct,” “intuition,” or other emotions only once their subconscious mind has absorbed all the facts required to make the decision.

Today there are dozens of messaging technologies, countless ESBs, and nearly 100 iPaaS vendors in the market. Naturally, this leads to questions about how to choose the right messaging technology for your needs – particularly for those who have already invested in a particular choice. Do we switch wholesale? Just use the right tool for the right job? Have I framed the job at hand correctly for the business need? Given all that, what is the right tool for me? To make matters worse, an exhaustive market analysis may never be completed, yet due diligence is critical given the average lifetime of integration code.

This article is dedicated to giving the subconscious mind some facts to absorb about two of today’s most modern and popular options: RabbitMQ and Apache Kafka. Each has its own origin story, design intent, sweet-spot use cases, integration capabilities, and developer experience. For any piece of software, origins reveal the overall design intent and are a good place to start.

Origins

RabbitMQ is a “traditional” message broker that implements a variety of messaging protocols. It was among the first open source message brokers to achieve a reasonable level of features, client libraries, developer tools, and quality documentation. RabbitMQ was originally developed to implement AMQP, an open wire protocol for messaging with powerful routing capabilities. While Java has messaging standards like JMS, they are no help to non-Java applications that need distributed messaging, which severely limits any integration scenario, microservice or monolithic. With the advent of AMQP, cross-language flexibility became real for open source message brokers.

Apache Kafka was developed in Scala and started out at LinkedIn as a way to simplify the ingestion of messages from Apache Flume into Hadoop. Consuming and producing data across multiple sources required writing a separate data pipeline for each source and destination pair. Kafka helped standardize LinkedIn’s data pipelines and allowed data to be taken out of each system once and stored once per system, making the pipelines (and operations) much simpler. Kafka is well established in the Apache Software Foundation today. In particular, it integrates tightly with Apache Zookeeper, which forms the backbone of Kafka’s distributed partitioning. Many would say this is not a heavyweight dependency, since Zookeeper gives Kafka users the benefits of clustering.

Architecture and design

RabbitMQ is designed as a general-purpose message broker, employing several variations of point-to-point, request/reply, and publish-subscribe communication styles. It uses a smart broker / dumb consumer model, focused on consistent delivery of messages to consumers: the broker tracks consumer state and ensures that consumers receive messages at roughly the pace they can handle. It is mature, performs well when configured correctly, is well supported (client libraries cover Java, .NET, Node.js, Ruby, PHP, and many other languages), and has dozens of plug-ins available that extend it to more use cases and integration scenarios.

Figure 1 – Simplified overall RabbitMQ architecture (source).

Communication in RabbitMQ can be synchronous or asynchronous as needed. Publishers send messages to exchanges, and consumers retrieve messages from queues. Decoupling producers from queues via exchanges ensures that producers aren’t burdened with hardcoded routing decisions. RabbitMQ also offers a number of distributed deployment scenarios (and does require that all nodes be able to resolve hostnames). It can be set up for multi-node clusters or cluster federation, and it does not depend on external services (though some cluster formation plug-ins can use AWS APIs, DNS, Consul, etc.).
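To make the exchange-and-queue model concrete, here is a minimal sketch using the Python pika client; the broker address, exchange, queue, and routing key are illustrative assumptions, not a prescribed setup.

    # Minimal RabbitMQ publish/consume sketch (pika client).
    # Host, exchange, queue, and routing key names are illustrative.
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    # Producers publish to an exchange, never directly to a queue.
    channel.exchange_declare(exchange="orders", exchange_type="direct")
    channel.queue_declare(queue="invoices", durable=True)
    channel.queue_bind(queue="invoices", exchange="orders", routing_key="invoice.created")

    channel.basic_publish(
        exchange="orders",
        routing_key="invoice.created",
        body=b"order #42",
        properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
    )

    # The broker tracks delivery state; the consumer just acknowledges.
    def on_message(ch, method, properties, body):
        print("received:", body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="invoices", on_message_callback=on_message)
    channel.start_consuming()

Note how the publisher only knows the exchange and routing key; the binding, not the producer, decides which queue receives the message.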

Apache Kafka is designed for high-volume publish-subscribe messages and streams, meant to be durable, fast, and scalable. At its essence, Kafka provides a durable message store, similar in some ways to a database log, that runs in a cluster of servers and stores streams of records in categories called topics.

Figure 2 – Global Apache Kafka architecture (with 1 topic, 1 partition, and a replication factor of 4) (source).

Each message consists of a key, a value, and a timestamp. In nearly the opposite of RabbitMQ’s approach, Kafka employs a dumb broker and uses smart consumers to read from its log. Kafka does not attempt to track which messages were read by each consumer and retain only the unread ones; instead, Kafka retains all messages for a set amount of time, and consumers are responsible for tracking their own position (consumer state) in each log. Consequently, with the right developer talent creating the consumer code, Kafka can support a large number of consumers and retain large amounts of data with very little overhead. As the diagram above shows, Kafka does require external services to run – in this case Apache Zookeeper, which is often regarded as non-trivial to understand, set up, and operate.
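A minimal sketch of the smart-consumer model, using the kafka-python client; the broker address, topic, and group names are assumptions for illustration. With auto-commit disabled, the consumer – not the broker – owns its position in the log.

    # Dumb broker, smart consumer: the client tracks its own offsets.
    # Broker address, topic, and group id are illustrative.
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("page-views", key=b"user-7", value=b"/pricing")
    producer.flush()

    consumer = KafkaConsumer(
        "page-views",
        bootstrap_servers="localhost:9092",
        group_id="analytics",
        enable_auto_commit=False,      # the consumer owns its offsets
        auto_offset_reset="earliest",  # start from the oldest retained record
    )
    for record in consumer:
        print(record.offset, record.key, record.value)
        consumer.commit()              # record our position in the log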

Requirements and use cases

Many developers begin exploring messaging when they realize that they have to connect lots of things together, and that other integration patterns, such as shared databases, are not feasible or are too dangerous.

Apache Kafka describes itself as a distributed streaming platform, but it is better known as a durable message store with good Hadoop/Spark support. Its documentation does a good job of discussing popular use cases such as website activity tracking, metrics, log aggregation, stream processing, event sourcing, and commit logs. One of the use cases it describes is messaging, which can generate some confusion. So let’s unpack that a bit and get clearer on which messaging scenarios work best for Kafka, such as:

  • Streaming from A to B without complex routing, with maximal throughput (100K/sec+), delivered in partitioned order at least once.
  • When your application needs access to stream history, delivered in partitioned order at least once. Kafka is a durable message store, and clients can get a “replay” of the event stream on demand (see the sketch after this list), as opposed to more traditional message brokers, which remove a message from the queue once it has been delivered.
  • When you have smart consumers that can reliably track their log offsets.
  • When your application needs an “infinite” queue.
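A sketch of what such a replay can look like with kafka-python; the topic name and partition are illustrative assumptions.

    # Replaying a topic's retained history by rewinding the offset.
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
    partition = TopicPartition("page-views", 0)
    consumer.assign([partition])
    consumer.seek_to_beginning(partition)  # re-read the log from the oldest record
    for record in consumer:
        print(record.offset, record.value)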

RabbitMQ is a general-purpose messaging solution, often used to allow web servers to respond to requests quickly instead of being forced to perform resource-heavy procedures while the user waits for the result. It is also good for distributing a message to multiple consumers, or for balancing load between workers under high throughput (20K+/sec). When your requirements extend beyond throughput, RabbitMQ has a lot to offer: features for reliable delivery, routing, federation, HA, security, management tools, and more. Let’s look at some of the scenarios best suited to RabbitMQ, such as:

  • Your application needs to work with any combination of existing protocols such as AMQP 0-9-1, STOMP, MQTT, or AMQP 1.0.
  • You need mature, easy-to-understand consistency guarantees for your message delivery.
  • Your application needs variety: point-to-point, request/reply, and publish/subscribe messaging.
  • Complex routing to consumers, or integrating multiple services/applications with non-trivial routing logic (see the sketch after this list).
  • Integration with your existing IT infrastructure is important.
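As an example of that routing flexibility, here is a sketch using a RabbitMQ topic exchange via pika; the exchange, queues, and binding keys are illustrative assumptions.

    # Non-trivial routing with a topic exchange: bindings, not producers,
    # decide where messages go. All names are illustrative.
    import pika

    channel = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
    channel.exchange_declare(exchange="logs", exchange_type="topic")

    # Route error messages from any service to an alerting queue...
    channel.queue_declare(queue="alerts")
    channel.queue_bind(queue="alerts", exchange="logs", routing_key="*.error")

    # ...while an audit queue receives everything.
    channel.queue_declare(queue="audit")
    channel.queue_bind(queue="audit", exchange="logs", routing_key="#")

    # This message lands in both queues; "billing.info" would reach only "audit".
    channel.basic_publish(exchange="logs", routing_key="billing.error", body=b"payment failed")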

RabbitMQ can also effectively address several of Kafka’s strong use cases above, but only with the help of additional software. RabbitMQ is often used with Apache Cassandra when an application needs access to stream history, or with the LevelDB plugin for applications that need an “infinite” queue, but neither capability ships with RabbitMQ itself.

For a more in-depth look at specific microservices use cases for Kafka and RabbitMQ, check out the blog and read this short article by Fred Melo.

Development experience

RabbitMQ officially supports Java, Spring, .NET, PHP, Python, Ruby, JavaScript, Go, Elixir, Objective-C, and Swift – along with many other clients and development tools via community plug-ins. The RabbitMQ client libraries are mature and well documented.

Apache Kafka has come a long way in this area, and while the primary client is Java, there is a growing catalog of community open source clients, ecosystem projects, and an adapter SDK that allows you to build your own system integrations. Most of the configuration is done through .properties files or programmatically.
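For a feel of the programmatic side, here is an illustrative producer configuration using kafka-python; the broker addresses and tuning values are assumptions, not recommendations.

    # Illustrative programmatic Kafka producer configuration.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker1:9092,broker2:9092",  # assumed cluster
        acks="all",     # wait for all in-sync replicas to acknowledge
        retries=3,      # retry transient send failures
        linger_ms=5,    # small batching delay to improve throughput
    )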

The popularity of these two options also has a strong influence on many other software vendors, who make sure that RabbitMQ and Kafka work well with or on their technology.

Safety and operation

Security and operations are among RabbitMQ’s strengths. The RabbitMQ management plug-in provides an HTTP API, a browser-based UI for administration and monitoring, plus CLI tools for operators. Long-term storage of monitoring data requires external tools such as CollectD, Datadog, or New Relic. RabbitMQ also provides APIs and tools for monitoring, auditing, and application troubleshooting. In addition to TLS support, RabbitMQ ships with RBAC backed by a built-in data store, LDAP, or external HTTPS-based providers, and supports authentication using x509 certificates instead of username/password pairs. Additional authentication methods can be added fairly straightforwardly via plug-ins.
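As a small illustration of the HTTP API, the sketch below lists queues and their message counts with Python’s requests library; the port and credentials shown are the management plug-in’s out-of-the-box defaults, which a real deployment would change.

    # Querying the RabbitMQ management plug-in's HTTP API.
    # Port 15672 and guest/guest are the plug-in's defaults.
    import requests

    resp = requests.get("http://localhost:15672/api/queues", auth=("guest", "guest"))
    for queue in resp.json():
        print(queue["name"], queue.get("messages"))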

These areas have been a challenge for Apache Kafka. On the security front, the recent 0.9 release added TLS, JAAS role-based access control, Kerberos/plain/SCRAM authentication, and a CLI for managing security policy. This is a big improvement over earlier versions, where you could only lock down access at the network level, which did not work well for sharing or multi-tenancy.
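What enabling such authentication can look like from a client is sketched below with kafka-python; the protocol, mechanism, and credentials are placeholder assumptions that must match the broker’s configuration.

    # Illustrative authenticated Kafka client configuration (kafka-python).
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "page-views",
        bootstrap_servers="broker1:9093",
        security_protocol="SASL_SSL",     # TLS plus SASL authentication
        sasl_mechanism="SCRAM-SHA-256",   # assumed broker-side mechanism
        sasl_plain_username="analytics",  # placeholder credentials
        sasl_plain_password="change-me",
    )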

Kafka uses a management CLI made up of shell scripts, property files, and specially formatted JSON files. Kafka brokers, producers, and consumers emit metrics via Yammer/JMX but do not maintain any history, which in practice means using a third-party monitoring system. With these tools, operators can manage partitions and topics, check consumer offset positions, and use the HA and FT capabilities that Apache Zookeeper provides for Kafka. For example, a 3-node Kafka cluster remains functional even with two nodes down. However, if you want to tolerate as many failures in Zookeeper, you need five Zookeeper nodes: Zookeeper is a quorum-based system that needs a majority of its nodes up, so an ensemble of N nodes tolerates only (N-1)/2 failures. These obviously should not be co-located with the Kafka nodes – so to stand up a 3-node Kafka system, you need roughly eight servers. Operators must take the properties of the ZK cluster into account, in terms of both resource consumption and design, when reasoning about the availability of any Kafka system.
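The quorum arithmetic is worth spelling out; the snippet below just evaluates the majority rule for a few ensemble sizes.

    # Quorum arithmetic: an ensemble of n nodes needs a majority alive,
    # so it tolerates (n - 1) // 2 failures.
    for n in (3, 5, 7):
        print(f"{n} Zookeeper nodes tolerate {(n - 1) // 2} failures")
    # 3 nodes tolerate 1 failure; 5 tolerate 2; 7 tolerate 3.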

Performance

Kafka shines here by design: 100K/sec throughput is often a key driver for choosing Apache Kafka. Achieving it, however, relies heavily on developers being able to write smart consumer code.

Of course, messages-per-second rates are difficult to state and quantify because they depend on so much, including your environment and hardware, the nature of your workload, and which delivery guarantees are used (e.g., persistence is expensive, and mirroring even more so).

20K/sec is easy to achieve with a single Rabbit queue, and in fact it is not too hard to achieve considerably more, as long as not too many guarantees are demanded. A queue is backed by a single lightweight Erlang thread that is cooperatively scheduled on a pool of native operating-system threads – so a single queue becomes a natural choke point or bottleneck, never doing more work than it can get CPU cycles for.

Increasing messages per second then usually boils down to exploiting the parallelism available in your environment, for example by breaking traffic across multiple queues via clever routing (so that different queues can run concurrently). When RabbitMQ achieved one million messages per second, the effort basically came down to doing exactly that judiciously – but it used a lot of resources, around 30 RabbitMQ nodes. Most RabbitMQ users enjoy excellent performance with clusters of three to seven RabbitMQ nodes.
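A sketch of that sharding idea in pika; the shard count, queue names, and modulo scheme are illustrative assumptions.

    # Spreading load across several queues so they can run in parallel.
    import pika

    SHARDS = 4
    channel = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
    channel.exchange_declare(exchange="work", exchange_type="direct")
    for i in range(SHARDS):
        channel.queue_declare(queue=f"work.{i}")
        channel.queue_bind(queue=f"work.{i}", exchange="work", routing_key=str(i))

    def publish(task_id: int, body: bytes):
        # Hash each task onto a shard; each shard queue gets its own consumers.
        channel.basic_publish(exchange="work", routing_key=str(task_id % SHARDS), body=body)

    publish(42, b"resize image 42")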

Making the Call

Do some research on the other advanced options in the market as well. If you want to go deeper on the most popular options, check out Nicolas Nannoni’s Master’s thesis, which features a side-by-side comparison table in Section 4.4 (page 39) that is still quite accurate two years later – worth a read.

Keep stakeholders and the business in the loop as much as possible during your research. Understanding the business use case is the single biggest factor in making the right choice for your situation. Then, if you’re a pop-psychology fan, your best bet is to sleep on it, let it percolate, and let your instincts take over. You’ve got this.