Message queue middleware is an important component of distributed systems and is indispensable in high-concurrency systems. It mainly addresses application coupling, asynchronous messaging, and traffic peak clipping, and it helps achieve a high-performance, highly available, scalable, and eventually consistent architecture. The most widely used message queues are ActiveMQ, RabbitMQ, Kafka, and RocketMQ. Today I will talk about the specific application scenarios of message queues in high-concurrency systems and the points to watch out for.

1. What is a message queue

We can think of a message queue as a container for messages: messages are put into the container, and we take them out for our own use when we need them.

Queues are a first-in, first-out data structure, so messages are consumed in order.
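The first-in, first-out behavior can be illustrated with a minimal sketch. It uses Python's standard `collections.deque` as a stand-in for the container; the message names are made up for illustration, and a real message queue is a separate service rather than an in-process object.

```python
from collections import deque

# A toy in-memory "message queue": producers append on the right,
# consumers take from the left, so the oldest message comes out first.
messages = deque()

for body in ["order-1", "order-2", "order-3"]:
    messages.append(body)                 # producer side

consumed = []
while messages:
    consumed.append(messages.popleft())   # consumer side

print(consumed)  # ['order-1', 'order-2', 'order-3']
```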

2. Why use message queues

In general, using message queues brings three benefits to our system:

  1. Asynchronous processing: Improves system performance and reduces response time
  2. Traffic peak clipping
  3. Application decoupling: Reduces coupling between systems

2.1 Asynchronous processing

Synchronous processing

Asynchronous processing

With asynchronous processing, the result is returned to the user as soon as the request data has been stored in the message queue, and the system consumes the message afterwards. However, the request may still fail in later steps such as business validation or the database write. Once a message queue is used for asynchronous processing, the business flow therefore needs to be adjusted accordingly. For example, after a user submits an order and the order data is written to the message queue, the system cannot immediately tell the user that the order succeeded. Only after the consumer of the order queue has actually processed the order, or even after the goods have left the warehouse, should the user be notified by email or SMS that the order succeeded, so as to avoid trade disputes. This is similar to booking train tickets or movie tickets on a mobile phone.
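The order flow described above might look roughly like the following sketch. It uses Python's standard `queue.Queue` and a worker thread in place of a real message queue and consumer service. The names `submit_order` and `process_orders`, the `"accepted"` status, and the quantity check are assumptions made for illustration only.

```python
import queue
import threading

order_queue = queue.Queue()   # stands in for the message queue
notifications = []            # stands in for email/SMS delivery

def submit_order(order):
    """Request path: enqueue and return at once, without claiming success."""
    order_queue.put(order)
    return {"order_id": order["id"], "status": "accepted"}

def process_orders():
    """Consumer: validate each order later and report the real outcome."""
    while True:
        order = order_queue.get()
        if order is None:              # sentinel: stop the worker
            order_queue.task_done()
            break
        ok = order["quantity"] > 0     # placeholder for validation + database write
        notifications.append((order["id"], "success" if ok else "failed"))
        order_queue.task_done()

worker = threading.Thread(target=process_orders)
worker.start()

responses = [submit_order({"id": 1, "quantity": 2}),
             submit_order({"id": 2, "quantity": 0})]
order_queue.put(None)
order_queue.join()
worker.join()

print(responses)       # both orders are only "accepted" at request time
print(notifications)   # [(1, 'success'), (2, 'failed')]
```

The point of the design is that the response to the user says "accepted", not "succeeded"; the final outcome is known only after the consumer has run.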

2.2 Traffic peak clipping

A burst of highly concurrent transaction messages arriving within a short time is first stored in the message queue, and the back-end services then consume the messages at a pace they can sustain, so the burst does not overwhelm them directly.

For example, in e-commerce flash sales and promotions, sensible use of a message queue can effectively absorb the flood of orders that pours into the system the moment the promotion starts. As shown below:
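A minimal simulation of peak clipping, assuming a burst of 1,000 orders and a back end that can handle 100 orders per time tick (both numbers are invented for illustration). The queue absorbs the burst, and the back end never sees more than its capacity in any tick.

```python
from collections import deque

BACKEND_CAPACITY = 100   # assumed: orders the back end can handle per tick

backlog = deque()

# Tick 0: a flash-sale burst of 1,000 orders arrives almost at once.
backlog.extend(range(1000))

processed_per_tick = []
while backlog:
    # The back end pulls only as much as it can handle in this tick.
    batch_size = min(BACKEND_CAPACITY, len(backlog))
    batch = [backlog.popleft() for _ in range(batch_size)]
    processed_per_tick.append(len(batch))

print(max(processed_per_tick))   # 100: peak load seen by the back end
print(len(processed_per_tick))   # 10: ticks needed to drain the burst
```

The trade-off is latency: orders at the back of the queue wait longer, which is why the business flow has to tolerate delayed confirmation.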

2.3 Application decoupling

Using message queues also reduces system coupling. We know that if there are no direct calls between modules, adding or modifying modules will have little impact on other modules, and the system will be more scalable.

Consider a scenario where system A sends data to systems B, C, and D through interface calls. What if system E also wants this data? What if system C no longer needs it? The owner of system A is driven to the brink of collapse by the constant changes.

In this scenario, system A is heavily coupled with a tangle of other systems. System A produces a critical piece of data that many systems need it to send them. System A constantly has to consider what to do if any of the four systems B, C, D, and E fails: should it resend the message? Should it store the message? It is enough to turn your hair white!

If MQ is used, system A produces a piece of data and sends it to MQ, and whichever system needs the data consumes it from MQ itself. If a new system needs the data, it can consume it directly from MQ; if a system no longer needs the data, it simply stops consuming the message. This way, system A does not need to consider who to send data to, does not need to maintain that code, and does not need to consider whether the other side's call succeeded, failed, or timed out. As shown below:

The producer (client) sends the message to the message queue, and the receiver (server) processes the message. The consuming system can directly fetch the message from the message queue for consumption without coupling with other systems, which obviously improves the scalability of the system.

Message queues make the publish-subscribe model possible: a message sender (producer) publishes a message, and one or more message recipients (consumers) subscribe to it. As the figure above shows, there is no direct coupling between the message sender (producer) and the message recipient (consumer). The sender's part is done once it has sent the message to the distributed message queue, and the recipient fetches the message from the distributed message queue for further processing without needing to know where it came from. A new service that is interested in the message only needs to subscribe to it, which has no impact on the existing systems and services, and this is how the system's services are designed to be extensible.
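The A/B/C/D/E scenario can be sketched with a toy in-memory broker. The `Broker` class, its method names, and the `"orders"` topic are hypothetical and exist only for this illustration. A real message queue also stores messages and delivers them across processes, which this sketch does not attempt.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory broker: topic name -> {subscriber name: callback}."""

    def __init__(self):
        self._subscribers = defaultdict(dict)

    def subscribe(self, topic, name, callback):
        self._subscribers[topic][name] = callback

    def unsubscribe(self, topic, name):
        self._subscribers[topic].pop(name, None)

    def publish(self, topic, message):
        # The producer only names the topic; it never sees who is listening.
        for callback in list(self._subscribers[topic].values()):
            callback(message)

broker = Broker()
received = defaultdict(list)

def make_handler(name):
    # Each consuming system just records what it receives.
    return lambda message: received[name].append(message)

for system in ("B", "C", "D"):
    broker.subscribe("orders", system, make_handler(system))

broker.publish("orders", "order-1")                  # system A publishes

broker.subscribe("orders", "E", make_handler("E"))   # E starts consuming
broker.unsubscribe("orders", "C")                    # C stops consuming
broker.publish("orders", "order-2")                  # A's code is unchanged

print(dict(received))
```

Adding E and removing C required no change to the publishing side, which is the decoupling the section describes.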

3. Some problems with using message queues

  • Reduced system availability: System availability degrades to some extent, because the more external dependencies a system introduces, the more likely it is to fail. Before MQ was introduced, we did not have to worry about message loss or MQ failure; once it is introduced, we have to think about how to make the message queue highly available, otherwise an MQ failure could bring down the whole system!
  • Increased system complexity: Once MQ is introduced, we need to ensure that messages are not consumed more than once, handle message loss, ensure that messages are delivered in order, and so on!
  • Data consistency problem: As mentioned above, a message queue enables asynchronous processing, and that asynchrony improves the system's response time. But what if the actual consumer of the message does not consume it correctly? That leads to inconsistent data!
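One common way to cope with repeated consumption is to make the consumer idempotent. The sketch below assumes every message carries a unique `id` and that the consumer remembers which ids it has already handled; the message fields and the in-memory `processed_ids` set are illustrative assumptions, and a real system would keep this record in durable storage.

```python
processed_ids = set()      # assumption: a durable store in a real system
balance = {"alice": 0}

def handle(message):
    """Apply a deposit once, even if the same message is delivered again."""
    if message["id"] in processed_ids:
        return False                      # duplicate delivery: skip it
    balance[message["user"]] += message["amount"]
    processed_ids.add(message["id"])
    return True

# Message 1 is delivered twice, e.g. after a lost acknowledgement.
deliveries = [
    {"id": 1, "user": "alice", "amount": 50},
    {"id": 2, "user": "alice", "amount": 20},
    {"id": 1, "user": "alice", "amount": 50},
]
results = [handle(m) for m in deliveries]

print(results)   # [True, True, False]
print(balance)   # {'alice': 70}
```

Without the id check, the redelivered message would be applied twice and the balance would be wrong, which is exactly the kind of inconsistency the list above warns about.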

4. Comparison of common message queues

There are many MQ products on the market; the main ones are Kafka, ActiveMQ, RabbitMQ, and RocketMQ. Which one should we choose when selecting a technology? No MQ is absolutely good or bad; what matters is the scenario in which it can play to its strengths and avoid its weaknesses.

| Feature | ActiveMQ | RabbitMQ | RocketMQ | Kafka |
| --- | --- | --- | --- | --- |
| Single-machine throughput | On the order of 10,000; an order of magnitude lower than RocketMQ and Kafka | Same as ActiveMQ | On the order of 100,000; supports high throughput | On the order of 100,000; high throughput, typically used together with big data systems for real-time computing, log collection, and similar scenarios |
| Impact of the number of topics on throughput | | | Topics can number in the hundreds or thousands with only a small drop in throughput; this is one of RocketMQ's great strengths, as it can support a large number of topics on the same machine | Throughput drops sharply as topics grow from a few dozen to a few hundred, so on a single machine Kafka should keep the number of topics small; supporting a large number of topics requires adding more machines |
| Timeliness | Millisecond level | Microsecond level; this is a distinguishing feature of RabbitMQ, which has the lowest latency | Millisecond level | Latency within the millisecond level |
| Availability | High; achieves high availability with a master/slave architecture | Same as ActiveMQ | Very high; distributed architecture | Very high; distributed, with multiple replicas of each piece of data, so a few machines going down causes neither data loss nor unavailability |
| Message reliability | Low probability of data loss | Essentially no loss | Zero loss is achievable with tuned parameter configuration | Same as RocketMQ |
| Feature support | Extremely complete feature set for the MQ domain | Developed in Erlang; very strong concurrency, excellent performance, very low latency | Fairly complete MQ features; distributed, with good scalability | Mainly supports simple MQ features; widely used for real-time computing and log collection in the big data field |

Conclusion:

  • ActiveMQ: The community is relatively mature, but ActiveMQ performs poorly compared with the others and its releases iterate slowly, so it is not recommended.
  • RabbitMQ: Slightly inferior to Kafka and RocketMQ in throughput, but because it is built on Erlang it has very strong concurrency, extremely good performance, and very low latency at the microsecond level. However, also because RabbitMQ is built on Erlang, few companies in China have the resources to study and customize it at the Erlang source level. If the business scenario does not demand very high concurrency (on the order of hundreds of thousands or millions), RabbitMQ is your first choice among the four message queues. For real-time computing, log collection, and similar scenarios in the big data field, Kafka is the industry standard and is a safe choice, with a very active community.
  • RocketMQ: An open source project from Alibaba written in Java, so the source code can be read directly and customized into your own company's MQ, and RocketMQ has been tested by Alibaba's real business scenarios. The RocketMQ community is moderately active; RocketMQ has been donated to Apache, but activity on GitHub is not very high, the documentation is relatively simple, and the interface does not follow the standard JMS specification, so some systems need a lot of code changes to migrate. If you are fully confident in your team's technical strength, RocketMQ is recommended. Otherwise, fall back to RabbitMQ, which has an active open source community.
  • Kafka: Its characteristics are clear-cut. It offers relatively few core features, but provides high throughput, millisecond-level latency, high availability and reliability, and distributed scalability. At the same time, it is best to keep the number of topics small to preserve its extremely high throughput. The only disadvantage of Kafka is that messages may be consumed more than once, which has a very slight impact on data accuracy; in big data and log collection this is negligible. These characteristics make Kafka a natural fit for real-time big data computation and log collection.