This article has been added to the open-source project JavaGuide, a documentation project covering the core knowledge that most Java programmers need to master (its star count is approaching 16K). Address: github.com/Snailclimb/…

Mind map: Message queue summary

Message queues are simple

“RabbitMQ?” “Kafka?” “RocketMQ?” In our daily study and development we often hear the term “message queue”, and I have mentioned it in several of my articles. Whether you are an experienced user of message queues or a newcomer, this article walks you through their basic theory. If you’re an old hand, you may pick up some important concepts you hadn’t noticed before; if you’re new, this article will help you open the door to message queues.

What is a message queue

We can think of a message queue as a container for messages: messages are stored in it and taken out when we need them. Message queues are an important component of distributed systems. Their main purposes are to improve system performance and smooth peak load through asynchronous processing, and to reduce coupling between systems. The most widely used message queues are ActiveMQ, RabbitMQ, Kafka and RocketMQ, which are compared later.

In addition, a queue is a first-in, first-out data structure, so messages are normally consumed in order. For example, if a producer sends messages 1, 2, 3, ..., the consumer consumes them in the order 1, 2, 3, .... Occasionally, however, messages are consumed out of order, for example when consuming a message fails or when several consumers read from the same queue. When ordering matters, we must make sure that messages are consumed in the correct order.
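As a rough illustration of the container idea and the first-in, first-out behaviour, here is a minimal in-process sketch. It assumes that java.util.concurrent.LinkedBlockingQueue stands in for a real message broker; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class: an in-process queue stands in for a message broker.
class FifoQueueDemo {

    // Sends 1..count from a producer thread and returns the order in which
    // a single consumer (the calling thread) received the messages.
    static List<Integer> sendAndConsume(int count) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                queue.add(i); // the queue is unbounded, so add() does not block
            }
        });
        producer.start();

        List<Integer> consumed = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                consumed.add(queue.take()); // blocks until a message is available
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(sendAndConsume(3)); // prints [1, 2, 3]
    }
}
```

With one producer and one consumer the order is preserved; the ordering problems described above only appear once consumption can fail or several consumers share the queue.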

Besides consumption order, using a message queue raises other questions. How do we ensure that messages are not consumed more than once? How do we ensure reliable delivery (that is, how do we deal with message loss)? And so on. So message queues are not a perfect solution: they can lower system availability, they make the system more complex, and we have to ensure consistency ourselves.

Why use message queues

I see two main benefits of using message queues: 1. improved system performance through asynchronous processing (peak shaving, reduced response time); 2. reduced system coupling. If you are asked this question in an interview and your resume mentions message queues, it is best to answer it with reference to your own project.

Chapters 4 and 7 of “Technical Architecture for Large Websites” both discuss how message queues can improve application performance and scalability.

(1) Improve system performance through asynchronous processing (peak shaving and reduced response time)

Figure: Improving system performance through asynchronous processing

As shown in the figure above, without a message queue server the user’s request data is written directly to the database. Under high concurrency the load on the database rises sharply and response times slow down. With a message queue, the request data is sent to the queue and the request returns immediately; a consumer process then takes the data from the queue and writes it to the database asynchronously. Because the message queue server handles requests faster than the database (and scales better than the database), response times improve dramatically.
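The flow just described can be sketched in a few lines. This is only an in-process illustration under stated assumptions: a LinkedBlockingQueue plays the message queue, an ArrayList plays the database, and the class name, method name and STOP marker are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo: the request path only enqueues; a consumer thread
// performs the slow "database" write asynchronously.
class AsyncWriteDemo {
    private static final String STOP = "__STOP__"; // hypothetical shutdown marker

    static List<String> handleRequests(List<String> requests) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> database = new ArrayList<>(); // stand-in for a real database

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String data = queue.take();
                    if (data.equals(STOP)) {
                        return;
                    }
                    Thread.sleep(2); // simulated slow database write
                    database.add(data);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (String request : requests) {
            queue.add(request); // request path: enqueue and return at once
        }
        queue.add(STOP);

        try {
            consumer.join(); // demo only: wait so the result can be inspected
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return database;
    }
}
```

The loop that enqueues requests finishes almost instantly, while the consumer drains the queue at the database's own pace; that gap is what absorbs a burst of traffic.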

From this analysis we can conclude that a message queue is well suited to peak shaving: through asynchronous processing, the transaction messages generated by a short burst of high concurrency are stored in the queue, which flattens the peak of concurrent transactions. For example, in e-commerce flash sales and promotions, sensible use of a message queue can absorb the surge of orders that floods into the system when the promotion starts. As shown below:

Figure: Proper use of a message queue absorbs the flood of orders at the start of a promotion

Once the request data has been written to the message queue, the request returns to the user immediately. However, the data may still fail in later steps such as business validation or the database write. So after switching to asynchronous processing through a message queue, the business flow has to be adjusted accordingly. For example, after a user submits an order and the order data is written to the queue, we cannot immediately tell the user that the order succeeded. Only after the queue’s order consumer process has actually handled the order, possibly even after the goods have left the warehouse, should the user be notified of success by email or SMS, so as to avoid trade disputes. This is similar to booking train tickets or movie tickets on a mobile phone.
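A small sketch of the adjusted order flow, kept single-threaded for clarity. Everything here is an assumption made for illustration: the class, the status names, and the validation rule (order ids starting with "bad" are rejected).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical demo: submitting an order only reports PENDING; the final
// result is sent later as a notification by the consumer side.
class OrderFlowDemo {
    enum Status { PENDING, CONFIRMED, REJECTED }

    private final Queue<String> orderQueue = new ArrayDeque<>();
    private final Map<String, Status> orders = new LinkedHashMap<>();
    private final List<String> notifications = new ArrayList<>();

    // Request path: enqueue the order and report PENDING, not success.
    Status submit(String orderId) {
        orders.put(orderId, Status.PENDING);
        orderQueue.add(orderId);
        return Status.PENDING;
    }

    // Consumer path: validate each queued order, record the result and
    // notify the user (a list entry stands in for an email or SMS).
    void processAll() {
        String orderId;
        while ((orderId = orderQueue.poll()) != null) {
            // Assumed validation rule, for illustration only.
            Status result = orderId.startsWith("bad") ? Status.REJECTED : Status.CONFIRMED;
            orders.put(orderId, result);
            notifications.add(orderId + ":" + result);
        }
    }

    Status statusOf(String orderId) {
        return orders.get(orderId);
    }

    List<String> notifications() {
        return notifications;
    }
}
```

The point of the design is that submit() never claims success; the user learns the real outcome only from the notification produced after processing.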

(2) Reduce system coupling

We know that if there are no direct calls between modules, adding or modifying modules will have little impact on other modules, and the system will be more scalable.

The event-driven architecture, our most common approach, resembles the producer-consumer pattern, and large websites commonly implement it with message queues. As shown below:

Figure: Using message queues to implement an event-driven architecture

Message queues make the publish-subscribe model work: a message sender (producer) publishes a message, and one or more message receivers (consumers) subscribe to it. As the figure above shows, there is no direct coupling between the sender and the receivers. The sender hands the message to the distributed message queue and its part of the processing is done; receivers fetch the message from the distributed message queue and process it, without needing to know where it came from. A new service that is interested in the message only has to subscribe to it, with no impact on the existing systems and business, which makes the website’s business extensible by design.

A message receiver can also filter, process and wrap a message, construct a new message type from it, and send that message on for other receivers to subscribe to. So an event-driven (message-driven) business architecture can be a chain of processing steps.
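To make the decoupling and the chained processing concrete, here is a minimal in-memory sketch. It delivers messages synchronously and is not a real broker; the class name, topic names and message contents are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory bus: publishers know only a topic name,
// never the receivers behind it.
class EventBusDemo {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String message) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(message);
        }
    }

    static List<String> demo() {
        EventBusDemo bus = new EventBusDemo();
        List<String> log = new ArrayList<>();

        bus.subscribe("order.created", m -> log.add("billing got " + m));
        // A receiver that wraps the message and publishes a new message type.
        bus.subscribe("order.created", m -> bus.publish("order.enriched", m + "+address"));
        // A service added later only subscribes; the publisher is unchanged.
        bus.subscribe("order.enriched", m -> log.add("shipping got " + m));

        bus.publish("order.created", "order-1");
        return log;
    }

    public static void main(String[] args) {
        // prints [billing got order-1, shipping got order-1+address]
        System.out.println(demo());
    }
}
```

Adding the shipping service required no change to whoever publishes "order.created", which is the scalability property described above.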

In addition, to avoid losing messages when a message queue server fails, a message that has been successfully sent to the queue is also kept on the producer’s server and is deleted only after the consumer’s server has processed it. If a message queue server goes down, the producer’s server picks another server in the distributed message queue cluster and publishes to that one.
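The retention and failover idea can be sketched as follows. This is a simplified model under stated assumptions, not how any particular product implements it: Broker is a made-up class with an inbox and an "up" flag, and the consumer's acknowledgement is modelled as a plain method call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: the producer keeps each sent message until the
// consumer acknowledges it, and skips brokers that are down.
class FailoverProducerDemo {

    // Made-up broker model: a named inbox that can be marked as down.
    static class Broker {
        final String name;
        final List<String> inbox = new ArrayList<>();
        boolean up = true;

        Broker(String name) {
            this.name = name;
        }
    }

    private final List<Broker> cluster;
    // Message id -> payload, retained on the producer side until acknowledged.
    private final Map<String, String> unacked = new LinkedHashMap<>();

    FailoverProducerDemo(List<Broker> cluster) {
        this.cluster = cluster;
    }

    // Publishes to the first broker that is up and returns its name.
    String send(String id, String payload) {
        for (Broker broker : cluster) {
            if (broker.up) {
                broker.inbox.add(payload);
                unacked.put(id, payload);
                return broker.name;
            }
        }
        throw new IllegalStateException("no broker available");
    }

    // Called once the consumer has processed the message.
    void onConsumerAck(String id) {
        unacked.remove(id);
    }

    int pendingCount() {
        return unacked.size();
    }
}
```

Because unacknowledged messages stay with the producer, they could be resent to another broker after a failure; the resend step itself is left out of this sketch.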

Note: Do not assume that message queues only work in publish-subscribe mode; publish-subscribe is simply the mode used in this particular decoupling scenario. Besides publish-subscribe there is the point-to-point model (each message has exactly one consumer), although publish-subscribe is the one we use most often. Both of these message models are defined by JMS, while the AMQP protocol provides five message models.

Some problems with using message queues

  • Reduced system availability: availability drops to some extent. Why? Before adding MQ you never had to think about message loss or the MQ going down, but once MQ is introduced you do!
  • Increased system complexity: with MQ you need to make sure that messages are not consumed more than once, handle message loss, guarantee message ordering, and so on!
  • Consistency issues: as mentioned above, the asynchrony a message queue provides really does improve response times. But what if the actual consumer of a message fails to consume it correctly? The result is inconsistent data!
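One common way to cope with repeated consumption is to make the consumer idempotent. The sketch below assumes every message carries a unique id and keeps the processed ids in memory; the class and method names are hypothetical, and a real system would persist the ids.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo: a consumer that ignores a message it has already handled.
class IdempotentConsumerDemo {
    private final Set<String> processedIds = new HashSet<>();
    private final Map<String, Integer> balances = new HashMap<>();

    // Returns true if the message was applied, false if it was a duplicate.
    boolean onMessage(String messageId, String account, int amount) {
        // Set.add returns false when the id is already present.
        if (!processedIds.add(messageId)) {
            return false;
        }
        balances.merge(account, amount, Integer::sum);
        return true;
    }

    int balanceOf(String account) {
        return balances.getOrDefault(account, 0);
    }
}
```

Delivering the same message twice then leaves the balance unchanged the second time, so a redelivery does not corrupt the data.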

4 JMS vs. AMQP

4.1 JMS

4.1.1 Introduction to JMS

JMS (Java Message Service) is Java’s messaging service. JMS clients can exchange messages asynchronously through the JMS service. The JMS API is a standard (specification) for messaging that allows application components on the Java EE platform to create, send, receive and read messages. It makes distributed communication more loosely coupled and makes messaging more reliable and asynchronous.

ActiveMQ is implemented based on the JMS specification.

4.1.2 The two JMS message models

① P2P model




Figure: Point-to-point (P2P) model

A queue serves as the message carrier. Following the producer-consumer pattern, each message can be consumed by only one consumer, and unconsumed messages remain in the queue until they are consumed or time out. For example, if a producer sends 100 messages and two consumers read from the queue, the two consumers share the messages between them (roughly half each), taking them in the order in which they were sent.
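The "one consumer per message" rule can be checked with a small experiment. This sketch again uses an in-process LinkedBlockingQueue as a stand-in for a JMS queue, with hypothetical class and method names; it is not JMS API code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo: two consumers compete for the messages of one queue.
class CompetingConsumersDemo {

    // Returns how many messages each of the two consumers received.
    static int[] consume(int messageCount) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 1; i <= messageCount; i++) {
            queue.add(i);
        }

        int[] counts = new int[2]; // each consumer writes only its own slot
        Thread[] consumers = new Thread[2];
        for (int c = 0; c < consumers.length; c++) {
            final int index = c;
            consumers[c] = new Thread(() -> {
                // poll() removes the head atomically, so a message is never
                // handed to both consumers; null means the queue is empty.
                while (queue.poll() != null) {
                    counts[index]++;
                }
            });
            consumers[c].start();
        }

        try {
            for (Thread consumer : consumers) {
                consumer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counts;
    }
}
```

The two counts always add up to the number of messages sent, but the split between the consumers depends on thread scheduling and is not guaranteed to be exactly even.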

② Publish/subscribe (Pub/Sub) model




Figure: Publish/subscribe (Pub/Sub) model

Pub/Sub uses a topic as the message carrier, similar to broadcasting. A message published by a publisher is delivered through the topic to all subscribers; users who subscribe after a message has been broadcast do not receive that message.
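The broadcast behaviour, including the fact that late subscribers miss earlier messages, can be modelled in a few lines. This is an in-memory illustration with hypothetical names, not JMS API code; each subscriber is represented simply by its inbox list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: a topic copies each message to every current subscriber.
class TopicDemo {
    private final List<List<String>> inboxes = new ArrayList<>();

    // Registers a new subscriber and returns its inbox.
    List<String> subscribe() {
        List<String> inbox = new ArrayList<>();
        inboxes.add(inbox);
        return inbox;
    }

    // Delivers the message to everyone subscribed at this moment.
    void publish(String message) {
        for (List<String> inbox : inboxes) {
            inbox.add(message);
        }
    }

    public static void main(String[] args) {
        TopicDemo topic = new TopicDemo();
        List<String> early = topic.subscribe();
        topic.publish("a");
        List<String> late = topic.subscribe();
        topic.publish("b");
        System.out.println(early); // prints [a, b]
        System.out.println(late);  // prints [b]
    }
}
```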

4.1.3 The five JMS message body formats

JMS defines five message body formats, along with the message types that carry them, so that you can send and receive data in several different forms. This provides some level of compatibility with existing message formats.

  • StreamMessage - a stream of Java primitive values
  • MapMessage - a set of name-value pairs
  • TextMessage - a string object
  • ObjectMessage - a serialized Java object
  • BytesMessage - a stream of uninterpreted bytes

4.2 AMQP

AMQP (Advanced Message Queuing Protocol) is an open application-layer standard designed for message-oriented middleware. It provides unified messaging services and is compatible with JMS. Clients and messaging middleware based on this protocol can exchange messages regardless of the client or middleware product and regardless of the development language.

RabbitMQ is based on the AMQP protocol.

4.3 JMS vs. AMQP

Aspect | JMS | AMQP
Definition | Java API | Wire-level protocol
Cross-language | No | Yes
Cross-platform | No | Yes
Supported message models | Two: (1) point-to-point; (2) publish/subscribe | Five: (1) direct exchange; (2) fanout exchange; (3) topic exchange; (4) headers exchange; (5) system exchange. In essence, the last four differ little from the JMS pub/sub model, apart from a finer-grained division of routing mechanisms
Supported message types | Multiple message types (the five body formats listed above) | byte[] (binary)
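To show how exchange-based routing differs between types, here is a rough in-memory sketch of direct, fanout and topic routing. It models the matching rules only and uses no AMQP client library; the class, method and queue names are hypothetical. For topic exchanges it assumes the AMQP 0-9-1 convention in which "*" matches exactly one dot-separated word and "#" matches zero or more words. Headers and system exchanges are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical demo of exchange routing rules (no real AMQP client involved).
class ExchangeRoutingDemo {
    enum ExchangeType { DIRECT, FANOUT, TOPIC }

    // bindings maps queue name -> binding key; returns the queues that
    // would receive a message published with the given routing key.
    static List<String> route(ExchangeType type, Map<String, String> bindings, String routingKey) {
        List<String> queues = new ArrayList<>();
        for (Map.Entry<String, String> binding : bindings.entrySet()) {
            String bindingKey = binding.getValue();
            boolean deliver;
            if (type == ExchangeType.FANOUT) {
                deliver = true; // fanout ignores the routing key
            } else if (type == ExchangeType.DIRECT) {
                deliver = bindingKey.equals(routingKey); // exact match
            } else {
                deliver = topicMatches(bindingKey, routingKey);
            }
            if (deliver) {
                queues.add(binding.getKey());
            }
        }
        return queues;
    }

    static boolean topicMatches(String pattern, String routingKey) {
        return match(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean match(String[] p, int i, String[] k, int j) {
        if (i == p.length) {
            return j == k.length; // pattern used up: key must be used up too
        }
        if (p[i].equals("#")) {
            // "#" may absorb zero or more words of the routing key.
            for (int next = j; next <= k.length; next++) {
                if (match(p, i + 1, k, next)) {
                    return true;
                }
            }
            return false;
        }
        if (j == k.length) {
            return false; // pattern still expects a word, key has none left
        }
        return (p[i].equals("*") || p[i].equals(k[j])) && match(p, i + 1, k, j + 1);
    }
}
```

With bindings "q.created" -> "order.created" and "q.all" -> "order.#", a message with routing key "order.paid" reaches only q.all through a topic exchange, no queue through a direct exchange, and both queues through a fanout exchange.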

Conclusion:

  • AMQP defines a wire-level protocol for messages, while JMS defines an API specification. Within the Java ecosystem, multiple clients can interact through JMS without code changes, but cross-platform support is poor. AMQP is cross-platform and cross-language by nature.
  • JMS supports complex message types such as TextMessage and MapMessage. AMQP supports only byte[] message bodies (complex types can be serialized before sending).
  • Thanks to the routing algorithms provided by exchanges, AMQP offers a variety of ways to route messages to queues, whereas JMS supports only the queue and topic/subscribe models.
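Since an AMQP body is just bytes, a structured message has to be serialized by the sender and parsed by the receiver. The sketch below shows one possible encoding of name-value pairs using only the standard library (DataOutputStream and DataInputStream); the class name and the wire layout are assumptions made for this example, and real systems often use JSON or a schema-based format instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo: turning name-value pairs into a byte[] body and back.
class ByteBodyDemo {

    // Assumed layout: entry count, then each key and value as UTF strings.
    static byte[] encode(Map<String, String> fields) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(fields.size());
            for (Map.Entry<String, String> entry : fields.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, String> decode(byte[] body) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(body))) {
            int count = in.readInt();
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                fields.put(key, in.readUTF());
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Decoding the encoded bytes yields a map equal to the original, which is roughly what a MapMessage gives a JMS user without the manual step.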

5 Comparison of common message queues

Aspect | Summary
Throughput | ActiveMQ and RabbitMQ reach throughput in the tens of thousands (ActiveMQ performs worst), an order of magnitude below RocketMQ and Kafka, which reach the hundreds of thousands or even millions.
Availability | All four can achieve high availability. ActiveMQ and RabbitMQ achieve it with a master-slave architecture. RocketMQ is based on a distributed architecture. Kafka is also distributed: each piece of data has multiple replicas, so a few machines going down causes neither data loss nor unavailability.
Timeliness | RabbitMQ is built on Erlang, so its concurrency is very strong, its performance is excellent, and its latency is very low, at the microsecond level. The other three are at the millisecond level.
Feature support | Apart from Kafka, the other three are feature-complete. Kafka’s feature set is relatively simple and mainly covers basic MQ functions; it is widely used for real-time computing and log collection in the big data field, where it is the de facto standard.
Message loss | With ActiveMQ and RabbitMQ the chance of loss is very low; with RocketMQ and Kafka, messages are in theory never lost.

Conclusion:

  • The ActiveMQ community is fairly mature, but ActiveMQ performs relatively poorly and its releases are slow, so it is not recommended.
  • RabbitMQ is somewhat behind Kafka and RocketMQ in throughput, but because it is built on Erlang it offers strong concurrency, excellent performance and very low, microsecond-level latency. For the same reason, however, few companies in China have the resources to study and customize it at the Erlang source level. If your business scenario does not demand very high concurrency (hundreds of thousands or millions), RabbitMQ is a good first choice among the four. For real-time computing, log collection and similar scenarios in the big data field, Kafka is the industry standard; there is no problem choosing it, and its community is very active.
  • RocketMQ is an open-source Java project from Alibaba. We can read the source code directly and customize our own MQ, and RocketMQ has been proven in Alibaba’s real business scenarios. Its community is only moderately active, though still acceptable; the documentation is relatively simple, and its interfaces do not follow the standard JMS specification, so migrating some systems requires a lot of code changes. Also, since the technology comes from Alibaba, you have to be prepared for the risk that it is abandoned and the community fades away. If your team has the technical strength, I think RocketMQ is a good choice.
  • Kafka’s characteristics are obvious: it offers relatively few core features, but provides high throughput, millisecond-level latency, high availability and reliability, and distributed scalability. Kafka is best run with a relatively small number of topics so that its extremely high throughput is preserved. Its only real disadvantage is that messages may be consumed more than once, which has a very slight impact on data accuracy; in big data and log collection this is negligible. These characteristics make Kafka a natural fit for real-time big data computation and log collection.

Reference: “Java engineer interview assault season 1 – Hua Shi Shan teacher”



This article is original content of the Yunqi Community and may not be reproduced without permission.