Message queues are an important component of distributed systems. They are widely used in production wherever concurrency must be controlled, for example in flash sales.
1. Overview of Message Queuing (MQ)
A message queue is an important component of a distributed system. Its typical usage scenario can be summarized as follows:
When you do not need the result immediately but do need to control the level of concurrency, it is usually time to use a message queue.
Message queues mainly solve three problems: application coupling, asynchronous processing, and traffic peak shaving.
The most popular message queues are RabbitMQ, RocketMQ, ActiveMQ, Kafka, ZeroMQ, and MetaMQ. Some databases, such as Redis, MySQL, and PhxSQL, can also be used to implement a message queue.
2. Message queue usage scenarios
In practice, message queues are used in the following four scenarios:
- Application decoupling: Multiple applications process the same message through a message queue, so that a failed interface call does not cause the whole process to fail;
- Asynchronous processing: Multiple applications consume the same message from the message queue and process it concurrently, which reduces the processing time compared with serial processing;
- Peak shaving: Widely used in flash sales and panic-buying activities to keep the application system from hanging under excessive traffic;
- Message-driven system: The system is divided into a message queue, message producers, and message consumers. The producer is responsible for generating messages, and the consumers (possibly several) are responsible for processing them.
The following sections describe the four scenarios and how a message queue is used in each of them:
2.1 Asynchronous Processing
Scenario: When a user registers with an application, the system sends a registration email and a verification SMS. These two operations can be handled in two ways: serially or in parallel.
(1) Serial mode: after the registration information is written, the registration email is sent first, and then the verification SMS is sent;
In this mode, the result is returned to the client only after the verification SMS has been sent.
(2) Parallel mode: after the registration information is written, the SMS and the email are sent in parallel;
In this mode, the result is returned to the client after both the SMS and the email have been sent.
Assuming that each of the three steps (writing the registration information, sending the email, and sending the SMS) takes 50 ms and network latency is ignored, the total processing time is:
Serial: 50 + 50 + 50 = 150 ms
Parallel: 50 + 50 = 100 ms
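With a message queue:
After the registration information is written, the system writes a message to the queue and returns to the client immediately; the email and the SMS are sent asynchronously by consumers. The response time is then the time to write the registration information plus the time to write to the message queue. Writing to the queue is very fast and can basically be ignored, so the total response time is about 50 ms: one third of the serial time (150 ms) and half of the parallel time (100 ms).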
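The queue-based registration flow can be sketched in Python as follows. This is only a minimal illustration, assuming an in-memory `queue.Queue` in place of a real message queue; `register_user` and `notification_worker` are hypothetical names, and the database write and the actual email/SMS delivery are omitted.

```python
import queue
import threading

# In-memory stand-in for a real message queue (assumption for illustration).
registration_events = queue.Queue()
sent = []  # records the notifications handled by the consumer


def register_user(username):
    """Write the registration record (omitted), enqueue messages, return."""
    registration_events.put({"user": username, "task": "email"})
    registration_events.put({"user": username, "task": "sms"})
    return "registered"  # the client does not wait for email/SMS delivery


def notification_worker():
    """Consume messages in the background and 'send' each notification."""
    while True:
        message = registration_events.get()
        if message is None:  # sentinel: stop the worker
            registration_events.task_done()
            break
        sent.append((message["task"], message["user"]))
        registration_events.task_done()


worker = threading.Thread(target=notification_worker)
worker.start()

result = register_user("alice")  # returns without waiting for the worker
registration_events.put(None)    # ask the worker to stop after the backlog
registration_events.join()       # wait until every message is processed
worker.join()

print(result, sent)
```

The producer's response time here depends only on the two `put` calls, which mirrors the point that writing to the queue is cheap compared with sending the notifications.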
2.2 Application Decoupling
Specific scenario: a user uploads a picture to a QQ album, and the face recognition system runs face recognition on it. The usual approach is that when the server receives the picture, the picture uploading system immediately calls the face recognition system and returns success only after that call completes.
This method has the following disadvantages:
- If the call to the face recognition system fails, the picture upload fails;
- Latency is high: the client gets a response only after the face recognition system has finished processing, even though the user does not need to know the result immediately;
- The picture uploading system and the face recognition system call each other directly, so the two are tightly coupled.
With a message queue:
After the client uploads an image, the picture uploading system writes the image information, such as the UIN and batch, to the message queue and returns success immediately. The face recognition system periodically fetches data from the message queue and runs recognition on the new images.
At this point, the picture uploading system does not need to care whether or when the face recognition system processes the picture information. In fact, because the user does not need to know the face recognition results immediately, the face recognition system can choose different scheduling strategies and process the picture information in the queue according to idle, busy, and normal periods.
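A rough Python sketch of this decoupling is shown below. It assumes an in-memory `queue.Queue` as the message queue; `upload_image` and `recognize_pending` are hypothetical names, and the real storage and recognition steps are left out.

```python
import queue

image_queue = queue.Queue()  # in-memory stand-in for the message queue


def upload_image(uin, batch):
    """Store the image (omitted), enqueue its metadata, report success."""
    image_queue.put({"uin": uin, "batch": batch})
    return "success"


def recognize_pending(max_items):
    """Called by the face recognition system on its own schedule."""
    processed = []
    for _ in range(max_items):
        try:
            info = image_queue.get_nowait()
        except queue.Empty:
            break
        processed.append(info["uin"])  # real recognition would happen here
    return processed


# Uploads succeed even though no recognizer is running yet.
statuses = [upload_image(uin, batch=1) for uin in (1001, 1002, 1003)]

# Later, the recognizer drains the backlog in batches of its own choosing.
first_batch = recognize_pending(max_items=2)
second_batch = recognize_pending(max_items=2)
print(statuses, first_batch, second_batch)
```

The uploader never calls the recognizer, so a recognizer outage only delays recognition and does not fail the upload.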
2.3 Rate Limiting and Peak Shaving
Specific scenario: when a shopping website runs a flash sale, traffic surges instantaneously, and the relevant systems cannot handle all the requests or may even crash. With a message queue in front, requests are written to the queue first and the system fetches them from the queue at its own pace, so the queue acts as a buffer.
This method has the following advantages:
- Requests enter the message queue first instead of being processed directly by the business processing system. This buffer greatly reduces the pressure on the business processing system;
- The queue length can be limited. In practice, users who reach the queue too late cannot win the flash-sale item anyway, so their requests can be discarded directly, returning a message that the activity is over or the item is sold out.
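The bounded-queue idea can be illustrated with a short Python sketch. The limit `MAX_QUEUE_LENGTH` and the function `submit_order` are hypothetical, and an in-memory `queue.Queue` stands in for the real message queue.

```python
import queue

MAX_QUEUE_LENGTH = 3  # hypothetical limit, e.g. tied to the available stock
order_queue = queue.Queue(maxsize=MAX_QUEUE_LENGTH)


def submit_order(user_id):
    """Accept the request if the queue has room, otherwise reject it at once."""
    try:
        order_queue.put_nowait(user_id)
        return "queued"
    except queue.Full:
        return "sold out"


# Five users arrive at the same moment, but only three requests are buffered.
responses = [submit_order(uid) for uid in range(5)]
print(responses)  # ['queued', 'queued', 'queued', 'sold out', 'sold out']
```

Requests beyond the limit are rejected before they reach the business processing system, which then consumes the buffered requests at whatever rate it can sustain.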
2.4 Message-driven systems
Specific scenario: a user uploads a batch of photos. The face recognition system needs to cluster all of the user’s photos, and after clustering is complete, the reconciliation system regenerates the user’s face index (to speed up queries). The three subsystems are connected by message queues: the results of each stage are put into a queue, and the next stage fetches messages from the queue to continue processing.
This method has the following advantages:
- The current system does not call the next system directly, so a failure in the next system does not cause the current system to fail;
- Each subsystem can handle messages more flexibly. It can process each message as soon as it is received, process messages on a schedule, or process messages in different time periods according to its processing speed.
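The staged pipeline can be sketched in Python as below. This is an illustrative toy, assuming two in-memory queues between three stages; the stage functions and the "clustering" logic (deduplicating face labels) are placeholders, not a real face clustering algorithm.

```python
import queue

uploaded = queue.Queue()   # upload system -> clustering stage
clustered = queue.Queue()  # clustering stage -> index stage
face_index = {}            # final result: user -> list of face clusters


def clustering_stage():
    """Consume upload messages and emit one clustering result per message."""
    while not uploaded.empty():
        msg = uploaded.get()
        # Placeholder for real clustering: each distinct label is a cluster.
        clusters = sorted(set(msg["faces"]))
        clustered.put({"user": msg["user"], "clusters": clusters})


def index_stage():
    """Consume clustering results and rebuild the per-user face index."""
    while not clustered.empty():
        msg = clustered.get()
        face_index[msg["user"]] = msg["clusters"]


# The upload system only writes messages; it never calls the later stages.
uploaded.put({"user": "u1", "faces": ["bob", "amy", "bob"]})
uploaded.put({"user": "u2", "faces": ["cat"]})

clustering_stage()  # each stage runs whenever it chooses
index_stage()
print(face_index)  # {'u1': ['amy', 'bob'], 'u2': ['cat']}
```

Because each stage only reads from one queue and writes to the next, any stage can be run immediately, on a timer, or in batches without the others changing.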
3. Two modes of message queuing
Message queues have two modes: point-to-point (queue) and publish/subscribe (topic).
3.1 Point-to-point mode
There are three roles in point-to-point mode:
- The message queue
- Sender (producer)
- Receiver (consumer)
The sender produces a message and sends it to the queue, and the receiver retrieves the message from the queue and consumes it. After a message is consumed, it is no longer stored in the queue, so a receiver cannot consume a message that has already been consumed.
Point-to-point mode features:
- Each message has only one receiver (consumer); once consumed, the message is no longer in the message queue;
- There is no dependency between sender and receiver. After the sender sends a message, whether or not the receiver is running does not affect the sender’s ability to send the next message;
- After successfully receiving a message, the receiver must send an acknowledgment to the queue so that the queue deletes the received message.
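The point-to-point behaviour can be modelled with a small Python class. This is a simplified sketch: `PointToPointQueue` is a hypothetical in-memory class, and the `nack` method (returning a failed message to the queue) is an assumption added to show why acknowledgments matter; it is not described in the text above.

```python
from collections import deque


class PointToPointQueue:
    def __init__(self):
        self._ready = deque()  # messages waiting for a consumer
        self._unacked = {}     # delivered but not yet acknowledged
        self._next_id = 0

    def send(self, body):
        self._ready.append((self._next_id, body))
        self._next_id += 1

    def receive(self):
        """Hand the next message to exactly one caller, or None if empty."""
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._unacked[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        """Consumer confirms success; the message is deleted for good."""
        del self._unacked[msg_id]

    def nack(self, msg_id):
        """Consumer reports failure; the message goes back to the front."""
        self._ready.appendleft((msg_id, self._unacked.pop(msg_id)))


q = PointToPointQueue()
q.send("order-1")
q.send("order-2")

id_a, body_a = q.receive()  # consumer A gets order-1
id_b, body_b = q.receive()  # consumer B gets order-2, never order-1
q.ack(id_a)                 # order-1 is deleted
q.nack(id_b)                # consumer B failed, so order-2 is redelivered
id_c, body_c = q.receive()
q.ack(id_c)
print(body_a, body_b, body_c, q.receive())  # order-1 order-2 order-2 None
```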
3.2 Publish/subscribe
There are three roles in publish/subscribe mode:
- Topic
- Publisher
- Subscriber
Publishers send messages to topics, and the system delivers these messages to multiple subscribers.
Publish/subscribe features:
- Each message can have multiple subscribers;
- There is a temporal dependency between publisher and subscriber. A subscriber must subscribe to a topic before it can consume the messages published to that topic;
- To consume messages, subscribers need to subscribe to the topic in advance and stay online.
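The temporal dependency is easy to see in a toy Python model. The `Topic` class below is a hypothetical in-memory sketch, assuming non-durable subscriptions in which each subscriber has a simple list as its inbox.

```python
class Topic:
    def __init__(self):
        self._subscribers = {}  # subscriber name -> inbox (list of messages)

    def subscribe(self, name):
        """Register a subscriber and return its inbox."""
        inbox = []
        self._subscribers[name] = inbox
        return inbox

    def publish(self, message):
        """Deliver a copy of the message to every current subscriber."""
        for inbox in self._subscribers.values():
            inbox.append(message)


topic = Topic()
early = topic.subscribe("early")
topic.publish("m1")
late = topic.subscribe("late")  # subscribed after m1 was published
topic.publish("m2")

print(early)  # ['m1', 'm2']
print(late)   # ['m2']
```

Both subscribers receive `m2`, but only the subscriber that existed when `m1` was published receives it.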
4. Introduction to common message queues
This part introduces the main features, advantages, and disadvantages of four commonly used message queues: RabbitMQ, ActiveMQ, RocketMQ, and Kafka.
4.1 RabbitMQ
RabbitMQ, released in 2007, is a reusable enterprise messaging system based on AMQP (Advanced Message Queuing Protocol) and is one of the most popular messaging middleware products.
Main features:
- Reliability: RabbitMQ offers a variety of features that let you trade off performance against reliability, including persistence, delivery acknowledgements, publisher confirms, and high availability;
- Flexible routing: Messages are routed through an exchange before reaching a queue. RabbitMQ provides several built-in exchange types for typical routing logic. For more complex routing requirements, you can bind exchanges together or even implement your own exchange type as a RabbitMQ plug-in;
- Clustering: Multiple RabbitMQ servers on the same LAN can be clustered together and used as a single logical broker;
- Highly available queues: Queues can be mirrored across machines in the cluster so that messages remain safe in the event of hardware failure;
- Multi-protocol support: RabbitMQ supports multiple messaging protocols;
- Multi-language clients: The server is written in Erlang, and clients are available for almost any programming language you can think of;
- Management interface: RabbitMQ has an easy-to-use user interface that lets users monitor and manage many aspects of the message broker;
- Tracing: If messaging misbehaves, RabbitMQ provides a tracing mechanism so that users can find out what happened;
- Plug-in mechanism: Many plug-ins are available to extend RabbitMQ in various ways, and you can also write your own.
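To make the flexible routing feature more concrete, the following Python sketch simulates how a topic exchange matches routing keys against binding patterns, where `*` stands for exactly one word and `#` for zero or more words. It is a pure-Python model for illustration, not RabbitMQ client code; the binding patterns and queue names are made up.

```python
def topic_matches(pattern, routing_key):
    """Return True if a dot-separated routing key matches a binding pattern."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb zero or more words of the routing key.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


# Hypothetical bindings: (pattern, queue name).
bindings = [
    ("order.*", "order_queue"),
    ("*.created", "audit_queue"),
    ("#", "log_queue"),
]


def route(routing_key):
    """List the queues that would receive a message with this routing key."""
    return [name for pattern, name in bindings
            if topic_matches(pattern, routing_key)]


print(route("order.created"))  # ['order_queue', 'audit_queue', 'log_queue']
print(route("user.deleted"))   # ['log_queue']
print(route("order.paid.eu"))  # ['log_queue']
```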
Using RabbitMQ requires:
- Erlang language package
- RabbitMQ installation package
RabbitMQ can run on platforms supported by the Erlang language:
- Solaris
- BSD
- Linux
- Mac OS X
- Tru64
- Windows NT/2000/XP/Vista/Windows 7/Windows 8
- Windows Server 2003/2008/2012
- Windows 95, 98
- VxWorks
Advantages:
- Thanks to the characteristics of the Erlang language, performance is good and concurrency is high;
- Robust, stable, easy to use, cross-platform, with multi-language support and complete documentation;
- High reliability, thanks to message acknowledgement and persistence mechanisms;
- Highly customizable routing;
- A rich management interface, and wide use in Internet companies;
- An active community.
Disadvantages:
- Although Erlang’s concurrency strengths give good performance, Erlang makes secondary development and maintenance difficult;
- RabbitMQ implements a broker architecture, meaning that messages are queued on a central node before being sent to clients. This makes RabbitMQ easy to use and deploy, but it runs more slowly because the central node adds latency and the message envelopes are large;
- Its interfaces and protocols are complex, so learning and maintenance costs are high.
4.2 ActiveMQ
ActiveMQ is produced by Apache. It is a JMS provider implementation that fully supports the JMS 1.1 and J2EE 1.4 specifications. It is fast, supports clients and protocols in multiple languages, can be easily embedded into enterprise application environments, and has many advanced features.
Main features:
- Compliance with the JMS specification: The JMS specification provides good standards and guarantees, including synchronous or asynchronous message delivery, once-and-only-once message delivery, message reception and subscription, and so on. The advantage of following the JMS specification is that these basic features are available regardless of which JMS provider is used;
- Connectivity: ActiveMQ provides a wide range of connection options, supporting HTTP/S, IP multicast, SSL, STOMP, TCP, UDP, XMPP, and more. This broad protocol support gives ActiveMQ great flexibility;
- Supported protocols include OpenWire, STOMP, REST, XMPP, and AMQP;
- Persistence plug-ins and security plug-ins: ActiveMQ offers a variety of persistence options. In addition, ActiveMQ security can be customized, with authentication and authorization tailored to user needs;
- Supported client languages: in addition to Java, there are C/C++, .NET, Perl, PHP, Python, and Ruby;
- Broker clustering: Multiple ActiveMQ brokers can form a cluster to provide services;
- Surprisingly simple administration: ActiveMQ was designed with developers in mind. As a result, it does not require a dedicated administrator, because it provides simple and usable administrative features. There are many ways to monitor ActiveMQ at different levels, including using JMX through JConsole or ActiveMQ’s Web Console, handling JMX alarm messages, using command-line scripts, and even monitoring various types of logs.
Using ActiveMQ requires:
- Java JDK
- ActiveMQ installation package
ActiveMQ can run on platforms supported by the Java language.
Advantages:
- Cross-platform: it is written in Java and is platform independent, so ActiveMQ can run on almost any JVM;
- JDBC support: data can be persisted to a database. Although using JDBC degrades ActiveMQ’s performance, databases have always been the storage medium most familiar to developers. Messages stored in a database can be inspected directly, and companies often have dedicated DBAs to tune the database and set up master-slave separation;
- JMS support: supports the unified JMS interface;
- Supports automatic reconnection;
- Security mechanism: supports various security configuration mechanisms, such as Shiro and JAAS, to authenticate and authorize access to queues and topics;
- Complete monitoring: includes the Web Console, JMX, the shell command line, and the Jolokia REST API;
- User-friendly: the Web Console covers most cases, and many third-party components are available, such as Hawtio.
Disadvantages:
- The community is not as active as RabbitMQ’s;
- According to feedback from other users, unexplained problems occur and messages can be lost;
- The project’s focus is currently on Apollo, the ActiveMQ 6.0 product, and 5.x receives less maintenance;
- It is not suitable for scenarios with thousands of queues.
4.3 RocketMQ
RocketMQ is an open source product from Alibaba, implemented in Java. Its design drew on Kafka, with some improvements of its own, and its message reliability is better than Kafka’s. RocketMQ is widely used within Alibaba Group for orders, transactions, recharging, stream computing, message push, log stream processing, binlog distribution, and other scenarios.
Main features:
- It is message middleware based on a queue model, characterized by high performance, high reliability, near-real-time delivery, and a distributed design;
- Producers, consumers, and queues can all be distributed;
- A producer sends messages in round-robin fashion to a set of queues, and this set of queues is called a topic. With broadcast consumption, a single consumer instance consumes all queues of the topic. With cluster consumption, multiple consumer instances share the queues of the topic evenly;
- Can guarantee strict message ordering;
- Provides rich message pull modes;
- Efficient horizontal scaling of subscribers;
- Real-time message subscription mechanism;
- Capacity to accumulate hundreds of millions of messages;
- Few dependencies.
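The difference between cluster consumption and broadcast consumption can be sketched in Python as follows. This is an illustrative model only: `allocate_cluster` and `allocate_broadcast` are hypothetical functions, and the even split shown here is an assumption about one reasonable allocation, not RocketMQ's actual allocation code.

```python
def allocate_cluster(queues, consumers):
    """Cluster consumption: split the topic's queues evenly across consumers."""
    assignment = {}
    base, extra = divmod(len(queues), len(consumers))
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)  # first 'extra' get one more
        assignment[consumer] = queues[start:start + count]
        start += count
    return assignment


def allocate_broadcast(queues, consumers):
    """Broadcast consumption: every consumer instance reads every queue."""
    return {consumer: list(queues) for consumer in consumers}


topic_queues = ["q0", "q1", "q2", "q3"]
instances = ["c1", "c2", "c3"]

print(allocate_cluster(topic_queues, instances))
# {'c1': ['q0', 'q1'], 'c2': ['q2'], 'c3': ['q3']}
print(allocate_broadcast(topic_queues, instances)["c2"])
# ['q0', 'q1', 'q2', 'q3']
```

In cluster mode each queue has exactly one owner, so each message is handled once by the group; in broadcast mode every instance sees every message.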
Using RocketMQ requires:
- Java JDK
- Git and Maven
- RocketMQ installation package
RocketMQ runs on platforms supported by the Java language.
Advantages:
- A single machine supports more than 10,000 persistent queues;
- All RocketMQ messages are persistent: they are first written to the system page cache and then flushed to disk, so both memory and disk hold a copy of the data. Reads are served directly from memory;
- The model is simple and the interface is easy to use (the JMS interface is not very practical in many cases);
- Performance is very good, and large numbers of messages can accumulate in the broker;
- Supports multiple consumption modes, including cluster consumption and broadcast consumption;
- Every component is designed to be distributed and scalable, with master-slave HA;
- Development is fairly active and versions are released quickly.
Disadvantages:
- Few client languages are supported, currently Java and C++, and the C++ client is not mature;
- The RocketMQ community receives less attention and is less mature than those of the previous two;
- There is no web management interface; a command-line interface (CLI) management tool is provided to query, manage, and diagnose problems;
- Interfaces such as JMS are not implemented in the MQ core.
4.4 Kafka
Apache Kafka is a distributed publish/subscribe messaging system. It was originally built at LinkedIn as a distributed commit log with a unique design, and later became an Apache project. Kafka is fast, scalable, and durable, and its partitioning, replication, and fault tolerance are all attractive features.
Main features:
- Fast persistence: messages can be persisted with O(1) overhead;
- High throughput: about 100,000 messages per second on a commodity server;
- Fully distributed: brokers, producers, and consumers all natively support distribution, with automatic load balancing;
- Supports synchronous and asynchronous replication for HA;
- Supports sending and pulling data in batches;
- Zero-copy: reduces I/O operations;
- Data migration and capacity expansion are transparent to users;
- Machines can be added without downtime;
- Other features: strict message ordering, rich message pull models, efficient horizontal scaling of subscribers, real-time message subscription, the capacity to accumulate hundreds of millions of messages, and periodic deletion.
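Several of these features follow from Kafka's log-structured design, which the Python sketch below imitates in memory. It is a simplified model under stated assumptions: each partition is a plain list, `zlib.crc32` stands in for Kafka's own key hashing, and `produce`, `fetch`, and `NUM_PARTITIONS` are hypothetical names.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for one topic
partitions = [[] for _ in range(NUM_PARTITIONS)]  # one append-only log each


def partition_for(key):
    """Map a key to a partition; CRC32 is a stand-in for Kafka's own hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def produce(key, value):
    """Append a record to the end of its partition: (partition, offset)."""
    p = partition_for(key)
    partitions[p].append((key, value))  # appending does not touch old records
    return p, len(partitions[p]) - 1


def fetch(partition, offset, max_records):
    """Pull a batch starting at an offset; consuming removes nothing."""
    return partitions[partition][offset:offset + max_records]


for key, value in [("user-1", "created"), ("user-2", "created"),
                   ("user-1", "paid"), ("user-1", "shipped")]:
    produce(key, value)

p = partition_for("user-1")
user1_events = [v for k, v in fetch(p, 0, 10) if k == "user-1"]
replayed = [v for k, v in fetch(p, 0, 10) if k == "user-1"]
print(user1_events)  # ['created', 'paid', 'shipped']
```

Records with the same key land in the same partition and keep their order, and because the consumer controls the offset, the same batch can be pulled again later.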
Using Kafka requires:
- Java JDK
- Kafka installation package
Advantages:
- Rich client language support, including Java, .NET, PHP, Ruby, Python, Go, and others;
- Excellent performance: single-machine write TPS of about 1,000,000 messages per second with a message size of 10 bytes;
- It provides a fully distributed architecture and a replica mechanism, which give it high availability and reliability, and it theoretically supports unlimited message accumulation;
- Batch operations are supported;
- Consumers obtain messages in pull mode. Messages are ordered, and with proper control every message can be consumed exactly once;
- Kafka-Manager, an excellent third-party web management interface, is available;
- It is mature in the logging field and is used by many companies and open source projects.
Disadvantages:
- When a single Kafka machine has more than 64 queues/partitions, load rises significantly. The more queues there are, the higher the load and the longer the response time for sending messages;
- Consumption uses short polling, so real-time performance depends on the polling interval;
- Retrying is not supported when consumption fails;
- Message ordering is supported, but when a broker goes down, messages can arrive out of order;
- Community updates are slow.
4.5 Comparison of RabbitMQ, ActiveMQ, RocketMQ, and Kafka
Here is a comparison of the four message queues:
Conclusion:
Kafka has a distributed architecture. RabbitMQ is based on the AMQP protocol. RocketMQ drew its design from Kafka but changed to a master-slave structure and is optimized for transactions and reliability. Broadly speaking, consider RabbitMQ or RocketMQ for transactional applications such as e-commerce and finance, and Kafka for performance-oriented applications.
Conclusion:
Message queues use an efficient and reliable messaging mechanism to exchange data in a platform-independent way and to integrate distributed systems through data communication. There are many MQ products in the industry, such as RabbitMQ, RocketMQ, ActiveMQ, Kafka, ZeroMQ, and MetaMQ, and some systems use the Redis database directly as a message queue. Each of these products has its own focus, so when choosing one you need to weigh your own requirements against the characteristics of each MQ product.