The essence of message queuing in a large distributed system

Outline of this paper:

Message Queue Overview
Application scenarios of message queues
Example messaging middleware (e-commerce, logging)
JMS message service
Common message queue
The resources

Message queue Overview

Message queue middleware is an important component in distributed system. It mainly solves application coupling, asynchronous message, traffic cutting and other problems, and realizes high performance, high availability, scalability and final consistency architecture. It is an indispensable middleware in large distributed system.

At present, the most popular message queues in the production environment are ActiveMQ, RabbitMQ, ZeroMQ, Kafka, MetaMQ, RocketMQ and so on.

Application scenarios of message queues

The following describes four common application scenarios of message queues: asynchronous processing, application decoupling, traffic cutting, and message communication.

Scenario Description: After registration, users need to send registration emails and SMS messages. There are two traditional approaches:

Serial mode: After the registration information is successfully written into the database, a registration email is sent and then a registration SMS is sent. After the preceding three tasks are complete, the system returns them to the client.

Parallel mode: After the registration information is successfully written to the database, a registration email is sent and a registration SMS is sent. After the preceding three tasks are complete, the system returns them to the client. The difference with serial is that the parallel approach can improve the processing time.

Assuming that each of the three business nodes uses 50 ms, leaving aside other overhead such as the network, the serial time is 150 ms and the parallel time might be 100 ms.

Because the CPU can process a certain number of requests per unit of time, assume that CPU throughput is 100 times per second. In serial mode, the number of requests that the CPU can process in 1 second is 7 (1000/150). The number of requests processed in parallel is 10 (1000/100).

Summary: As described in the case above, performance (concurrency, throughput, response time) of the traditional approach can be bottlenecking. How to solve this problem?

The introduction of message queues, which will not be necessary business logic, the architecture of asynchronous processing is transformed as follows:

According to the convention above, the user’s response time is equivalent to the time it takes for the registration information to be written to the database, which is 50 milliseconds. After registering an email, sending a SHORT message and writing it to the message queue, the message queue is directly returned. Therefore, the writing speed of the message queue is very fast and can be ignored. Therefore, the response time of the user may be 50 milliseconds. As a result, the throughput of the system increased to 20 QPS per second after the architecture change. Three times better than serial and two times better than parallel.

Scenario Description: After a user places an order, the order system notifies the inventory system. Traditionally, the order system calls the inventory system interface. The diagram below:

Disadvantages of the traditional model:

If the inventory system is not accessible, the order will fail to reduce inventory, resulting in order failure;
Order system and inventory system coupling;

How to solve the above problems? The scheme after the introduction of application message queue is shown as follows:

Order system: after the user places an order, the order system completes the persistent processing, writes the message to the message queue, and returns the user to place the order successfully.
Inventory system: subscribe to the order message, using the pull/push way to obtain the order information, inventory system according to the order information, inventory operation.
What if: inventory system is not working properly when placing an order. It also does not affect the normal order, because after the order is written to the message queue, the order system does not care about other subsequent operations. Realize the order system and inventory system application decoupling.

Traffic clipping is also a common scenario in message queues and is commonly used in seckilling or group hijacking activities.

Application scenario: The application is suspended due to heavy traffic. To solve this problem, messages are usually queued at the front of the application.

Can control the number of activities;
It can alleviate the crushing application of high flow in a short time;

The server receives the user’s request and writes it to the message queue. If the message queue length exceeds the maximum number, the user request is discarded or the error page is redirected.

The second kill service performs subsequent processing according to the request information in the message queue.

Log processing refers to the use of message queues in log processing, such as Kafka, to solve the problem of large log transfer. The architecture is simplified as follows:

Log collection client, responsible for log data collection, periodic write write Kafka queue;
Kafka message queue, responsible for receiving, storing and forwarding log data;
Log processing applications: subscribe to and consume log data in the Kafka queue;

Here is an example of a Sina Kafka log processing application:

From (http://cloud.51cto.com/art/201507/484338.htm)

Kafka: Message queue that receives user logs.
Logstash: Logs are parsed and output to Elasticsearch as JSON.
Elasticsearch is a schemaless, real-time data storage service that organizes data through index and provides powerful search and statistics capabilities.
Kibana: ELK Stack is a data visualization component based on Elasticsearch.

Message communication means that message queues generally have efficient communication mechanisms built in, so they can also be used for pure message communication. Such as implementing point-to-point message queues, or chat rooms, etc.

Point-to-point communication:

Clients A and B use the same queue to communicate with each other.

Chat room newsletter:

Clients A, B, and N subscribe to the same topic to publish and receive messages. Achieve similar chat room effect.

The above are actually two messaging modes for message queues, point-to-point or publish-subscribe. The model is a schematic diagram for reference.

Examples of messaging middleware

Message queuing uses highly available, persistent message middleware. Such as ActiveMQ, RabbitMQ, RocketMQ.

(1) After the application completes the processing of the trunk logic, it writes the message queue. Whether the message is sent successfully enables the confirmation mode of the message. (The application will return the message after the message queue returns the success status to ensure message integrity.)

(2) Extend the process (SMS, delivery processing) subscription queue message. A push or pull is used to retrieve messages and process them.

(3) While the message will be decoupled, the data consistency problem will be brought, which can be solved in the way of final consistency. For example, the master data is written to the database, the extension application is based on the message queue, and the subsequent processing based on the message queue is realized in combination with the database mode.

It consists of four parts: Zookeeper registry, log collection client, Kafka cluster and Storm cluster (OtherApp).

Zookeeper registry, proposed load balancing and address lookup services;
Log collection client for collecting application system logs and pushing data to Kafka queue;
Kafka cluster: receiving, routing, storing, forwarding and other message processing;
Storm cluster: it is at the same level as OtherApp and consumes data in the queue in pull mode.

JMS message service

Talk about message queues without mentioning JMS. The JMS (JAVA Message Service) API is a messaging Service standard/specification that allows application components to create, send, receive, and read messages based on the JAVA EE platform. It makes distributed communication less coupled, message service more reliable and asynchronous.

In the EJB architecture, message beans integrate seamlessly with JM message services. In the J2EE architectural pattern, there is the message server pattern, which is used to decouple messages directly from applications.

In the JMS standard, there are two message models: P2P (Point to Point) and Publish/Subscribe(Pub/Sub).

P2P model

P2P mode consists of three roles: Queue, Sender, and Receiver. Each message is sent to a specific queue from which the receiver retrieves the message. Queues hold messages until they are consumed or time out.

P2P features:

There is only one Consumer per message (that is, once consumed, the message is no longer in the message queue)
There is no time dependency between sender and receiver, which means that when a sender sends a message, it does not affect whether the receiver is running or not
After receiving a message successfully, the receiver must reply to the queue successfully

P2P is needed if you want every message sent to be processed successfully.

The Pub/sub model

Contains three role topics, Publisher and Subscriber. Multiple publishers send messages to a Topic, and the system delivers these messages to multiple subscribers.

Pub/Sub features:

Each message can have multiple consumers.
There is a temporal dependency between publisher and subscriber. For subscribers to a Topic, it must create a subscriber before it can consume the publisher’s messages.
In order to consume messages, the subscriber must remain running.

To mitigate such strict time dependencies, JMS allows subscribers to create a persistent subscription. This way, the subscriber can receive the publisher’s message even if it is not activated (running).

The Pub/Sub model can be used if you want to send messages that can be processed without any processing, by a single message maker, or by multiple consumers.

In JMS, messages are produced and consumed asynchronously. For consumption, JMS messagers can consume messages in two ways.

(1) Synchronization

The subscriber or receiver receives messages through the receive method, which blocks until the message is received (or timed out);

(2) asynchronous

The subscriber or receiver can register as a message listener. When the message arrives, the listener’s onMessage method is automatically called.

JNDI: Java Naming and Directory Interface, is a standard Java naming system interface. Services can be found and accessed on the network. By specifying a resource name that corresponds to a record in a database or naming service and returns the information necessary to establish a resource connection.

JNDI is used in JMS to look up and access the sending target or message source.

ConnectionFactory

Create a factory for Connection objects for two different JMS message models, QueueConnectionFactory and TopicConnectionFactory. You can look up the ConnectionFactory object through JNDI.

Destination

Destination means the Destination of the message producer or the source of the message consumer. For a message producer, its Destination is a Queue or Topic. To a message consumer, its Destination is also some queue or topic (that is, the message source).

So destinations are really two types of objects: Queue and Topic can look up destinations through JNDI.

Connection

Connection represents the Connection (wrapper around TCP/IP sockets) established between the client and the JMS system. Connection can generate one or more sessions. Like ConnectionFactory, there are two types of Connection: QueueConnection and TopicConnection.

Session

Session is the interface for manipulating messages. Session can be used to create producers, consumers, messages, and so on. Session provides transaction functionality. When you need to use a session to send/receive multiple messages, you can put those send/receive actions into a single transaction. Also, QueueSession and TopicSession are divided.

Message producer

The message producer is created by the Session and used to send the message to the Destination. Again, there are two types of message producers: QueueSender and TopicPublisher. You can call the message producer’s methods (send or publish methods) to send messages.

Message consumer

Message consumers are created by the Session to receive messages sent to destinations. Two types: QueueReceiver and TopicSubscriber. It can be created by session createReceiver(Queue) or createSubscriber(Topic), respectively. Of course, the session’s creatDurableSubscriber method can also be used to create persistent subscribers.

MessageListener

Message listeners. If a message listener is registered, the listener’s onMessage method is automatically called once the message arrives. The MESSage-Driven Bean (MDB) in EJB is a kind of MessageListener.

Learning JMS in depth is a great way to master the JAVA architecture, EJB architecture, and messaging middleware, which are essential components of large distributed systems. Here to do the overall introduction, specific in-depth need to learn, practice, summary, understanding.

Common message queues

Common commercial containers, such as WebLogic and JBoss, support the JMS standard and are easy to develop. However, free ones such as Tomcat, Jetty, etc. require the use of third-party message-oriented middleware. This section introduces the common messaging middleware (Active MQ,Rabbit MQ, Zero MQ,Kafka) and their features.

ActiveMQ is the most popular and powerful open source message bus produced by Apache. ActiveMQ is a JMS Provider implementation that fully supports JMS1.1 and J2EE 1.4 specifications. Although JMS specifications have been issued for a long time, JMS still plays a special role in today’s J2EE applications.

ActiveMQ features are as follows:

Write clients in multiple languages and protocols. Languages: Java,C,C++,C#,Ruby,Perl,Python,PHP. Application protocols: OpenWire,Stomp REST,WS Notification,XMPP,AMQP
Full support for JMS1.1 and J2EE 1.4 specifications (persistence, XA messaging, transactions)
With Spring support, ActiveMQ can be easily embedded into systems that use Spring and also supports Spring2.0 features
Having passed testing on common J2EE servers (such as Geronimo,JBoss 4,GlassFish,WebLogic), with JCA 1.5 Resource Adaptors configured, ActiveMQ can be automatically deployed to any J2EE 1.4 compliant commercial server
Support for multiple transfer protocol: the in – VM, TCP, SSL, NIO, UDP, JGroups, JXTA
Support for high-speed message persistence through JDBC and Journal
Designed to ensure high performance clusters, client-server, point-to-point
Support for Ajax
Support integration with Axis
You can easily invoke the embedded JMS provider to test

RabbitMQ is a popular open source message queue system developed in the Erlang language. RabbitMQ is a standard implementation of AMQP (Advanced Message Queuing Protocol). Supports multiple clients, such as Python, Ruby,.NET, Java, JMS, C, PHP, ActionScript, XMPP, STOMP, etc., supports AJAX and persistence. It is used to store and forward messages in distributed systems, and has good performance in ease of use, scalability, and high availability.

The structure diagram is as follows:

A few key concepts:

Broker: Simple message queue server entity.
Exchange: message switch, which specifies the rules by which messages are routed to which queue.
Queue: Message Queue carrier, each message is put to one or more queues.
Binding exchanges and queues according to routing rules.
Routing Key: The Key used by Exchange to deliver messages.
Vhost: Virtual host. Multiple vhosts can be set up within a broker to separate permissions between different users.
Producer: A program that delivers messages.
A consumer is a program that receives messages.
Channel: Message channel. In each connection of the client, multiple channels can be established. Each channel represents a session task.

The process of using message queues is as follows:

(1) The client connects to the message queue server and opens a channel.

(2) The client declares an Exchange and sets related properties.

(3) The client declares a queue and sets related properties.

(4) The client uses the routing key to establish a binding relationship between exchange and queue.

(5) The client delivers messages to Exchange.

When exchange receives a message, it routes the message to one or more queues based on the key and the binding that has been set.

Called the fastest message queue in history, it is actually a series of interfaces similar to sockets. The difference between sockets is that normal sockets are end-to-end (1:1 relationship), while ZMQ can N: What is known about BSD sockets is point-to-point connections. Point-to-point connections require explicit connection establishment, connection destruction, protocol selection (TCP/UDP), error handling, etc. ZMQ shields these details to make your network programming easier. ZMQ is used to communicate between nodes, which can be hosts or processes.

ZMQ(ZeroMQ for short ZMQ) is a simple transport layer, like a socket library framework, which makes socket programming simpler, simpler, and higher performance. Is a message processing queue library that scales flexibly between multiple threads, the kernel, and the host box. The stated goal of ZMQ is “to become part of the standard network protocol stack and then into the Linux kernel.” We haven’t seen their success yet. However, it is certainly a promising, and much more desirable, layer of encapsulation on top of “traditional” BSD sockets. ZMQ makes writing high-performance web applications extremely easy and fun.”

Features are:

High performance, non-persistent;
Cross-platform: Supports Linux, Windows, and OS X.
Multilanguage support; C, C++, Java,.NET, Python and more than 30 other development languages.
Can be deployed independently or integrated into the application for use;
Can be used as a Socket communication library.

Compared to RabbitMQ, ZMQ is not a traditional message queue server. In fact, it is not a server at all. It is more of an underlying network communication library, wrapped around the Socket API, abstracting network communication, process communication and thread communication into a unified API. Support “request-reply”, “publisher-subscriber”, “Parallel Pipeline” three basic model and extension model.

Key points of ZeroMQ high-performance design:

(1) Queue model without lock

For pipe, the data exchange channel between the cross-thread interaction (client and session), a lock-free queue algorithm CAS is used. Asynchronous events are registered on both ends of the PIPE. Read and write events are automatically triggered when messages are read or written to the pipe.

(2) batch processing algorithm

For traditional message processing, each message in the sending and receiving time, need to call the system, so for a large number of messages, the system overhead is relatively large, zeroMQ for batch messages, adaptive optimization, can receive and send messages in batch.

(3) Multi-core thread binding without CPU switch

Unlike traditional multithreaded concurrent modes, semaphores, or critical sections, zeroMQ takes full advantage of multiple cores, running one worker thread per core binding, avoiding CPU switching overhead between multiple threads.

Kafka is a high-throughput distributed publish-subscribe messaging system that processes all action flow data in consumer-scale websites. This action (web browsing, searching and other user actions) is a key factor in many social functions on the modern web. This data is usually addressed by processing logs and log aggregation due to throughput requirements. This is a viable solution for logging data and offline analysis systems like Hadoop, but with limitations that require real-time processing. Kafka is designed to unify online and offline message processing through Hadoop’s parallel loading mechanism, and to provide real-time consumption through clustered machines.

Kafka is a high-throughput distributed publish-subscribe messaging system with the following features:

Message persistence is provided through the O(1) disk data structure, which can maintain stable performance over long periods of time even with terabytes of message storage. (Data is written in the way of file appending, and expired data is deleted periodically)
High throughput: Even very modest hardware Kafka can support millions of messages per second.
Support for partitioning messages through Kafka servers and consumer machine clusters.
Hadoop supports parallel data loading.

Kafka

Broker

A Kafka cluster consists of one or more servers, called brokers [5].

Topic

Each message published to the Kafka cluster has a category called Topic. Physically, messages from different topics are stored separately. Logically, messages from one Topic are stored on one or more brokers, but users can produce or consume data by specifying the Topic of the message, regardless of where the data is stored.

Partition

Parition is a physical concept. Each Topic contains one or more partitions,.

Producer

Responsible for publishing messages to Kafka Broker.

Consumer

Message consumers, clients that read messages to Kafka Broker.

Consumer Group

Each Consumer belongs to a specific Consumer Group (you can specify a Group name for each Consumer, or the default Group if you do not specify a Group name).

It is generally used in big data log processing or scenarios that have low requirements on real-time performance (a little delay) and reliability (a little data loss).

Vi. Reference materials

The following are the materials shared and recommended for reference.

JMS:

http://blog.sina.com.cn/s/blog_3fba24680100r777.html
http://blog.csdn.net/jiuqiyuliang/article/details/46701559

The RabbitMQ:

http://baike.baidu.com/link?url=s2cU-QgOsXan7j0AM5qxxlmruz6WEeBQXX-Bbk0O3F5jt9Qts2uYQARxQxl7CBT2SO2NF2VkzX_XZLqU-CTaPa
http://blog.csdn.net/sun305355024sun/article/details/41913105

ZeroMQ:

http://www.searchtb.com/2012/08/zeromq-primer.html
http://blog.csdn.net/yangbutao/article/details/8498790
http://wenku.baidu.com/link?url=yYoiZ_pYPCuUxEsGQvMMleY08bcptZvwF3IMHo2W1i-ti66YXXPpLLJBGXboddwgGBnOehHiUdslFhtz7RGZYkrt MQQ02DV5sv9JFF4LZnK

Kafka:

http://baike.baidu.com/link?url=qQXyqvPQ1MVrw9WkOGSGEfSX1NHy4unsgc4ezzJwU94SrPuVnrKf2tbm4SllVaN3ArGGxV_N5hw8JTT2-lw4QK
http://www.infoq.com/cn/articles/apache-kafka/
http://www.mincoder.com/article/3942.shtml

The essence of message queuing in a large distributed system

Related Posts

Remember a Dokcer environment setup

“Big Talk Architecture” Ali architects share technical essentials Java programmers need to break through

Summary of Spring Boot+Mybatis project