A brief analysis of how the RocketMQ Producer works

How do producers send messages?

MessageQueue

Before looking at how a Producer sends messages, we need to understand one concept: what is a MessageQueue? A MessageQueue is RocketMQ's mechanism for data sharding and physical storage.

We typically specify the number of MessageQueues when creating a Topic.

As shown in the figure above, a Topic has four MessageQueues, with two MessageQueues on each Broker. Producers write messages to different MessageQueues according to a selection algorithm (even distribution by default). MessageQueue data can be persisted on disk.

This spreads a Topic's messages across multiple Brokers, so the write load is shared and the Topic can handle higher concurrency.
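
The layout described above can be pictured with a small model. This is only a sketch: the class QueueLayoutSketch, its nested Queue type, and the broker names are hypothetical and are not RocketMQ classes. It simply creates a fixed number of queues on each Broker for one Topic.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; these are not RocketMQ's own classes.
final class QueueLayoutSketch {
    // Stand-in for a MessageQueue: identified by topic, broker name, and queue id.
    static final class Queue {
        final String topic;
        final String brokerName;
        final int queueId;

        Queue(String topic, String brokerName, int queueId) {
            this.topic = topic;
            this.brokerName = brokerName;
            this.queueId = queueId;
        }

        @Override
        public String toString() {
            return topic + "@" + brokerName + "#" + queueId;
        }
    }

    // Creates queuesPerBroker queues on every broker, so the topic's data is sharded.
    static List<Queue> layout(String topic, List<String> brokers, int queuesPerBroker) {
        List<Queue> queues = new ArrayList<>();
        for (String broker : brokers) {
            for (int id = 0; id < queuesPerBroker; id++) {
                queues.add(new Queue(topic, broker, id));
            }
        }
        return queues;
    }
}
```

Calling layout with two brokers and two queues per broker yields the four-queue Topic from the example.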

Producer connection to NameServer

The Producer uses the NameServer to obtain routing information for a specific Topic and caches it locally: which MessageQueues the Topic has, which Brokers those MessageQueues are on, the Broker IP addresses and ports, and so on. The Producer sends messages only to the Master Broker; the Slave obtains data through Master/Slave synchronization.

So how does the Producer connect to the NameServer?

The connection

A single Producer keeps a long-lived connection to one NameServer and periodically queries the Topic configuration information. If that NameServer fails, the Producer automatically connects to the next NameServer until it finds an available one; reconnection is automatic.

Polling time

By default, a Producer fetches the latest queue information for all of its Topics from the NameServer every 30 seconds. This means that if a Broker goes down, it can take up to 30 seconds for the Producer to notice, and messages sent to that Broker during this window will fail. The interval is set by the pollNameServerInteval parameter of DefaultMQProducer and can be configured manually.
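
A minimal sketch of one polling round with failover to the next NameServer might look like the following. Everything here is an assumption made for illustration: the class NameServerPollSketch, the pollOnce method, and the fetchRoute function (which returns an empty Optional when a server is unreachable) are hypothetical and do not mirror RocketMQ's internal code.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of "try the current NameServer, move to the next on failure".
final class NameServerPollSketch {
    private final List<String> nameServers;
    // Assumed lookup function: an empty result means the server could not be reached.
    private final Function<String, Optional<String>> fetchRoute;
    private int current = 0;

    NameServerPollSketch(List<String> nameServers, Function<String, Optional<String>> fetchRoute) {
        this.nameServers = nameServers;
        this.fetchRoute = fetchRoute;
    }

    // One polling round: returns route data from the first reachable NameServer,
    // or empty if every NameServer failed.
    Optional<String> pollOnce() {
        for (int tried = 0; tried < nameServers.size(); tried++) {
            Optional<String> route = fetchRoute.apply(nameServers.get(current));
            if (route.isPresent()) {
                return route; // stay on this server for the next round
            }
            current = (current + 1) % nameServers.size(); // fail over to the next server
        }
        return Optional.empty();
    }

    String currentServer() {
        return nameServers.get(current);
    }
}
```

In a real client a scheduled task would call something like pollOnce() at the configured interval (30 seconds by default, as described above).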

The heartbeat

The Producer does not send heartbeats to the NameServer.

Producer connection to Broker

The connection

A Producer maintains long-lived connections to the Master nodes of all Brokers that host the Topic.

The heartbeat

By default, a Producer sends a heartbeat to every Broker Master node every 30 seconds. The Broker scans all live connections every 10 seconds. If a connection has received no heartbeat within 2 minutes (that is, the current time minus the last update time exceeds 2 minutes; this value cannot be changed), the Broker closes the connection.
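
The Broker-side check can be sketched as follows. This is an illustrative model under stated assumptions: HeartbeatScanSketch and its methods are hypothetical names, timestamps are passed in explicitly so the logic is easy to follow, and removing a map entry stands in for closing the connection.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical broker-side bookkeeping; names are illustrative, not RocketMQ's.
final class HeartbeatScanSketch {
    static final long EXPIRE_MILLIS = 2 * 60 * 1000L; // the 2-minute timeout from the text

    // clientId -> timestamp (ms) of the last heartbeat received
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    void onHeartbeat(String clientId, long nowMillis) {
        lastHeartbeat.put(clientId, nowMillis);
    }

    // Intended to run every 10 seconds: drops clients whose last heartbeat is too old
    // and returns how many were dropped.
    int scan(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > EXPIRE_MILLIS) {
                it.remove(); // stands in for closing the connection
                closed++;
            }
        }
        return closed;
    }

    boolean isConnected(String clientId) {
        return lastHeartbeat.containsKey(clientId);
    }
}
```

With a 30-second heartbeat interval, a healthy Producer refreshes its timestamp several times within the 2-minute window, so only clients that have really gone silent are dropped.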

How does the Producer send messages?

Fault-tolerance mechanisms

As the party sending messages, the Producer has three fault-tolerance mechanisms:

The local cache

The Producer caches the routing information obtained from the NameServer locally, so that it can keep sending if the NameServer goes down.

Unavailable Broker set

The Producer has a fault-tolerance mechanism for Brokers, which is enabled through the sendLatencyFaultEnable switch. RocketMQ maintains an internal map of faulty Brokers and puts Brokers that reach a certain latency level into it. The next time the Producer selects a Broker, it avoids the ones marked unavailable.

Retry

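When a send fails, the Producer retries. For synchronous sends, the default is up to three attempts in total (the initial send plus two retries).

Dead letter queue (Consumer side): a message goes to the dead letter queue once the number of Consumer retries exceeds the specified limit.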
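
The Producer-side retry described above can be sketched as a bounded loop. This is a simplified illustration, not RocketMQ's implementation: SendRetrySketch, sendWithRetry, and the trySend predicate are hypothetical, and moving to the next queue in list order on each attempt is an assumption chosen to keep the example short.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of send-with-retry; not the RocketMQ implementation.
final class SendRetrySketch {
    // Tries up to maxAttempts queues in order, wrapping around the list.
    // trySend returns true when the send to that queue succeeded.
    // Returns the index of the queue that accepted the message, or -1 if all attempts failed.
    static int sendWithRetry(List<String> queues, int startIndex, int maxAttempts,
                             Predicate<String> trySend) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int index = (startIndex + attempt) % queues.size();
            if (trySend.test(queues.get(index))) {
                return index;
            }
        }
        return -1;
    }
}
```

Because each attempt targets a different queue, a retry has a chance of landing on a healthy Broker when the first one is down.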

Load balancing

The Producer balances load on the sending side by cycling through all MessageQueues under a Topic in turn, as shown in the figure below:

In this way, a Topic's messages are spread across multiple MessageQueues and, in turn, across multiple Brokers.

Strategy

Start from a random index, increment it on every send, and take the result modulo the number of MessageQueues.
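
A minimal sketch of this selection strategy, assuming a shared counter per Producer. The class RoundRobinSelectorSketch is a hypothetical name, and using Math.floorMod to keep the index non-negative after integer overflow is a choice made for this example rather than something taken from the RocketMQ source.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical selector: random start, increment per send, modulo the queue count.
final class RoundRobinSelectorSketch {
    private final AtomicInteger counter;

    RoundRobinSelectorSketch() {
        // Random starting point so that different producers do not all begin at queue 0.
        this(ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE));
    }

    RoundRobinSelectorSketch(int start) { // fixed start, convenient for experiments
        this.counter = new AtomicInteger(start);
    }

    <T> T select(List<T> queues) {
        // floorMod keeps the index in range even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), queues.size());
        return queues.get(index);
    }
}
```

With four queues and a starting value of 6, successive calls pick the queues at indexes 2, 3, 0, 1, and so on, which gives each queue an equal share over time.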

Further thoughts

What if the NameServer goes down?

If the NameServer that the Producer is connected to suddenly goes down, it can take up to 30 seconds for the Producer to notice. In the meantime, the Producer reads the Topic's routing information from its local cache. The local cache is refreshed once the Producer has connected to the next NameServer.
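
The cache behaviour can be sketched like this. The names RouteCacheSketch, refresh, and routeFor are hypothetical, route data is reduced to a plain string, and fetchFromNameServer is an assumed lookup function that returns an empty Optional while the NameServer is unreachable.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical local route cache; not RocketMQ's own data structures.
final class RouteCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // topic -> route data
    private final Function<String, Optional<String>> fetchFromNameServer;

    RouteCacheSketch(Function<String, Optional<String>> fetchFromNameServer) {
        this.fetchFromNameServer = fetchFromNameServer;
    }

    // Updates the cached route if the NameServer answers; otherwise keeps the old entry.
    void refresh(String topic) {
        fetchFromNameServer.apply(topic).ifPresent(route -> cache.put(topic, route));
    }

    // Sending reads from the local cache, so it keeps working during a NameServer outage.
    Optional<String> routeFor(String topic) {
        return Optional.ofNullable(cache.get(topic));
    }
}
```

The trade-off is that the cached routes may be stale during the outage, which is why Broker failures in the same window are handled by the separate mechanisms discussed next.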

What if the Broker goes down?

If the Broker that the Producer is connected to suddenly goes down, for example a Master Broker crashes, the Slave Brokers elect a new Master, but every message the Producer sends to that Broker during this process fails.

To address this, the Producer provides a switch, sendLatencyFaultEnable, which turns on a latency-based fault-tolerance mechanism. For example, if a request to a Broker takes 500ms, the Producer avoids that Broker for a following period, say 3000ms, so that messages are not sent to the failing Broker.
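
The idea behind this switch can be sketched as follows. This is an illustration under stated assumptions, not RocketMQ's code: LatencyFaultSketch and its methods are hypothetical, and the single 500ms threshold and 3000ms avoidance window are simply the example figures from the paragraph above (the real client may use several latency levels).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical latency-based broker avoidance; thresholds are the example values from the text.
final class LatencyFaultSketch {
    static final long SLOW_THRESHOLD_MILLIS = 500; // example latency that marks a broker as slow
    static final long AVOID_MILLIS = 3000;         // example avoidance window

    // brokerName -> time (ms) from which the broker may be used again
    private final Map<String, Long> availableAgainAt = new ConcurrentHashMap<>();

    // Records how long a send took; slow brokers are marked unavailable for a while.
    void recordLatency(String broker, long latencyMillis, long nowMillis) {
        if (latencyMillis >= SLOW_THRESHOLD_MILLIS) {
            availableAgainAt.put(broker, nowMillis + AVOID_MILLIS);
        } else {
            availableAgainAt.remove(broker);
        }
    }

    boolean isAvailable(String broker, long nowMillis) {
        Long until = availableAgainAt.get(broker);
        return until == null || nowMillis >= until;
    }

    // Picks the first broker not currently avoided; falls back to the first broker
    // when every broker is marked unavailable, so sending is never blocked outright.
    String pick(List<String> brokers, long nowMillis) {
        for (String broker : brokers) {
            if (isAvailable(broker, nowMillis)) {
                return broker;
            }
        }
        return brokers.get(0);
    }
}
```

Once the avoidance window has passed, the Broker becomes eligible again without any explicit recovery step.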

In addition, the Producer can catch send exceptions itself and retry.