The transaction

When it comes to the word “transaction”, database transaction usually comes to mind first, which is more common in daily development.

Database transactions come to mind with four properties (ACID) : atomicity, consistency, isolation, and persistence. All of its features are ultimately designed to ensure consistency — data consistency: the state of the data is correct before and after a transaction, and by “correct” I mean as expected. For example, if A wants to transfer B100 yuan, the final result should be that A loses 100 yuan and A’s balance is greater than or equal to 0, B should increase 100 yuan or A fails to transfer, neither decrease nor increase. Other situations should not occur, such as A’s money decreases, B does not increase or A’s balance is less than 0, etc.

MQ transactions are also designed to ensure ultimate consistency of data.

Distributed transactions for MQ

A distributed transaction is a transaction implemented in a distributed system. The reason why a transaction in a message is called a distributed transaction is that the participants and transaction managers of the transaction — producer, broker and consumer — are located on different nodes of the distributed system.

P.S. Of course, your sending and consuming ends may also be in the same application, on the same node.

MQ transaction messages are designed to address the consistency between the sender, the sender’s local transaction, and the consumer. In RoketMQ, the consumer does not need to be considered because it will be consumed at least once. Therefore, it mainly addresses the consistency between the sending of messages and its local transaction logic.

Transaction messages for RocketMQ

RocketMQ’s transaction messages are a solution to distributed transactions. At present, there is no particularly perfect solution for distributed transactions in the industry, such as a solution that can ensure strong consistency of data under the premise of reliability, availability and high performance.

RocketMQ’s solution for transactional messaging is to ensure the ultimate consistency of data in scenarios where message realtime is not required.

RocketMQ’s transaction messages use 2PC: two-phase Commit and are guaranteed by the introduction of transaction backtracking (also known as backtracking or backtracking).

2PC

For the two-phase commit algorithm, here’s a direct quote from Wikipedia:

But this direct understanding of RocketMQ’s messages can be a bit opaque. RocketMQ is implemented as follows:

  1. In the first phase, pre-commit: Producer sends a Half message (a Half message that is not visible to the consumer, but whose body content is the actual message content to be sent by the business side).

  2. Perform local transaction logic

  3. Phase 2: If step 2 succeeds, the half-message sent in step 1 is committed, and the message is visible to the consumer. If step 2 fails, the message sent in step 1 is rolled back (it is not a true undo message body, just ensure that the message is never visible to the consumer).

Transaction to check

Transaction lookback is a mechanism introduced to address phase 2 message delivery failures. When submitting or rolling back messages in Phase 2, it is inevitable that the submission/rollback messages will fail to be sent due to network reasons, or the current Producer instance will fail to be executed in Phase 2 due to downtime or restart when the local transaction is completed. If the commit/rollback message from producer fails to be sent to the server broker, the broker does not know whether the half-message sent in the first phase should commit to a state visible to the consumer or a rollback state that is not visible to the consumer.

Transaction lookbacks are designed to address this scenario of phase 2 message failure. When a producer sends a phase 1 half-message and fails to commit the phase 2 message in a timely and successful manner, the broker will ask an instance of the producer every minute whether the transaction is successful, commit or rollback. This is a maximum of 15 times by default. If you do not get the correct transaction status after 15 attempts, the message is rolled back directly.

So there are two key points to note here:

When the local transaction is executed, the transaction execution status should be saved in the external shared media. For example, whether the status of the transaction ID of the current message is successfully or failed is saved in the database. When the transaction is checked back, the execution result of the local transaction can be queried according to the transaction ID to determine whether the message should be submitted or rolled back. Each producer in the same production group should have a value greater than 1. Each producer in the same production group represents an instance of the same class. When a transaction looks back, the broker will select the next instance of the production group for inquiry. If the producer group has only one producer instance and the second-stage message fails due to network problems and the network recovers, the broker will still communicate with the producer when it is checked back. However, if this node is down and there are no other transactions under the producer group, Transaction rollback cannot obtain the correct transaction status all the time, and the message must be rolled back when the maximum number of times is reached. To ensure high availability, it is recommended that the number of nodes be greater than 1.

After the introduction of transaction lookup, the transaction message execution flow is as follows:

Application Scenario Requirements

  • Message real-time requirement is not high
  • Final consistency is acceptable