1 introduction

In the microservice architecture, the subdivision of services leads to the splitting of modules in a single service into multiple services, and the data storage changes from a single data source to multiple data sources according to services. For single service and single data source, ACID mechanism (Atomicity Atomicity, Consistency, Isolation Isolation, persistence) of database ensures Consistency of business data (usually referred to as local transactions). In the microservice architecture, the BASE mechanism (Basically Available, Soft State, and Eventual Consistency) is usually adopted to ensure the final data Consistency between systems.

Distributed transactions are typically implemented using 2PC, 3PC, TCC, SAGA, maximum effort notification, and the ultimate consistency implementation based on message delivery and consumption described in this article. 2PC and 3PC rely on the transaction capability of the database. If the transaction is not initiated at the first stage, the concurrency performance of the application will be seriously affected, and it will not be adopted in actual services. Compared with 2PC and 3PC, TCC is a flexible transaction with higher concurrency performance, but it requires more intrusive and complex implementation. It requires the service side to implement Try, Cancel, and Confirm methods. At the same time, in order to solve the abnormal situation caused by network communication failure or timeout, It requires the business side to allow empty rollback, operation idempotency and prevent resource suspension in the design and implementation, which is more suitable for business scenarios requiring high data consistency, such as combination payment and order reduction inventory. Compared with TCC, SAGA supports higher concurrency and lower business intrusion, which is suitable for business scenarios with long transactions.

Based on the final consistency of message delivery and consumption

Based on message delivery and consumption of eventual consistency Such implementation scheme can be subdivided into local messages table, reliable service, transaction message (a special case of reliable messaging service), as a result of these solutions are essentially turns the business operations of across systems reliable message delivery and consumption, to achieve the purpose of the distributed transaction is split into multiple local affairs. To achieve the final consistency of data between systems, these schemes are collectively called the final consistency based on message delivery and consumption.

2.1 Local Message table

The local message table scheme was proposed by Dan Pritchett, a system architect at eBay, in his 2008 ACM paper Base: An Acid Alternative, in which he proposed the Base mechanism. Let’s see if local transactions can be used directly to ensure the final consistency of data between systems.

In common e-commerce platforms, users purchase goods and create orders for example. When creating orders, the order system usually adopts the method of withholding inventory to avoid overbooking. After the user pays successfully, the inventory is actually deducted. In order to improve the efficiency of order creation, order generation and withholding inventory in the order system are asynchronously decoupled, that is, orders are generated in the order system and then messages of withholding inventory are delivered to MQ.

The order system is divided into four steps, respectively

  • User ID locking prevents users from placing repeat orders

    The automatic release time of the lock is usually specified. For example, under the normal condition of 1 second, the user can not really place multiple orders within 1 second. This situation is more caused by repeated orders.

  • Lua script check inventory and withhold inventory

    This step is mainly to check and withhold inventory in the cache

  • Create the order

    Actually create the business order

  • Post order creation message

    The order system posts an order creation message to MQ

When discussing distributed transactions, we directly skip the two steps of locking the user ID to prevent users from placing repeat orders and Lua script checking inventory and withholding inventory, and only care about the create order and post order creation messages related to the transaction.

There are two common mistakes:

  • An order is created in a local transaction before an order creation message is posted 
Create an order. 2. Send an order. Create a messageCopy the code

The problems of this approach are as follows: The second step post order creation message timed out due to network jitter, and the whole local transaction rolled back. At this time, the business order creation is rolled back, but the order creation message may have been posted to MQ, resulting in the goods inventory is wrongly pre-held.

  • An order creation message is posted and an order is created in a local transaction 
1. Send the order creation message. 2Copy the code

This practice also exists, the post order creation message timed out due to network jitter, and the entire local transaction is rolled back. At this point, the business order has not been created, but the order creation message may have been posted to MQ, resulting in the inventory of goods being incorrectly preloaded.

So how do you guarantee that the order creation and post order creation messages will either succeed or fail together? Local messaging solves this problem as follows:

1. The upstream system adds or updates service records in the local transaction and adds local message records with the status of waiting to be sent. 2. 4. The downstream system consumes the message pushed by MQ, executes the business logic, and returns an ACK to MQCopy the code

Note: Step 2 possible timing task to MQ delivering duplicate messages, step 3 possible repeated push MQ message, so the downstream when doing business process must be through a certain mechanism to ensure that the operating idempotence, like the example above does not lead to duplicate consumption to create order message many times pre deduct the inventory, Refer to the previous article for idempotency of business operationsThe necessity of idempotent business operations under distributed systems.

The local message table scheme increases the cost of maintaining the message table for the business system, which makes the transaction processing part coupled with the business system and cannot be a general solution. In high concurrency, the read and write operation of the local message table will become the bottleneck of the system. Meanwhile, the scheduled task scanning the local message table will increase the delay between the systems.

2.2 Final consistency of reliable messages

The local message table scheme is too intrusive for a general purpose solution. By moving the processing of local messages to a separate service, you can have a universal solution for reliable message consistency.

The final consistency scheme of reliable message consists of upstream service, reliable message service and downstream service.

Reliable message service

Reliable messaging service an independent service that stores, delivers, and updates messages. Messages generally consist of to be confirmed, sent, cancelled, and completed.

  • To be confirmed

    An unacknowledged message sent by an upstream service to a reliable messaging service, which is acknowledged or cancelled by the upstream service after executing the business logic of the local transaction

  • Has been sent

    After the local transaction is successfully executed, the upstream service sends an acknowledgement message to the reliable message service. After receiving the message, the reliable message service updates the status of the message from to be acknowledged to sent

  • Has been cancelled

    When the upstream service fails to execute the local transaction, it sends a cancellation message to the reliable message service. After receiving the message, the reliable message service updates the status of the message from to confirm to canceled

  • Has been completed

    After the downstream service has consumed the message in MQ, it posts the completed consumption message to MQ, which is consumed by the reliable message service, updating the message status from sent to completed

The upstream service

The upstream service, the party that initiates the transaction, is generally the service that executes first in a distributed transaction. When the downstream service needs to be called, it is not directly called through RPC, but a message, and the specific steps are as follows:

  • Before the upstream service executes the business logic, it sends an unacknowledged message (usually called half-MSG, containing interface call information) to the trusted messaging service, which stores the record in its database (or local disk) with the status [unacknowledged] (steps 1 and 2 in the following figure).
  • The upstream service executes the business logic in the local transaction. If the local transaction succeeds, the side reliable message service sends a confirmation message. If local execution fails, a cancel message is sent to the messaging service (steps 3 and 4 in the following figure).
  • The reliable message service modifies the status of the corresponding message record in the local database to sent or Cancelled according to whether the received message is confirmed or cancelled. If an acknowledgement message is delivered to an MQ message queue at the same time, changing the message state and delivering MQ must be in a transaction that guarantees either success or failure (steps 5.1 and 5.2 in the figure below).

Note: To prevent the producer’s local transaction from executing successfully, but sending an acknowledgement/cancellation message times out. The reliable message service generally provides a background scheduled task to continuously scan the message status [to be confirmed] in the message table, and then call back an interface of the upstream service. The upstream service decides whether the message should be confirmed or cancelled, and sends the corresponding message.

The downstream service

Downstream services that subscribe to MQ messages and perform local transactions upon receipt of MQ messages. After successful execution, the message is ACK acknowledged, and the consumable message is delivered to MQ. Upon receipt of the message, the reliable message service updates the message status in the message table to completed, and then ACK the message.

The downstream service subscribes to MQ messages and executes the corresponding business logic after receiving MQ messages.

Note: To prevent repeated message consumption, downstream service business logic processing is idempotent. At the same time, exceptions may occur in steps 8, 9, and every 10 due to system or network reasons. On failure 8, the message is pushed again (mainstream MQ guarantees that the message is delivered at least once) and the downstream service business logic is idempotent. If step 9 fails, timed sent messages are scanned by a scheduled task pair in the reliable messaging service and reposted to MQ. If step 10 fails, the same message will be pushed again. If the message status in the message table is “Completed”, it will be good to ACK the message again directly. Refer to the previous article on the necessity of idempotent business operations in distributed systems for idempotent business operations.

2.3 Transaction Messages

Transactional messaging, also known as reliable message Ultimate Consistency, is supported by many open source messaging middleware such as RocketMQ and Kafka. At its core, the idea is the same as local message tables and trusted message services, but the trusted message services and MQ functions are wrapped together, masking the low-level details and making it easier for users to use.

RocketMQ already supports distributed transaction messages in 4.3.0, where RocketMQ uses the 2PC approach to commit transaction messages and adds a compensation logic to handle two-phase timeout or failure messages, as shown in the figure below.

The figure above illustrates the general scheme of transaction message, which is divided into two processes: normal transaction message sending and submission, transaction message compensation process.

  • Transaction message sending and submission

(1) The upstream service sends half messages to the MQ Server.

(2) MQ Server Server response message writing results (at this time the half message is not visible to downstream message subscribers).

(3) The upstream service performs a local transaction based on the sent result (if the write fails, the half message is not visible to the business and the local logic is not executed).

(4) Upstream services perform Commit or Rollback according to the local transaction status (Commit operation generates message index, message visible to consumers)

  • The compensation process

(5) For pending transaction messages that are not Commit/Rollback, MQ Server initiates a “back check” to the upstream service.

(6) When the upstream service receives the backcheck message, it checks the status of the local transaction corresponding to the backcheck message

(7) According to the local transaction status, the upstream service recommit or Rollback

The compensation phase is used to resolve the timeout or failure of message Commit or Rollback.

conclusion

This article summarizes the eventual consistency based on message delivery and consumption to achieve the main scheme of distributed transaction local messages table, reliable service, transaction information, these solutions are essentially turns the business operations of across systems reliable message delivery and consumption, to achieve the goal of the distributed transaction is split into multiple local affairs, and achieve the eventual consistency of data between systems.

Reference documentation

Distributed Transactions in distributed theory: Final consistency schemes for reliable messages

RocketMQ transaction message design

At the end

Original is not easy, praise, look, forwarding is a great encouragement to me, concern about the public number insight source code is my biggest support. At the same time believe that I will share more dry goods, I grow with you, I progress with you.