preface

In this article, we will introduce the “distributed transaction” problem and how to solve it.

What are distributed transactions?

Let’s review the four characteristics of transactions:

  • A (Atomicity): a transaction is an indivisible whole; the operations that make it up either all succeed or all fail.

  • C (Consistency): data read before the transaction executes must be the pre-update data, and data read after it executes must be the updated data. Data from the middle of the transaction must not be readable; that is, users must never see the transaction's intermediate state or a partially completed transaction, because the data is inconsistent while the transaction is executing.

  • I (Isolation): transactions must not interleave with or affect one another while they execute, because once a transaction commits its data becomes durable, and such interference would bring the risk of inconsistency.

  • D (Durability): once a transaction has executed successfully, its data is stable and is not affected by other causes afterwards.
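
To make the single-database baseline concrete, here is a minimal sketch of a local transaction using plain JDBC; the connection URL, credentials, and the account table are assumptions for illustration only:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LocalTransferExample {
    public static void main(String[] args) throws SQLException {
        // Assumed URL, credentials and schema: a single MySQL instance with an `account` table.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/shop", "user", "password")) {
            conn.setAutoCommit(false); // start one local transaction
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, 100);
                debit.setInt(2, 1);
                debit.executeUpdate();

                credit.setInt(1, 100);
                credit.setInt(2, 2);
                credit.executeUpdate();

                conn.commit();   // atomicity: both updates become visible together; durable on success
            } catch (SQLException e) {
                conn.rollback(); // atomicity: neither update takes effect
                throw e;
            }
        }
    }
}
```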

ACID is relatively easy to achieve in a single-database transaction, but hard to achieve in a distributed setting, because the operations within one transaction live in different systems and the interaction between those systems is complex and costly. Let's take a look at what happens to the four main properties of transactions in a distributed setting:

  • Distributed atomicity can be guaranteed: either everything is done or nothing is.

  • Distributed consistency requires that data be consistent before and after the transaction, and that users cannot read data from the middle of it. A persistent log can be used: each database writes a log record once its sub-event completes, and the commit to each database is issued only after all sub-events have completed. The coordinator checks whether every system's sub-transaction is finished before issuing the commits; when all subsystems have committed, the transaction ends.

    Notice what this introduces: logging; interaction between subsystems; and unpredictable failures in the final commit phase, which require retries and so on. The subsystems' final commits are not exactly simultaneous, so it is still possible to read intermediate data.

  • Distributed isolation

    • Read committed: the “distributed consistency” approach above roughly achieves it.
    • MVCC: MySQL's MVCC allocates a trx_id and is implemented with the undo log and ReadView.
    • Range locking: you can borrow the idea of MySQL's next-key lock, locking each sub-table according to the range it covers (this is generally not done: one approach is to lock all sub-tables and proceed only once every lock is held, but with large network jitter phantom reads still cannot be ruled out, and next-key locks depend on an index, so without one there is no guarantee). Instead of MySQL's next-key lock, you can use distributed locks in middleware you design yourself, as in the sketch after this list.
  • Distributed durability: persisting to disk is certainly achievable.
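
A minimal sketch of the “distributed lock in your own middleware” idea mentioned in the range-locking bullet above. The LogicalRangeLock class and its in-memory backing are hypothetical; real middleware would keep the lock state in a shared store such as Redis or ZooKeeper:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical range lock for sharded tables: before writing into a key range that spans
 * several sub-tables, a caller takes one logical lock covering the whole range, playing the
 * role that MySQL's next-key lock plays inside a single instance.
 */
class LogicalRangeLock {
    // In-memory stand-in; real middleware would keep this state in a shared store.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Lock a named logical range, e.g. "order:1000-1999". Returns false on timeout. */
    public boolean tryLock(String rangeKey, long timeout, TimeUnit unit) throws InterruptedException {
        ReentrantLock lock = locks.computeIfAbsent(rangeKey, k -> new ReentrantLock());
        return lock.tryLock(timeout, unit);
    }

    public void unlock(String rangeKey) {
        ReentrantLock lock = locks.get(rangeKey);
        if (lock != null && lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }
}
```

All writers that might touch the same logical range must agree on how ranges map to rangeKey; otherwise the lock prevents nothing.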

Let’s take a look at some ideas and approaches for solving distributed transactions.

CAP

It is impossible for a distributed computing system to simultaneously satisfy the following three characteristics:

  • Consistency: All nodes access the same latest data copy.
  • Availability: every request receives a response, but there is no guarantee that the data returned is up to date.
  • Partition tolerance: in practical terms, partitioning is equivalent to a time limit on communication. If the system fails to reach data consistency within that limit, a partition has occurred, and the system must choose between C and A for the current operation.

According to the theorem, a distributed system can satisfy at most two of the three properties. When a partition occurs, allowing a node to continue updating its state leads to data inconsistency, i.e., C is lost. If the nodes on one side of the partition are made unavailable in order to keep the data consistent, A is lost. C and A can both be guaranteed only if the two sides can communicate with each other, which in turn gives up P.

  • Abandon P: a distributed system cannot avoid network partitions, so P has to be satisfied; giving it up is not a real option.
  • AP: data is allowed to be inconsistent for a period of time while a partition lasts, so C cannot be guaranteed.
  • CP: some data becomes inaccessible to users, so availability is lost.

Unavailability is usually not tolerable, so AP with eventual consistency is generally chosen.

X/Open DTP

X/Open DTP (X/Open Distributed Transaction Processing Reference Model).

Components include:

  • Applications (AP)
  • Transaction Manager (TM)
  • Resource Manager (RM)

X/Open XA

In computing, the XA specification is an Open Group standard for distributed transaction processing (DTP). The specification describes the interface between the global transaction manager and local resource managers. Its purpose is to allow multiple resources (such as databases, application servers, message queues, and so on) to be accessed within the same transaction, so that ACID properties remain valid across applications. XA uses two-phase commit to ensure that, for any given transaction, all resources either commit or roll back together.

The XA specification describes what a resource manager must do to support transactional access. Resource managers that comply with this specification are called XA Compliant.
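
The interface the specification describes is exposed in Java as javax.transaction.xa.XAResource. Below is a hedged sketch of a transaction manager driving two XA-compliant resource managers through two-phase commit; how the XAResource objects and the Xid are obtained is left out, and the sketch treats any vote other than XA_OK as a veto (ignoring the XA_RDONLY optimization):

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class XaTwoPhaseSketch {

    /** Drives two resource managers through prepare and commit, rolling back if either vetoes. */
    static void runGlobalTransaction(XAResource rm1, XAResource rm2, Xid xid) throws XAException {
        // Associate both resources with the same global transaction.
        rm1.start(xid, XAResource.TMNOFLAGS);
        rm2.start(xid, XAResource.TMNOFLAGS);

        // ... perform the actual SQL / message work on each resource here ...

        rm1.end(xid, XAResource.TMSUCCESS);
        rm2.end(xid, XAResource.TMSUCCESS);

        // Phase 1: ask every resource whether it can commit.
        int vote1 = rm1.prepare(xid);
        int vote2 = rm2.prepare(xid);

        // Phase 2: commit everywhere, or roll back everywhere.
        if (vote1 == XAResource.XA_OK && vote2 == XAResource.XA_OK) {
            rm1.commit(xid, false); // false = real two-phase commit, not the one-phase optimization
            rm2.commit(xid, false);
        } else {
            rm1.rollback(xid);
            rm2.rollback(xid);
        }
    }
}
```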

2PC- Two-phase commit

Two-phase commit is an algorithm designed to make a transaction commit consistently across all nodes of a distributed system. When a transaction spans multiple nodes, in order to preserve its ACID properties, a component is introduced as the coordinator: it learns the outcome at every other node (the participants) and ultimately tells them whether to actually commit the result.

The idea of two-phase commit can therefore be summarized as follows: the participants report the success or failure of their operations to the coordinator, and the coordinator, based on the feedback from all participants, decides whether each participant should commit or abort.

Preconditions

  • In this distributed system, there exists one node as coordinator and other nodes as participants. The nodes can communicate with each other on the network.
  • All nodes use write-ahead logs, and the logs are stored on reliable storage devices after being written. Even if the nodes are damaged, the log data will not be lost.
  • No node is permanently destroyed; even a damaged node can eventually be recovered.

The basic algorithm

Phase 1 (commit request / voting phase)

  1. The coordinator node asks all the participant nodes if they can commit and waits for the response from each participant node.
  2. Each participant executes the transaction's operations up to the point where the vote is requested, and writes Undo and Redo information to its log.
  3. Each participant node responds to the coordinator's query: if its part of the transaction actually succeeded, it returns an “agree” message; if it failed, it returns an “abort” message.

Phase 2 (commit execution phase)

Success (all participants voted “agree”):

  1. The coordinator node issues a “formally commit” request to all nodes.
  2. The participant node formally completes the operation and releases the resources that were held during the entire transaction.
  3. Each participant node sends a “done” message to the coordinator node.
  4. The coordinating node completes the transaction after receiving a “done” message from all the participant nodes.

Failure (if any participant voted “abort”):

  1. The coordinator sends a rollback action request to all the participant nodes.
  2. The participant uses the previously written Undo information to perform a rollback and release the resources occupied during the entire transaction.
  3. The participant node sends a rollback complete message to the coordinator node.
  4. The coordinator node cancels the transaction after receiving a rollback complete message from all the participant nodes.
```
Coordinator                                        Participant
                     QUERY TO COMMIT
               -------------------------------->
                     VOTE YES/NO                   prepare*/abort*
               <--------------------------------
commit*/abort*       COMMIT/ROLLBACK
               -------------------------------->
                     ACKNOWLEDGMENT                commit*/abort*
               <--------------------------------
end
```

Operations marked with “*” must be recorded on stable storage.[1]

The coordinator here corresponds to the TM in the DTP model, and the participants correspond to the RMs.
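
The algorithm above condensed into a small coordinator loop. This is an illustrative in-process sketch rather than a protocol implementation; the Participant interface and its method names are hypothetical:

```java
import java.util.List;

/** Hypothetical participant: each method corresponds to one message in the diagram above. */
interface Participant {
    boolean prepare(String txId);   // vote "agree" (true) or "abort" (false), after writing undo/redo logs
    void commit(String txId);       // formally commit and release the resources held for the transaction
    void rollback(String txId);     // undo using the previously written Undo information
}

class TwoPhaseCommitCoordinator {

    /** Returns true if the transaction committed on every participant, false if it was rolled back. */
    boolean execute(String txId, List<Participant> participants) {
        // Phase 1: commit request / voting.
        boolean allAgreed = true;
        for (Participant p : participants) {
            if (!p.prepare(txId)) {   // any "abort" vote (or failure) aborts the whole transaction
                allAgreed = false;
                break;
            }
        }

        // Phase 2: commit execution, or rollback.
        if (allAgreed) {
            for (Participant p : participants) {
                p.commit(txId);       // a real coordinator logs the decision to stable storage first
            }
            return true;
        }
        for (Participant p : participants) {
            p.rollback(txId);
        }
        return false;
    }
}
```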

disadvantages

As you can see, phase 2 waits for all participants to reply. What if the network is delayed? What if a participant fails? The resources held by the participants are then never released. If a participant never responds, the transaction can only be rolled back by the coordinator's timeout mechanism.

The coordinator itself can also fail, so its high availability needs to be considered. In addition:

  1. Failure while sending in phase 1: the phase-1 request has reached some participants (zero or more, but not all). The current transaction should be logged persistently, so that after a new coordinator is elected it can continue sending requests to the remaining participants and keep receiving votes from the participants already involved in the transaction.

  2. Failure while sending in phase 2: the commit or rollback message has reached some participants (zero or more, but not all). Again the current transaction should be logged persistently, so that after a new coordinator is elected it can continue telling participants to commit or roll back. (Continuing to send in this case relies on the phase-1 log to know which participants are involved.)

Note that the election and recovery described above must take place within a timeout, and the participants should have timeouts of their own.
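
One way to make that recovery possible is for the coordinator to append a record to stable storage before each round of sends, as in the hedged sketch below; the record format is an assumption, just enough for a newly elected coordinator to learn the participants and the decision:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Minimal append-only coordinator log; a real implementation would also fsync and checksum entries. */
class CoordinatorLog {
    private final Path file;

    CoordinatorLog(Path file) {
        this.file = file;
    }

    /** Written before phase 1 is sent: records which participants belong to the transaction. */
    void logBegin(String txId, List<String> participants) throws IOException {
        append("BEGIN " + txId + " " + String.join(",", participants));
    }

    /** Written before phase 2 is sent: records the global decision so a new coordinator can resend it. */
    void logDecision(String txId, boolean commit) throws IOException {
        append((commit ? "COMMIT " : "ABORT ") + txId);
    }

    /** Written after every acknowledgment has arrived: the transaction needs no further recovery. */
    void logEnd(String txId) throws IOException {
        append("END " + txId);
    }

    private void append(String line) throws IOException {
        Files.write(file, (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```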

3PC- Three-phase commit

Three-phase commit adds a check of the participants' status up front, so that problems are detected early and the process can be ended early if any participant is found to be unavailable.
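
Compared with the two-phase Participant sketched earlier, a three-phase participant splits the decision into CanCommit / PreCommit / DoCommit. The interface below is only a hypothetical sketch of that shape:

```java
/**
 * Hypothetical three-phase participant. canCommit is a lightweight status check that holds no
 * resources, so an unavailable participant is discovered before anything is locked; preCommit
 * does the work 2PC's prepare did (execute and log undo/redo); doCommit makes the changes final.
 */
interface ThreePhaseParticipant {
    boolean canCommit(String txId);
    boolean preCommit(String txId);
    void doCommit(String txId);
    void abort(String txId);   // invoked if any earlier phase fails or times out
}
```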

TCC

TCC stands for Try-Confirm-Cancel. Its process differs from 2PC: 2PC relies on the database itself to commit or roll back, while TCC implements the reservation (Try), confirmation (Confirm), and compensation (Cancel) steps in the application.

For example, when a user buys 1,000 yuan of goods on an e-commerce site, he pays 800 yuan with the balance and 200 yuan with a red envelope.

Try operations

  • TryX: the order system creates an order in the “to be paid” state
  • TryY: freeze 200 yuan of the account's red envelope
  • TryZ: freeze 800 yuan of the account's balance

Confirm operations

  • ConfirmX: update the order to “payment succeeded”
  • ConfirmY: deduct the frozen 200 yuan from the red envelope
  • ConfirmZ: deduct the frozen 800 yuan from the balance account

Cancel operations

  • CancelX: order processing failed, so the frozen balance and red envelope are returned and the order is marked as payment failed
  • CancelY: freezing the red envelope failed, so the frozen balance is returned and the order is marked as payment failed
  • CancelZ: freezing the balance failed, so the frozen red envelope is returned and the order is marked as payment failed

Freezing the red envelope/balance can be done with a database subtraction plus a log record that stores the order number. The order number must be retained: for example, if order processing fails, the red envelope and the balance are rolled back according to that order number, as in the sketch below.
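
A hedged sketch of the red-envelope participant (TryY / ConfirmY / CancelY). The TccAction interface, table names, and SQL in the comments are all assumptions; the point is that every step is keyed by the order number so that Confirm and Cancel stay idempotent:

```java
/** Hypothetical TCC participant contract: every method is keyed by the order number for idempotency. */
interface TccAction {
    boolean tryReserve(String orderNo, long amount); // freeze the resources
    void confirm(String orderNo);                    // consume the frozen resources
    void cancel(String orderNo);                     // release the frozen resources
}

/** Sketch of the red-envelope service; the SQL shown in comments is illustrative only. */
class RedEnvelopeAction implements TccAction {

    @Override
    public boolean tryReserve(String orderNo, long amount) {
        // UPDATE red_envelope SET balance = balance - ?, frozen = frozen + ?
        //   WHERE user_id = ? AND balance >= ?;
        // INSERT INTO freeze_log (order_no, amount, state) VALUES (?, ?, 'TRIED');
        // Returning false (e.g. insufficient balance) makes the coordinator call cancel() everywhere.
        return true;
    }

    @Override
    public void confirm(String orderNo) {
        // UPDATE red_envelope SET frozen = frozen - (frozen amount looked up by order_no);
        // UPDATE freeze_log SET state = 'CONFIRMED' WHERE order_no = ? AND state = 'TRIED';
        // The state check makes a retried confirm a no-op, i.e. idempotent.
    }

    @Override
    public void cancel(String orderNo) {
        // UPDATE red_envelope SET balance = balance + amount, frozen = frozen - amount (by order_no);
        // UPDATE freeze_log SET state = 'CANCELLED' WHERE order_no = ? AND state = 'TRIED';
        // If no 'TRIED' row exists, do nothing: this is the "empty rollback" case where Try never ran.
    }
}
```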

Transaction message

Using message queues such as RocketMQ.

As shown in the figure:

  • System A sends a “commit” message to the middleware after it finishes its task successfully, indicating that the message published earlier can now be consumed. System A also needs to provide a task-status query interface to MQ, because there will be cases where system A has not sent the “commit” message.
  • After system A finishes, the message is delivered to system B; if system B does not respond, the queue redelivers the message, so system B needs to be idempotent.
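
A sketch of system A's side using RocketMQ's transactional-message API (TransactionMQProducer with a TransactionListener). The group, topic, name-server address, and the two placeholder methods are assumptions; check the details against the RocketMQ version you actually use:

```java
import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.client.producer.TransactionMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

public class SystemAProducer {
    public static void main(String[] args) throws Exception {
        TransactionMQProducer producer = new TransactionMQProducer("system_a_group"); // assumed group
        producer.setNamesrvAddr("localhost:9876");                                    // assumed address
        producer.setTransactionListener(new TransactionListener() {
            @Override
            public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
                // Runs after the half message is stored but before it is visible to consumers.
                boolean ok = doLocalTask((String) arg);            // system A's own work
                return ok ? LocalTransactionState.COMMIT_MESSAGE   // "commit": B may now consume it
                          : LocalTransactionState.ROLLBACK_MESSAGE;
            }

            @Override
            public LocalTransactionState checkLocalTransaction(MessageExt msg) {
                // The task-status query interface: MQ calls back when A never sent commit/rollback.
                return taskFinished(msg.getTransactionId())
                        ? LocalTransactionState.COMMIT_MESSAGE
                        : LocalTransactionState.UNKNOW;            // not decided yet; MQ will ask again
            }
        });
        producer.start();

        Message msg = new Message("order_paid_topic", "order-1001".getBytes()); // assumed topic/payload
        producer.sendMessageInTransaction(msg, "order-1001");
    }

    private static boolean doLocalTask(String orderNo) { return true; }        // placeholder
    private static boolean taskFinished(String transactionId) { return true; } // placeholder
}
```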

reference

The CAP theorem

X/Open Distributed Transaction Processing model

X/Open XA

Two-phase commit

Three-phase commit

CAP, BASE theory and distributed transactions: 2PC, 3PC and TCC