Source: https://chenmingyu.top/distributed-transaction/


Preface

A complex system often grows out of a small, simple one. To meet ever-growing business needs, the system's complexity keeps increasing, and a monolithic architecture gradually evolves into a distributed one. Distributed system architecture design is mainly concerned with three goals: high performance, high availability, and high scalability

Distributed transaction

High availability means the system keeps serving requests without interruption. It measures the system's availability and is one of the criteria any system design must satisfy.

High availability is achieved through redundancy. For highly available storage, the hard problem is not how to replicate data, but how to keep inconsistent replicas from affecting the business

For a distributed system, keeping data consistent across subsystems requires a scheme that guarantees consistency and avoids business errors. Such a scheme is called a distributed transaction, and it must behave as a single whole: all operations succeed together or fail together

Here’s an example:

On an e-commerce site, when a user places an order, we must insert a row into the order table and, at the same time, decrement the remaining stock in the inventory table. One step is an insert, the other an update, and the two must either both succeed or both fail; otherwise the business data is wrong

Startup:

The system is still a monolith, and the order table and the inventory table sit in the same database. Here a MySQL local transaction is enough to guarantee data consistency
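As a minimal illustration of that all-or-nothing behavior (plain Java standing in for SQL; the class, fields, and method names here are invented for the sketch), either the order row is created and stock is deducted, or neither change is visible:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of a MySQL local transaction over the order and
// inventory tables: commit both changes, or roll both back.
public class LocalTxSketch {
    static final Map<String, Integer> stock = new HashMap<>();  // inventory table
    static final Map<String, String> orders = new HashMap<>();  // order table

    // Returns true if the "transaction" committed, false if it rolled back.
    static boolean placeOrder(String orderId, String sku, int qty) {
        Integer before = stock.get(sku);      // snapshot, stands in for BEGIN
        try {
            orders.put(orderId, sku);         // INSERT INTO orders ...
            int remaining = before == null ? 0 : before;
            if (remaining < qty) {
                throw new IllegalStateException("insufficient stock");
            }
            stock.put(sku, remaining - qty);  // UPDATE inventory ...
            return true;                      // COMMIT
        } catch (RuntimeException e) {
            orders.remove(orderId);           // ROLLBACK both changes
            if (before != null) stock.put(sku, before);
            return false;
        }
    }
}
```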

Development:

As the business grows rapidly and the number of users increases, the single database becomes a performance bottleneck, so it is split by business dimension into an order database and an inventory database. Because the operation now spans databases and machines, a MySQL local transaction can no longer guarantee consistency between the order database and the inventory database

Mature:

As the business keeps developing, the monolith no longer meets demand and the system becomes distributed: orders and inventory are split into two subsystems that provide services and communicate over RPC. However the system evolves, the business must stay correct and the order and inventory data must stay consistent. How do we guarantee data consistency across services?

Theoretical basis

Before discussing concrete schemes, we need the theoretical foundations that distributed data design follows, namely CAP theory and BASE theory, to pave the way for the practice that follows

CAP theory

CAP: short for Consistency, Availability, Partition tolerance

  • Consistency

    For a client, a read operation always returns the result of the latest write operation

  • Availability

    A non-failing node returns a reasonable (non-error) response within a reasonable amount of time

  • Partition tolerance

    After a network partition occurs, the system can continue to provide services

Because the nodes of a distributed system must be deployed on multiple machines, and the network can never be 100% reliable, network partitions are unavoidable; in other words, P must always be chosen.

Once a network partition occurs, availability and consistency come into conflict and we must choose between the two, which yields two architectures: CP and AP;

CP architecture

When a network partition occurs, requests must be rejected to preserve consistency; otherwise consistency cannot be guaranteed

  • Before the network partition, system A and system B hold the same data (X=1)
  • X on system A is changed to 2, so X=2
  • A network partition occurs, data synchronization between systems A and B fails, and X on system B is still 1
  • When a client requests system B, system B rejects the request and returns an error code or error message in order to preserve consistency

This approach sacrifices availability and satisfies only consistency and partition tolerance, i.e. CP

Note that CAP theory ignores network latency: the delay of synchronizing data from system A to system B is treated as zero

The CP architecture guarantees that a client either reads the result of the latest write or gets an error; inconsistent data is never returned

AP architecture

When a network partition occurs, system B can return the old value in order to preserve availability

  • Before the network partition, system A and system B hold the same data (X=1)
  • X on system A is changed to 2, so X=2
  • A network partition occurs, data synchronization between systems A and B fails, and X on system B is still 1
  • When a client requests system B, system B returns the old value X=1 in order to stay available. This violates consistency and satisfies only availability and partition tolerance, i.e. AP

The AP architecture ensures that the system always responds when a client requests data, regardless of whether the latest or an old value is returned

CAP theory applies at the granularity of individual data items, not to the design strategy of the whole system

BASE theory

BASE is short for Basically Available, Soft State, Eventual Consistency. Its core idea is that even when strong consistency cannot be achieved, each application should use an appropriate method to reach eventual consistency

  • BA: Basically Available

    When a failure occurs, a distributed system is allowed to lose part of its availability, i.e. it guarantees the availability of the core functions

  • S: Soft State

    An intermediate state is allowed, as long as it does not affect the overall availability of the system

  • E: Eventual Consistency

    All data copies in the system reach a consistent state after a certain amount of time

BASE theory is essentially an extension of CAP theory and a supplement to the AP choice in CAP

Distributed transaction protocol

Premise: In the monolithic architecture, transactions are guaranteed by MySQL, not by us. When the single database hits its performance bottleneck, it is sharded: by business dimension, the order and inventory tables are split into two databases, an order database and an inventory database

X/Open XA protocol

XA is a distributed transaction protocol proposed by Tuxedo. The XA specification primarily defines the interface between the (global) Transaction Manager and the (local) Resource Manager. An XA interface is a two-way system interface that forms a communication bridge between the Transaction Manager and one or more Resource Managers

The XA protocol uses a two-phase commit approach to manage distributed transactions. The XA interface provides a standard interface for communication between resource managers and transaction managers

2PC Two-phase commit protocol

Two-phase Commit is an algorithm designed to ensure that all nodes in a distributed system commit a transaction consistently; it is commonly referred to as a protocol. In a distributed system, each node knows whether its own operation succeeded or failed, but cannot know the outcome on other nodes. When a transaction spans multiple nodes, a coordinator component must be introduced to preserve the ACID properties: it learns the outcome on every node (the participants) and ultimately tells each node whether to make its result real (for example, flush the updated data to disk). The idea of two-phase commit can therefore be summarized as: the participants report the success or failure of their operation to the coordinator, and the coordinator decides, based on the feedback from all participants, whether to commit or abort the operation

The two-phase commit algorithm rests on the following assumptions:

  • In this distributed system, one node acts as the Coordinator and the other nodes act as participants (Cohorts), and all nodes can communicate with each other over the network.
  • All nodes use write-ahead logging; the log is persisted to reliable storage as soon as it is written and survives node damage.
  • No node is permanently damaged; even a damaged node can eventually be recovered

Two-phase commit consists of two phases: the first, a voting phase, and the second, a commit phase

Voting phase

  • The coordinator asks all participants whether they can commit and waits for each participant's response
  • Each participant executes the transaction, returning a Yes response on success or a No response on failure
  • If the coordinator times out waiting for a participant's response, the transaction is also treated as failed

Commit phase

  • If every participant returned a Yes response in the voting phase, the coordinator sends a commit request to all participants, and all participants commit the transaction
  • If one or more participants returned a No response in the voting phase, the coordinator sends a rollback request to all participants, and all participants roll back
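The two phases above can be sketched as a small coordinator loop. The `Participant` interface and `run` method below are invented for illustration, not part of any real XA library:

```java
import java.util.List;

// Minimal sketch of a 2PC coordinator: collect votes, then commit or
// roll back everyone based on the combined result.
public class TwoPhaseCommit {
    public interface Participant {
        boolean prepare();   // phase 1: vote Yes (true) or No (false)
        void commit();       // phase 2, all voted Yes
        void rollback();     // phase 2, someone voted No
    }

    // Returns true if the global transaction committed.
    public static boolean run(List<Participant> participants) {
        // Phase 1: voting
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) { allYes = false; break; }
        }
        // Phase 2: commit only if every participant voted Yes
        for (Participant p : participants) {
            if (allYes) p.commit(); else p.rollback();
        }
        return allYes;
    }
}
```

A real coordinator would also have to handle the timeout and crash cases described below; this sketch only shows the happy-path decision rule.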

Advantage of two-phase commit: it guarantees strong data consistency as far as possible, though not 100%

Disadvantages:

  • Single point of failure

Due to the importance of the coordinator, once the coordinator fails, the participants block indefinitely. In particular, if the coordinator fails in phase 2, all participants stay locked on their transactional resources and cannot finish the transaction

  • Synchronous blocking

Since all nodes block synchronously while executing their operations, any third-party node that needs a common resource held by a participant must block until the participant releases it

  • Data inconsistency

In the second phase, if a local network failure occurs after the coordinator begins sending commit requests, or the coordinator crashes partway through sending them, only some of the participants receive the commit request. The participants that receive it commit the transaction; the participants that never receive it cannot commit, so the data in the distributed system becomes inconsistent

The remaining problem of two-phase commit

If the coordinator crashes in the second phase after sending a commit request, and the only participant that received the request also crashes after executing it, then even if a new coordinator is elected by agreement and asks the other participants whether they committed or rolled back, it cannot know what the crashed participant did. When that participant recovers, the data may be inconsistent

3PC three phase commit protocol

Three-phase Commit is designed to address the drawbacks of the two-phase commit protocol. Unlike two-phase commit, three-phase commit is a "non-blocking" protocol. It inserts a preparation phase between the first and second phases of two-phase commit, so that the potentially long "uncertainty" period in two-phase commit, where participants have already voted but the coordinator has crashed and they have no way of knowing whether to commit or abort, is resolved

There are three commit phases: CanCommit, PreCommit, and DoCommit

The query phase CanCommit

The coordinator sends a CanCommit request to each participant, who returns a Yes response if it can commit, or a No response otherwise

Preparation phase PreCommit

The coordinator determines whether to execute or interrupt a transaction based on the participant’s response during the query phase

  • If all participants return Yes, the transaction is executed
  • If one or more participants return No or time out, the transaction is interrupted

After performing the operation, each participant returns an ACK response and waits for the final instruction

Commit phase DoCommit

The coordinator determines whether to execute or interrupt a transaction based on the response of the participant during the preparation phase

  • If all participants return a correct ACK response, the transaction is committed
  • If one or more participants return an incorrect ACK response or time out, the transaction is interrupted
  • If a participant cannot receive the commit or abort request from the coordinator in time, it commits the transaction on its own after its wait times out

The coordinator receives ACK responses from all participants and completes the transaction
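The three phases can be sketched in the same style as the 2PC example; the interface and method names are invented for illustration. (The participant-side timeout default-commit described above is noted in a comment rather than modeled.)

```java
import java.util.List;

// Minimal sketch of a 3PC coordinator over CanCommit / PreCommit / DoCommit.
public class ThreePhaseCommit {
    public interface Participant {
        boolean canCommit();   // phase 1: feasibility check only, nothing executed
        boolean preCommit();   // phase 2: execute the transaction and ACK
        void doCommit();       // phase 3: make the result final
        void abort();          // interrupt the transaction
        // In real 3PC, a participant that got PreCommit but never hears the
        // coordinator's DoCommit defaults to committing after a timeout.
    }

    // Returns true if the global transaction committed.
    public static boolean run(List<Participant> ps) {
        for (Participant p : ps) {                       // phase 1: CanCommit
            if (!p.canCommit()) { ps.forEach(Participant::abort); return false; }
        }
        for (Participant p : ps) {                       // phase 2: PreCommit
            if (!p.preCommit()) { ps.forEach(Participant::abort); return false; }
        }
        ps.forEach(Participant::doCommit);               // phase 3: DoCommit
        return true;
    }
}
```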

How three-phase commit resolves the two-phase commit problem

In three-phase commit, if the coordinator crashes after sending a DoCommit request in phase 3, and the only participant that received it also crashes, a new coordinator is elected. Under two-phase commit, the new coordinator cannot tell whether the crashed participant committed or aborted; under three-phase commit it can, because reaching phase 3 means phase 2 already confirmed that every participant has executed the transaction and is only waiting to commit. The new coordinator can therefore deduce from the phase-2 state which action to take, commit or abort, so the data stays consistent even after the crashed participant recovers.

Three-phase commit thus solves the data consistency and single point of failure problems that arise in two-phase commit when the coordinator and a participant crash at the same time, and it reduces blocking: once a participant fails to receive a message from the coordinator in time, it defaults to committing the transaction instead of holding its transactional resources in a blocked state indefinitely.

The remaining problem of three-phase commit

In the commit phase, if the coordinator sends abort-transaction requests but network problems keep some participants from receiving them, those participants will commit after their wait times out. The data of the participants that committed because of the network problem is then inconsistent with the data of the participants that received the abort request.

So neither 2PC nor 3PC can guarantee 100% consistency of data in a distributed system

The solution

Strongly consistent distributed transactions

In a monolithic architecture with multiple data sources, the business code first performs the order-database operation without committing its transaction, then performs the inventory-database operation, again without committing. If both operations succeed, the two transactions are committed together; if either fails, both are rolled back

JTA, based on the 2PC/XA protocol

We already know how the 2PC and XA protocols work; JTA is the Java specification, an implementation of XA in Java

JTA (Java Transaction API) main interfaces:

  • TransactionManager: common methods for beginning, committing, rolling back, and obtaining transactions: begin(), rollback(), ...
  • XAResource: resource management; transactions are controlled through the session: commit(xid), ...
  • Xid: each transaction is assigned a unique Xid

JTA works mainly on the two-phase commit principle: when the whole business operation finishes, only the first phase has been executed. In the second phase, it checks whether all the participating transactions prepared successfully; if any of them errored or failed to prepare, the second phase does not commit but rolls back directly, so that all the transactions roll back together

The JTA scheme achieves strongly consistent distributed transactions

JTA features:

  • Based on two-phase commit, so inconsistent data is still possible
  • Transactions span a long time and block resources
  • Low performance and low throughput

Atomikos is a jar package that implements JTA; usage examples are easy to find online

In a well-designed architecture, this kind of cross-database operation arguably should not appear at all: if the data sources are split by business, the services should be split at the same time, following the rule that one system operates on only one data source (master/slave replicas aside), to avoid multiple systems calling the same data source later

Final Consistent Distributed Transaction Scheme (Flexible Transaction)

The JTA scheme suits distributed transactions over multiple data sources within a monolithic architecture, but it is helpless for distributed transactions between microservices, so other schemes are needed

Local message table

The core idea of local message table is to split distributed transactions into local transactions for processing

In this article's example, a message table is added to the order system; the new order and the new message are written in one transaction, a polling task then reads the message table and pushes messages to MQ, and the inventory system consumes them from MQ

Execution process:

  • The order system inserts the order and the message, committing both in one transaction
  • A scheduled task in the order system polls the message table for rows in the unsynchronized state, sends them to MQ, and retries on failure
  • The inventory system receives the MQ message and modifies the inventory table; this operation must be idempotent
  • If the modification succeeds, it calls an RPC interface on the order system to mark the message as completed, or simply deletes the message
  • If the modification fails, the message is simply retried later
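The steps above can be condensed into an in-memory sketch; the classes, field names, and message format are all invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the local-message-table (outbox) pattern: the order and its
// message are written "in one transaction" (modeled as one method), and a
// polling task pushes unsent messages to MQ.
public class LocalMessageTable {
    static class Message {
        final String payload;
        boolean sent;                 // the "unsynchronized" status flag
        Message(String p) { payload = p; }
    }

    final List<Message> messageTable = new ArrayList<>(); // the local message table
    final List<String> mq = new ArrayList<>();            // stands in for the MQ broker

    // Step 1: insert order + message together in one local transaction.
    void createOrder(String orderId) {
        messageTable.add(new Message("order-created:" + orderId));
    }

    // Step 2: the scheduled task polls unsent rows and pushes them to MQ.
    // On a send failure, the row would stay sent=false and be retried.
    void pollAndSend() {
        for (Message m : messageTable) {
            if (!m.sent) {
                mq.add(m.payload);
                m.sent = true;
            }
        }
    }
}
```

Note that the consumer must still be idempotent: if the process dies between `mq.add` and `m.sent = true`, the same message is delivered twice on the next poll.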

Messages in the order system may be sent repeatedly. To handle this, record the send count, and when it reaches a limit, raise an alarm for manual intervention. The inventory system must be idempotent to avoid the data inconsistency caused by consuming the same message more than once.

The local message table scheme achieves eventual consistency. A message table must be added to the business system and an extra DB operation inserted into the business logic, so there is some performance cost, and the window before consistency is reached is determined mainly by the scheduled task's polling interval

MQ message transaction

The principle of message transactions is to decouple two transactions asynchronously through messaging middleware

The order system executes its own local transaction and sends an MQ message; the inventory system receives the message and executes its own local transaction. At first glance this looks like the local message table scheme with the message table and polling removed, but the two schemes are implemented quite differently

The message transaction must ensure that the business operation is consistent with the message sent. If the business operation succeeds, the message must also be delivered successfully

Message transactions depend on transactional messages from messaging middleware, which RocketMQ supports based on the two-phase commit implementation of messaging middleware

Execution process:

  • Sends a prepare message to the message middleware
  • After successful sending, the local transaction is executed
  • If the transaction succeeds, commit, and the messaging middleware delivers the message to the consumer
  • If the transaction fails, it is rolled back and the message middleware deletes the prepare message
  • The consuming end receives the message and consumes it. If consuming fails, it tries again
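The prepare/commit/rollback flow can be mocked with an in-memory broker. This is not the real RocketMQ API, just an illustration of how a half (prepare) message stays invisible to consumers until the local transaction commits:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a transactional-message broker: prepare parks the message,
// commit makes it deliverable, rollback deletes it.
public class TransactionalMq {
    private final Map<String, String> halfMessages = new HashMap<>();
    public final List<String> deliverable = new ArrayList<>(); // visible to consumers

    // Step 1: the producer sends a prepare (half) message before its local tx.
    public void prepare(String txId, String payload) {
        halfMessages.put(txId, payload);
    }

    // Step 3a: local transaction succeeded; deliver the message to consumers.
    public void commit(String txId) {
        String payload = halfMessages.remove(txId);
        if (payload != null) deliverable.add(payload);
    }

    // Step 3b: local transaction failed; delete the prepare message.
    public void rollback(String txId) {
        halfMessages.remove(txId);
    }
}
```

The real RocketMQ additionally checks back with the producer when a half message's outcome is unknown (e.g. the producer crashed between prepare and commit); that reconciliation is omitted here.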

This scheme also achieves final consistency. Compared with the local message table implementation scheme, there is no need to build the message table and no longer rely on local database transactions. Therefore, this scheme is more suitable for high concurrency scenarios

Best effort notice

Compared with the previous two schemes, maximum effort notification is simple to implement and is suitable for some services with low final consistency requirements, such as payment notification and SMS notification

Take payment notification as an example. A business system calls a payment platform, and after the payment platform performs the payment operation, it tries its best to notify the business system of the result, but only up to a maximum number of attempts; if notification still fails after that, it stops notifying. The payment platform also provides a query interface so that the business system can check for itself whether the payment succeeded

Execution process:

  • The business system invokes the payment interface of the payment platform and records it locally. The payment status is in payment
  • After the payment platform conducts the payment operation, whether it succeeds or fails, it needs to notify the business system of the result
  • If the notification fails, the payment platform retries according to its retry rules, and stops notifying once the maximum number of attempts is reached
  • The payment platform provides an interface to query the operation results of order payment
  • The business system queries the payment result on the payment platform according to certain business rules
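The capped-retry part of this flow can be sketched in a few lines; the class and method names are invented:

```java
import java.util.function.BooleanSupplier;

// Sketch of best-effort notification: retry the callback up to maxAttempts
// times, then give up and rely on the query interface for reconciliation.
public class BestEffortNotifier {
    // Returns true if the business system acknowledged within maxAttempts.
    public static boolean notifyWithRetry(BooleanSupplier callback, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (callback.getAsBoolean()) return true;
            // A real implementation would sleep here per the retry rules,
            // typically with increasing intervals (e.g. 15s, 1m, 5m, ...).
        }
        return false; // give up; the business system must query the platform
    }
}
```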

This scheme also achieves final consistency

Compensation transaction TCC

TCC is short for Try-Confirm-Cancel. Every operation has a corresponding Confirm and Cancel operation: Confirm is called when the operation succeeds, Cancel when it fails. A distributed transaction built on TCC can therefore be seen as a compensation mechanism at the business level

The three stages of TCC:

  • Try phase: the business system is checked and resources are reserved
  • Confirm phase: entered when the Try phase succeeds; Confirm is assumed not to fail, i.e. if Try succeeds, Confirm must succeed
  • Cancel phase: when an execution error requires a rollback, the business operation is cancelled and the reserved resources are released

In the Try phase, the business system performs checks and reserves resources. For the order and inventory operations, it must check that the remaining stock is sufficient and then reserve it; the reservation works by adding an available-stock field, and the Try phase operates on that available quantity

For example, placing an order that deducts one unit of stock:

Execution process:

  • Try phase: the order system sets the order status to paying; the inventory system checks that the remaining stock is at least 1, then sets the available stock to the remaining stock minus 1
  • If the Try phase succeeds, the Confirm phase runs: the order status becomes payment succeeded, and the remaining stock is set to the available stock
  • If the Try phase fails, the Cancel phase runs: the order status becomes payment failed, and the available stock is reset to the remaining stock
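The available-stock bookkeeping from the example can be sketched as a tiny class; the field and method names are invented for illustration:

```java
// Sketch of the TCC inventory resource: Try reserves against an "available"
// counter, Confirm finalizes by deducting the real remaining stock,
// Cancel releases the reservation.
public class TccInventory {
    int remaining;  // remaining stock: what Confirm finally deducts
    int available;  // available stock: what Try reserves against

    TccInventory(int stock) { remaining = stock; available = stock; }

    // Try: check sufficiency, then reserve by decrementing available stock.
    boolean tryReserve(int qty) {
        if (available < qty) return false;
        available -= qty;
        return true;
    }

    void confirm(int qty) { remaining -= qty; }  // Confirm: make the deduction final
    void cancel(int qty)  { available += qty; }  // Cancel: release the reservation
}
```

Because reservations come out of `available` immediately, two concurrent orders cannot both reserve the last unit, which is exactly what the Try phase is for.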

Implementing distributed transactions with TCC makes the code logic more complex: every original interface must be split into three interfaces, Try, Confirm, and Cancel

Distributed transaction framework based on TCC implementation: ByteTCC, TCC-Transaction

ByteTCC:https://github.com/liuyangming/ByteTCC

tcc-transaction:https://github.com/changmingxie/tcc-transaction

Having read this far, you should have a general understanding of distributed transactions. In real production, avoid distributed transactions where possible: if an operation can be reworked into a local transaction, use a local transaction. When a distributed transaction is unavoidable, think carefully about which scheme best fits the business

Always think before you act