Seata distributed transactions: an analysis of several common modes

Cabbage Java self study room covers core knowledge

1. Distributed transaction protocols

Distributed transactions have their own specifications and protocols. The two commit protocols most relevant here are 2PC and 3PC.

1.1. (2PC) Two-phase commit protocol

Two-phase commit (2PC) introduces a coordinator that coordinates the actions of the participants and ultimately decides whether the participants actually commit their transactions.

1.1.1. Preparation phase

The coordinator asks each participant whether its transaction executed successfully, and each participant reports back its execution result.

1.1.2. Commit phase

If the transaction executed successfully on every participant, the coordinator notifies the participants to commit; otherwise, it notifies them to roll back. Note that in the preparation phase each participant executes the transaction but does not commit it. The commit or rollback happens only in the commit phase, after the notification from the coordinator arrives.

1.1.3. Existing problems

Synchronous blocking: All participants block synchronously while waiting for the other participants to respond.
Single point of failure: The coordinator is central to 2PC, so its failure has a large impact. In particular, if the coordinator fails during phase two, all participants stay in a waiting state and cannot complete other operations.
Data inconsistency: In phase two, if a network fault occurs after the coordinator has sent only some of the commit messages, only some participants receive the message and commit, which leaves the system’s data inconsistent.
Too conservative: The failure of any single node causes the entire transaction to fail. There is no complete fault-tolerance mechanism.
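
The two phases can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the TwoPcParticipant interface, RecordingParticipant, and TwoPcCoordinator are hypothetical names invented here, everything runs in one process, and timeouts, crashes, and lost messages (the problems listed above) are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant contract; the method names are illustrative. */
interface TwoPcParticipant {
    boolean prepare();  // phase 1: execute without committing, then vote
    void commit();      // phase 2: commit the prepared work
    void rollback();    // phase 2: undo the prepared work
}

/** In-memory participant that only records the calls it receives. */
class RecordingParticipant implements TwoPcParticipant {
    final List<String> log = new ArrayList<>();
    private final boolean vote;

    RecordingParticipant(boolean vote) { this.vote = vote; }

    @Override public boolean prepare() { log.add("prepare"); return vote; }
    @Override public void commit() { log.add("commit"); }
    @Override public void rollback() { log.add("rollback"); }
}

class TwoPcCoordinator {
    /** Returns true if the global transaction committed, false if it rolled back. */
    static boolean run(List<? extends TwoPcParticipant> participants) {
        boolean allYes = true;
        // Phase 1: collect a vote from every participant.
        for (TwoPcParticipant p : participants) {
            allYes &= p.prepare();  // &= does not short-circuit, so everyone is asked
        }
        // Phase 2: one global decision is sent to every participant.
        for (TwoPcParticipant p : participants) {
            if (allYes) { p.commit(); } else { p.rollback(); }
        }
        return allYes;
    }
}
```

Calling TwoPcCoordinator.run with one participant that votes no makes every participant, including those that voted yes, receive rollback, which is the "too conservative" behaviour described above.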

1.2. (3PC) Three-phase commit protocol

Three-phase commit (3PC) differs from two-phase commit in two ways.

Timeouts are introduced for both the coordinator and the participants.
An extra preparation phase is inserted between phases 1 and 2, so that the state of every participating node is consistent before the final commit phase.

1.2.1. CanCommit stage

The CanCommit phase of 3PC is very similar to the preparation phase of 2PC. The coordinator sends a CanCommit request to each participant, which returns a Yes response if it can commit and a No response otherwise.

Transaction inquiry: The coordinator sends a CanCommit request to the participants, asking whether the transaction commit can be performed, and then waits for their responses.
Response feedback: After receiving the CanCommit request, a participant that believes it can execute the transaction successfully returns a Yes response and enters the prepared state. Otherwise it returns No.

1.2.2. PreCommit stage

The coordinator decides whether to proceed with the PreCommit operation based on the participants’ responses. There are two possibilities.

If the coordinator receives a Yes response from all participants, the transaction is pre-executed.

Send PreCommit requests: The coordinator sends a PreCommit request to the participants and enters the prepared phase.
Transaction pre-commit: Upon receiving the PreCommit request, each participant executes the transaction and records undo and redo information in the transaction log.
Response feedback: If a participant executes the transaction successfully, it returns an ACK response and waits for the final instruction.

If any participant sends a No response to the coordinator, or if the coordinator receives no response from a participant before the timeout, the transaction is interrupted.

Send interrupt requests: The coordinator sends abort requests to all participants.
Interrupt the transaction: A participant interrupts the transaction after receiving an abort request from the coordinator (or after timing out without receiving any request from the coordinator).

1.2.3. DoCommit stage

The actual transaction commit happens in this stage, which again has two possible outcomes.

Commit the transaction

Send commit requests: Once the coordinator has received ACK responses from the participants, it moves from the pre-commit state to the commit state and sends DoCommit requests to all participants.
Transaction commit: After receiving the DoCommit request, each participant performs the formal transaction commit and releases all transaction resources once the commit completes.
Response feedback: After the transaction commits, the participant sends an ACK response to the coordinator.
Completion of transaction: The coordinator completes the transaction after receiving ACK responses from all participants.

Interrupt the transaction

If the coordinator does not receive the ACK response from a participant (either the participant sent a non-ACK response, or the response timed out), the transaction is interrupted.

Send interrupt requests: The coordinator sends abort requests to all participants.
Transaction rollback: After receiving an abort request, each participant uses the undo information recorded in phase two to roll back the transaction and releases all transaction resources once the rollback completes.
Feedback result: After completing the rollback, the participant sends an ACK message to the coordinator.
Interrupt transaction: After the coordinator has received the ACK messages from the participants, the interruption of the transaction is complete.
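
The coordinator’s side of the three stages can be sketched as follows. This is a minimal single-process illustration, not a real implementation: ThreePcParticipant, ScriptedParticipant, ThreePcOutcome, and ThreePcCoordinator are hypothetical names, and a timeout is simply modelled as a false return value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant contract for the three stages. */
interface ThreePcParticipant {
    boolean canCommit();  // stage 1: Yes/No vote (false also stands in for a timeout)
    boolean preCommit();  // stage 2: execute, record undo/redo, return ACK
    void doCommit();      // stage 3: formal commit
    void abort();         // interrupt: roll back using the undo information
}

enum ThreePcOutcome { COMMITTED, ABORTED_IN_CAN_COMMIT, ABORTED_IN_PRE_COMMIT }

/** Test double whose vote and ACK are fixed in advance. */
class ScriptedParticipant implements ThreePcParticipant {
    final List<String> log = new ArrayList<>();
    private final boolean vote;
    private final boolean ack;

    ScriptedParticipant(boolean vote, boolean ack) { this.vote = vote; this.ack = ack; }

    @Override public boolean canCommit() { log.add("canCommit"); return vote; }
    @Override public boolean preCommit() { log.add("preCommit"); return ack; }
    @Override public void doCommit() { log.add("doCommit"); }
    @Override public void abort() { log.add("abort"); }
}

class ThreePcCoordinator {
    static ThreePcOutcome run(List<? extends ThreePcParticipant> participants) {
        // Stage 1: any No (or timeout) interrupts the transaction.
        for (ThreePcParticipant p : participants) {
            if (!p.canCommit()) { abortAll(participants); return ThreePcOutcome.ABORTED_IN_CAN_COMMIT; }
        }
        // Stage 2: a missing ACK interrupts the transaction.
        for (ThreePcParticipant p : participants) {
            if (!p.preCommit()) { abortAll(participants); return ThreePcOutcome.ABORTED_IN_PRE_COMMIT; }
        }
        // Stage 3: everyone acknowledged, so the formal commit is sent.
        for (ThreePcParticipant p : participants) { p.doCommit(); }
        return ThreePcOutcome.COMMITTED;
    }

    private static void abortAll(List<? extends ThreePcParticipant> participants) {
        for (ThreePcParticipant p : participants) { p.abort(); }
    }
}
```

The sketch leaves out the participant-side timeouts, which in a real deployment are what keep participants from blocking forever when the coordinator disappears.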

2. The AT mode

AT mode is a non-intrusive distributed transaction solution.

Alibaba’s Seata framework implements this mode. In AT mode, users only need to write their own “business SQL”, which serves as phase one; the Seata framework automatically generates the phase-two commit and rollback operations for the transaction.

How does AT mode avoid intruding on business code?

2.1. Phase one

In phase one, Seata intercepts the “business SQL” and parses its semantics to find the business data that the SQL will update. Before the data is updated, Seata saves it as the “before image”; it then executes the “business SQL” to update the data. After the update, Seata saves the new data as the “after image” and generates a row lock. All of these operations run within a single database transaction, which guarantees the atomicity of phase one.

2.2. Phase-two commit

Because the “business SQL” was already committed to the database in phase one, the Seata framework only needs to delete the snapshot data and row locks saved in phase one to finish the cleanup.

2.3. Phase-two rollback

In a phase-two rollback, Seata must undo the “business SQL” executed in phase one and restore the business data. It does so by restoring the data from the “before image”. Before restoring, however, it checks for dirty writes by comparing the current business data in the database with the “after image”. If the two are identical, there has been no dirty write and the business data can be restored.
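
The before-image and after-image mechanism can be illustrated with a small in-memory model. This is not Seata’s code or API: AtModeSketch and RowUndoRecord are hypothetical names, a Map stands in for the business table, each transaction touches a single row, and row locks are left out. What happens after a dirty write is detected is outside the scope of the text above, so the sketch only reports it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical undo record for one row: its value before and after the update. */
final class RowUndoRecord {
    final String rowKey;
    final Integer beforeImage;  // null means the row did not exist
    final Integer afterImage;

    RowUndoRecord(String rowKey, Integer beforeImage, Integer afterImage) {
        this.rowKey = rowKey;
        this.beforeImage = beforeImage;
        this.afterImage = afterImage;
    }
}

class AtModeSketch {
    final Map<String, Integer> table = new HashMap<>();                  // stands in for a business table
    private final Map<String, RowUndoRecord> undoLog = new HashMap<>();  // keyed by transaction id

    /** Phase one: capture the before image, apply the update, capture the after image. */
    void phaseOne(String xid, String rowKey, int newValue) {
        Integer before = table.get(rowKey);
        table.put(rowKey, newValue);
        undoLog.put(xid, new RowUndoRecord(rowKey, before, table.get(rowKey)));
    }

    /** Phase-two commit: the data is already in place, so only the snapshot is deleted. */
    void phaseTwoCommit(String xid) {
        undoLog.remove(xid);
    }

    /** Phase-two rollback: returns false, changing nothing, if a dirty write is detected. */
    boolean phaseTwoRollback(String xid) {
        RowUndoRecord undo = undoLog.get(xid);
        if (undo == null) { return true; }  // nothing recorded, nothing to restore
        if (!Objects.equals(table.get(undo.rowKey), undo.afterImage)) { return false; }
        if (undo.beforeImage == null) {
            table.remove(undo.rowKey);
        } else {
            table.put(undo.rowKey, undo.beforeImage);
        }
        undoLog.remove(xid);
        return true;
    }
}
```

For example, after table.put("stock", 10) and phaseOne("tx1", "stock", 7), a call to phaseTwoRollback("tx1") restores the value 10, provided nothing else has changed the row in between.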

In AT mode, phase one and the phase-two commit and rollback are all generated automatically by the Seata framework. Users get distributed transactions simply by writing their “business SQL”, which is why AT mode is a distributed transaction solution that does not intrude on business code.

3. The TCC mode

In TCC mode, users implement three operations, Try, Confirm, and Cancel, according to their own business scenario. The transaction initiator executes the Try method in phase 1; in phase 2 it executes the Confirm method if the transaction commits, or the Cancel method if it rolls back.

The three TCC methods are:

Try: checks and reserves resources.
Confirm: performs the actual business operation. If Try succeeds, Confirm must succeed.
Cancel: releases the reserved resources.

3.1. Business model

The most important question when adopting TCC is how to split the business model into two phases.

Take a “money deduction” scenario as an example. Without TCC, deducting money from account A takes a single SQL statement that updates the account balance. With TCC, users must work out how to split this one-step deduction into two phases, implement the three methods, and ensure that if the phase-one Try succeeds, the phase-two Confirm will also succeed.

The Try method, as the phase-one preparation method, must check and reserve resources. In the deduction scenario, Try checks whether the account balance is sufficient and reserves the transfer funds by freezing them in account A. After Try executes, account A still has a balance of 100 yuan, but 30 yuan of it is frozen and cannot be used by other transactions.

The phase-two Confirm method performs the actual deduction. Confirm uses the funds frozen in the Try phase to deduct from the account. After Confirm executes, the 30 yuan frozen in phase one has been deducted and the balance of account A is 70 yuan.

If phase two is a rollback, the Cancel method must release the 30 yuan frozen by the phase-one Try, so that account A returns to its original state with all 100 yuan available.
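
The deduction example can be written down as a small in-memory account. This is a sketch under assumptions, not Seata’s TCC API: the TccAccount class and its method names are hypothetical, amounts are whole yuan, and the method is called tryDeduct because try is a reserved word in Java.

```java
/** Hypothetical account that supports freezing funds for a two-phase deduction. */
class TccAccount {
    private long balance;  // total funds, including frozen funds
    private long frozen;   // funds reserved by Try and not yet confirmed or cancelled

    TccAccount(long balance) { this.balance = balance; }

    /** Try: check that enough unfrozen funds exist, then freeze the amount. */
    synchronized boolean tryDeduct(long amount) {
        if (amount <= 0 || available() < amount) { return false; }
        frozen += amount;
        return true;
    }

    /** Confirm: perform the real deduction using the funds frozen by Try. */
    synchronized void confirm(long amount) {
        frozen -= amount;
        balance -= amount;
    }

    /** Cancel: release the funds frozen by Try. */
    synchronized void cancel(long amount) {
        frozen -= amount;
    }

    synchronized long balance() { return balance; }
    synchronized long frozen() { return frozen; }
    synchronized long available() { return balance - frozen; }
}
```

With new TccAccount(100), tryDeduct(30) leaves the balance at 100 with 30 frozen; a following confirm(30) brings the balance to 70, while cancel(30) instead would make all 100 available again. Because Try already checked and reserved the funds, Confirm has nothing left that can fail for business reasons.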

In short, adopting TCC mode means working out how to split the business model into two phases, implementing the three TCC methods, and ensuring that Confirm succeeds whenever Try has succeeded. Compared with AT mode, TCC mode is more intrusive to business code, but it does not need the global row locks of AT mode, so TCC performs considerably better than AT mode.

3.2. Empty rollback is allowed

The Cancel interface must be designed to allow empty rollback. If the Try call never arrives because of packet loss, the transaction manager still triggers a rollback, which invokes the Cancel interface. In this case, if Cancel finds no record for the transaction XID or business primary key, it should simply return success. This lets the transaction manager consider the branch rolled back; otherwise it would keep retrying, even though Cancel has no business data to roll back.

3.3. Anti-suspension control

Suspension means that Cancel executes before Try. It happens when the Try call times out because of network congestion: the transaction manager rolls back and triggers Cancel, and the delayed Try call finally arrives after Cancel has already run. Under the empty-rollback logic above, Cancel returns success and the transaction manager considers the transaction rolled back, so the Try interface must not execute at this point; otherwise the data becomes inconsistent. To prevent this, before an empty rollback returns success, Cancel records the transaction XID or business primary key to mark it as rolled back. The Try interface checks for this record first and does not perform the Try operation if the XID or business primary key has already been marked as rolled back.
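
Empty rollback and anti-suspension can both be handled with one small record per transaction branch. The sketch below is an assumption-laden illustration: TccBranchGuard and its methods are hypothetical names, and the in-memory map stands in for what would normally be a database table written in the same local transaction as the business data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-branch record keeper, keyed by transaction XID. */
class TccBranchGuard {
    enum Status { TRIED, CANCELLED }

    private final Map<String, Status> branchLog = new ConcurrentHashMap<>();

    /**
     * Called at the start of Try. Returns false if the XID was already marked
     * as rolled back, in which case Try must not run (anti-suspension).
     */
    boolean allowTry(String xid) {
        Status existing = branchLog.putIfAbsent(xid, Status.TRIED);
        return existing != Status.CANCELLED;
    }

    /**
     * Called at the start of Cancel. Always marks the XID as rolled back.
     * Returns true if Try ran and there is real work to undo, false if this is
     * an empty rollback (or a repeated Cancel); either way Cancel reports success.
     */
    boolean needsRealCancel(String xid) {
        Status previous = branchLog.put(xid, Status.CANCELLED);
        return previous == Status.TRIED;
    }
}
```

If needsRealCancel("x1") is called before any Try, it returns false and leaves a rollback marker, so a late allowTry("x1") returns false and the suspended Try is rejected. Repeated Try calls for the same XID are a separate concern, covered by idempotent control.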

3.4. Idempotent control

Idempotence means that, under the same conditions, a single request and repeated requests have the same effect on system resources. Network jitter or congestion can cause timeouts, and the transaction manager then retries the operation, so a business operation may well be called more than once. To keep repeated calls from occupying resources multiple times, the service must be designed with idempotency control; this is usually done by deduplicating on the transaction XID or the business primary key.
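
A minimal way to deduplicate on the transaction XID is to remember which XIDs have already been applied. The class below is a hypothetical in-memory sketch; a real service would typically keep the processed XIDs in a table with a unique key, updated in the same local transaction as the business data.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical deduction service that applies each XID at most once. */
class IdempotentDeduction {
    private final Set<String> processedXids = new HashSet<>();
    private long balance;

    IdempotentDeduction(long balance) { this.balance = balance; }

    /**
     * Applies the deduction the first time an XID is seen and returns true.
     * A retry with the same XID changes nothing and returns false; the caller
     * can still report success to the transaction manager.
     */
    synchronized boolean deduct(String xid, long amount) {
        if (!processedXids.add(xid)) {
            return false;  // Set.add returns false when the XID was already present
        }
        balance -= amount;
        return true;
    }

    synchronized long balance() { return balance; }
}
```

Starting from a balance of 100, calling deduct("x1", 30) twice leaves the balance at 70, because the second call is recognised as a retry.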

4. Saga mode

The Saga theory comes from the 1987 paper “Sagas” by Hector Garcia-Molina and Kenneth Salem.

Saga mode is a solution for long-running transactions, and Saga is a compensation protocol. In Saga mode a distributed transaction has multiple participants, and each participant is a service with a forward operation and a compensating operation; users must implement both the forward operation and the reverse rollback operation according to their business scenario.

While the distributed transaction executes, the forward operations of the participants are executed one after another. If all forward operations succeed, the distributed transaction commits. If any forward operation fails, the distributed transaction goes back and executes the reverse rollback operations of the preceding participants, rolling back the participants that have already committed and returning the distributed transaction to its initial state.
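
The forward-then-compensate flow can be sketched with a simple sequential runner. SagaStep, LoggingStep, and SagaRunner are hypothetical names; the sketch runs synchronously in one process and assumes that a failed forward operation leaves nothing behind, so only the steps that completed are compensated.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical saga participant: a forward operation plus its compensation. */
interface SagaStep {
    boolean forward();   // forward operation; false signals failure
    void compensate();   // reverse operation that undoes a completed forward
}

/** Test double that appends every call to a shared log. */
class LoggingStep implements SagaStep {
    private final String name;
    private final boolean succeeds;
    private final List<String> log;

    LoggingStep(String name, boolean succeeds, List<String> log) {
        this.name = name;
        this.succeeds = succeeds;
        this.log = log;
    }

    @Override public boolean forward() { log.add(name + ".forward"); return succeeds; }
    @Override public void compensate() { log.add(name + ".compensate"); }
}

class SagaRunner {
    /** Returns true if every forward operation succeeded, false if the saga was compensated. */
    static boolean run(List<? extends SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            if (step.forward()) {
                completed.push(step);  // remember it for a possible rollback
            } else {
                // Compensate the completed steps in reverse order.
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }
}
```

With three steps A, B, and C where C fails, the calls happen in the order A.forward, B.forward, C.forward, B.compensate, A.compensate.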

Both the forward services and the compensation services of a Saga must be implemented by business developers, so Saga mode is intrusive to business code.

Distributed transactions in Saga mode are usually event-driven, and the participants execute asynchronously.

4.1. Usage Scenarios

Saga mode suits business systems with long business processes that need eventual consistency of transactions. In Saga mode, local transactions are committed in phase one without locks, so performance holds up even for long processes.

Transaction participants may be services of other companies or legacy systems that cannot be modified to provide the interfaces required by TCC. In such cases Saga mode can be used.

Advantages:

Local database transactions are committed in phase one, with no locks and high performance;
Participants can execute asynchronously in an event-driven manner, with high throughput;
The compensation service is simply the “reverse” of the forward service, which is easy to understand and implement.

Disadvantages: Saga mode does not guarantee isolation, because local database transactions are already committed in phase one and no “reservation” step is performed. How to cope with the lack of isolation is discussed in section 4.5.

As in TCC practice, the forward and reverse operations of every participant in Saga mode need to support the following:

4.2. Empty compensation is allowed

The compensation service may be executed even though the original service was never executed.

4.3. Anti-suspension control

If the compensation service executes before the original service, the forward operation must be rejected after the empty compensation.

4.4. Idempotent control

Both the original service and the compensation service must be idempotent.

4.5. Customize transaction recovery policies

As mentioned earlier, Saga mode does not guarantee transaction isolation, and dirty writes can occur in extreme cases. For example, while the distributed transaction is still uncommitted, the data written by an earlier participant may be modified; if a later participant then fails and a rollback is required, the earlier participant’s data may no longer be compensable. One way to handle this is to “retry forward” and carry the distributed transaction through to completion. Because the whole business process is orchestrated by a state machine, the process can be resumed and retried forward even during after-the-fact recovery. You can therefore configure the recovery policy, backward rollback or forward retry, according to the characteristics of the business. When a transaction times out, the server recovers it according to this policy.
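
The choice between the two recovery policies can be sketched as follows. This is a hypothetical illustration and not Seata’s state machine configuration: RecoveryPolicy, SagaRecoveryResult, SagaRecovery, and the maxRetries parameter are all names invented here.

```java
import java.util.function.BooleanSupplier;

enum RecoveryPolicy { COMPENSATE_BACKWARD, RETRY_FORWARD }

enum SagaRecoveryResult { COMPLETED, COMPENSATED, STILL_PENDING }

class SagaRecovery {
    /**
     * Recovers a saga whose forward step failed or timed out.
     *
     * @param failedForward      re-runs the failed forward step; true means it succeeded
     * @param compensatePrevious rolls back the steps that had already completed
     * @param maxRetries         how many forward retries to attempt in this pass
     */
    static SagaRecoveryResult recover(RecoveryPolicy policy,
                                      BooleanSupplier failedForward,
                                      Runnable compensatePrevious,
                                      int maxRetries) {
        if (policy == RecoveryPolicy.RETRY_FORWARD) {
            for (int attempt = 0; attempt < maxRetries; attempt++) {
                if (failedForward.getAsBoolean()) {
                    return SagaRecoveryResult.COMPLETED;
                }
            }
            // Not compensated: left for a later recovery pass or manual handling.
            return SagaRecoveryResult.STILL_PENDING;
        }
        compensatePrevious.run();
        return SagaRecoveryResult.COMPENSATED;
    }
}
```

Forward retry only makes sense when the forward step is idempotent, since it may be executed several times before it succeeds.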

Because Saga does not guarantee isolation, business design should follow the principle of “better long than short”. “Long” means that when an error occurs we end up holding more money than we should; “short” means we end up with less. If we are long, we can refund the customer, but if we are short, the money may never be recovered. In other words, the business must be designed to deduct from the customer’s account first and post the credit afterwards, so that even if an update is overwritten because of the lack of isolation, no money is lost.

5. The XA mode

XA is a two-phase commit protocol defined by the X/Open DTP group. XA is natively supported by many databases (such as Oracle, DB2, SQL Server, and MySQL) and middleware products (such as CICS and Tuxedo).

The X/Open DTP model (1994) comprises the application (AP), the transaction manager (TM), and the resource manager (RM). The XA interface functions are provided by the database vendors. The XA specification is based on the two-phase commit protocol (2PC). JTA (Java Transaction API) is the Java counterpart of the XA specification, with some extensions.

XA mode requires a global coordinator. After each database finishes executing its branch transaction, it performs the phase-one pre-commit and reports the result to the coordinator. Once the coordinator has seen all branch transactions finish and pre-commit, it proceeds to phase two, in which it tells each database, one by one, to commit or roll back. The global coordinator is the TM role in the XA model, and the database behind each branch transaction is an RM.
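
In Java, the RM side of this model is expressed by the standard javax.transaction.xa.XAResource interface, and the call order a TM follows can be sketched against it. Everything else below is an assumption made for illustration: SimpleXid, FakeXaResource, and XaTransactionManagerSketch are hypothetical classes, the fake resource only records calls instead of talking to a database, and recovery and error handling are reduced to the bare minimum.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Minimal Xid for the sketch; real transaction managers generate these. */
final class SimpleXid implements Xid {
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(String gtrid, String bqual) {
        this.gtrid = gtrid.getBytes(StandardCharsets.UTF_8);
        this.bqual = bqual.getBytes(StandardCharsets.UTF_8);
    }

    @Override public int getFormatId() { return 1; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }
}

/** In-memory stand-in for a database's XAResource (the RM); it only records calls. */
class FakeXaResource implements XAResource {
    final List<String> calls = new ArrayList<>();
    private final boolean prepareOk;

    FakeXaResource(boolean prepareOk) { this.prepareOk = prepareOk; }

    @Override public void start(Xid xid, int flags) { calls.add("start"); }
    @Override public void end(Xid xid, int flags) { calls.add("end"); }
    @Override public int prepare(Xid xid) throws XAException {
        calls.add("prepare");
        if (!prepareOk) { throw new XAException(XAException.XA_RBROLLBACK); }
        return XAResource.XA_OK;
    }
    @Override public void commit(Xid xid, boolean onePhase) { calls.add("commit"); }
    @Override public void rollback(Xid xid) { calls.add("rollback"); }
    @Override public void forget(Xid xid) { }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

/** Hypothetical TM: runs one global transaction across the given RMs. */
class XaTransactionManagerSketch {
    static boolean run(List<? extends XAResource> resources) {
        try {
            List<Xid> xids = new ArrayList<>();
            for (int i = 0; i < resources.size(); i++) {
                Xid xid = new SimpleXid("gtx-1", "branch-" + i);
                xids.add(xid);
                resources.get(i).start(xid, XAResource.TMNOFLAGS);
                // The branch's SQL would run here.
                resources.get(i).end(xid, XAResource.TMSUCCESS);
            }
            // Phase 1: pre-commit (prepare) every branch.
            boolean allPrepared = true;
            for (int i = 0; i < resources.size() && allPrepared; i++) {
                try {
                    resources.get(i).prepare(xids.get(i));
                } catch (XAException e) {
                    allPrepared = false;
                }
            }
            // Phase 2: tell each RM, one by one, to commit or roll back.
            for (int i = 0; i < resources.size(); i++) {
                if (allPrepared) {
                    resources.get(i).commit(xids.get(i), false);
                } else {
                    resources.get(i).rollback(xids.get(i));
                }
            }
            return allPrepared;
        } catch (XAException e) {
            throw new IllegalStateException("XA call failed", e);
        }
    }
}
```

A real TM would also inspect the XAException error code, since a branch that failed to prepare may already have rolled back on its own, and would persist its decision so that it can recover after a crash.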

Atomikos is an open-source framework for XA mode, and the company behind it also offers a commercial version.

Disadvantages of XA mode: the transaction granularity is large, and under high concurrency system availability is low. For these reasons it is rarely used.

6. Mode comparison (AT, TCC, Saga, XA)

The four distributed transaction modes were proposed at different times, and each has its own applicable scenarios:

AT mode is a non-intrusive distributed transaction solution, suitable for scenarios where the business should not have to be modified, with almost zero learning cost.
TCC mode is a high-performance distributed transaction solution, suitable for scenarios with high performance requirements, such as core systems.
Saga mode is a long-running transaction solution, suitable for business systems with long business processes that need eventual consistency. Saga mode commits local transactions in phase one without locks, so performance holds up for long processes, and it is mostly used in business systems at the channel layer and integration layer. It can also be used when transaction participants are services of other companies or legacy systems that cannot be modified to provide the interfaces required by TCC.
XA mode is a distributed transaction solution with strong consistency, but its performance is low and it is rarely used.