Preface
Because BASE theory requires a trade-off between consistency and availability, many consistency algorithms and protocols have emerged. Among them are the two-phase commit (2PC) protocol, the three-phase commit (3PC) protocol, and the Paxos algorithm.
The 2PC protocol introduced in this article commits a transaction in two phases, achieving distributed consistency through coordination between a coordinator and the participants.
The coordinator and the participants carry out the protocol jointly. Their roles are:
Role | XA concept | Responsibility |
---|---|---|
Coordinator | Transaction manager | Coordinates the participants to commit or roll back the distributed transaction |
Participant | Resource manager | A node in the distributed cluster |
Main text
1. Distributed transactions
A distributed transaction is a transaction whose operations span multiple databases. It extends the concept of a transaction on a single database to a transaction across several databases. The goal is to ensure data consistency in distributed systems.
The keys to distributed transaction processing are:
- Every action a transaction takes at any node must be recorded;
- All operations performed by a transaction are either all committed or all rolled back.
2. The XA specification
2.1. Composition of the XA specification
The XA specification is a distributed transaction processing model defined by the X/Open organization (now the Open Group). The X/Open DTP model (1994) includes:
- Application (AP)
- Transaction Manager (TM): transaction middleware, etc.
- Resource Manager (RM): relational databases, etc.
- Communication Resource Manager (CRM): message middleware, etc.
2.2. Definition of XA specification
The XA specification defines the interface (a set of interface functions) between the transaction middleware and the database. The transaction middleware uses it to notify the database when a transaction starts, ends, commits, or rolls back. The XA interface functions are provided by the database vendor.
The two-phase commit and three-phase commit protocols are discussed in the context of the XA specification, and two-phase commit is the key to implementing XA distributed transactions.
2.3. XA programming model
- Configure the TM and register each RM with the TM as a data source. A TM can register multiple RMs.
- The AP initiates a global transaction with the TM. At this point, the TM generates an XID (global transaction ID) and notifies each RM.
- The AP obtains a proxy for each resource manager from the TM (for example, using the JTA interface to get a JDBC or JMS connection to a TM-managed RM from the TM-managed context).
- The AP operates on the RM indirectly through the connection obtained from the TM. The TM passes the XID to the RM on every AP operation, and it is through this XID that the RM associates each operation with the transaction.
- When the AP ends the global transaction, the TM notifies the RMs that the global transaction has ended and starts two-phase commit, that is, the prepare-commit process.
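The steps above can be sketched with a toy in-memory model. Everything here is hypothetical: `TransactionManager`, `ResourceManager`, and their methods are illustration names that only loosely echo the XA functions `xa_start`, `xa_end`, `xa_prepare`, `xa_commit`, and `xa_rollback`; this is not a real XA or JTA binding, and for brevity the AP writes to the RM objects directly instead of through a TM-issued connection.

```python
import uuid


class ResourceManager:
    """Toy RM standing in for a database (hypothetical, not a real XA driver)."""

    def __init__(self, name):
        self.name = name
        self.data = {}      # committed state
        self.pending = {}   # xid -> writes not yet committed

    def start(self, xid):
        self.pending[xid] = {}

    def write(self, xid, key, value):
        # Every operation carries the XID so the RM can tie it to the transaction.
        self.pending[xid][key] = value

    def end(self, xid):
        pass  # nothing to do in this toy model

    def prepare(self, xid):
        return xid in self.pending  # vote yes if the branch is known

    def commit(self, xid):
        self.data.update(self.pending.pop(xid))

    def rollback(self, xid):
        self.pending.pop(xid, None)


class TransactionManager:
    def __init__(self):
        self.rms = []

    def register(self, rm):
        self.rms.append(rm)

    def begin(self):
        xid = uuid.uuid4().hex  # global transaction ID
        for rm in self.rms:
            rm.start(xid)
        return xid

    def end(self, xid):
        for rm in self.rms:
            rm.end(xid)
        votes = [rm.prepare(xid) for rm in self.rms]   # phase 1
        for rm in self.rms:                            # phase 2
            rm.commit(xid) if all(votes) else rm.rollback(xid)
        return all(votes)


# Usage: one global transaction spanning two resource managers.
tm = TransactionManager()
orders, stock = ResourceManager("orders"), ResourceManager("stock")
tm.register(orders)
tm.register(stock)
xid = tm.begin()
orders.write(xid, "order-1", "created")
stock.write(xid, "item-9", 41)
committed = tm.end(xid)
```

After `tm.end(xid)` returns, both resource managers hold the new values and no pending branch remains for that XID.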
3. Two-phase commit (2PC)
3.1. Definition of two-phase commit
The idea behind two-phase commit can be summarized as follows: each participant informs the coordinator whether its operation succeeded or failed, and the coordinator then decides, based on the feedback from all participants, whether to commit or abort the operation.
The two phases are:
- Phase 1: prepare phase (voting phase)
- Phase 2: commit phase (execution phase)
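The coordinator's decision rule can be written down in a few lines. This is a minimal sketch; the `Vote` and `Decision` enums and the `decide` function are hypothetical names chosen for illustration, and treating an empty vote list as a rollback is an assumption of the sketch.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


def decide(votes):
    """Return COMMIT only if every participant voted YES."""
    votes = list(votes)
    # Assumption of this sketch: no votes at all means there is nothing to commit.
    if votes and all(v is Vote.YES for v in votes):
        return Decision.COMMIT
    return Decision.ROLLBACK
```

A single `Vote.NO` is enough to turn the outcome into `Decision.ROLLBACK`, which is what makes the prepare phase a unanimous vote.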
3.1.1. Prepare phase
The prepare phase is divided into three steps:
A. Transaction inquiry
The coordinator asks all participants whether they are ready to execute the transaction and waits for a response from each participant.
B. Transaction execution
Each participant node executes the transaction operations. If the local transaction succeeds, it records Undo and Redo information in the transaction log but does not commit. Otherwise, it reports the failure and stops executing.
C. Each participant responds to the coordinator's inquiry
If the participant executed the transaction successfully, it returns Yes to the coordinator, indicating that the transaction can be committed. If it did not, it returns No, indicating that the transaction cannot be committed.
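A participant's side of the prepare phase might look like the following sketch. The `Participant` class, its in-memory `log` of (key, undo value, redo value) entries, and the rule that a `None` value counts as a local failure are all assumptions made for illustration; a real resource manager would write the log to durable storage.

```python
_MISSING = object()  # marks "key did not exist before the write"


class Participant:
    """Hypothetical in-memory participant; not a real database API."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = dict(data or {})
        self.log = []          # entries: (key, undo_value, redo_value)
        self.state = "init"

    def _undo_all(self):
        # Restore old values in reverse order of application.
        for key, undo_value, _redo in reversed(self.log):
            if undo_value is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = undo_value
        self.log.clear()

    def prepare(self, writes):
        """Execute the writes without committing, then return the vote."""
        try:
            for key, value in writes.items():
                if value is None:
                    # Assumption: None stands in for any local failure.
                    raise ValueError(f"invalid value for {key!r}")
                self.log.append((key, self.data.get(key, _MISSING), value))
                self.data[key] = value
        except ValueError:
            self._undo_all()
            self.state = "aborted"
            return "No"
        self.state = "prepared"   # changes applied but not yet committed
        return "Yes"
```

Keeping both the old and the new value in each log entry is what lets the participant go either way in the second phase: the undo value supports a rollback, and the redo value supports completing the commit.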
3.1.2. Commit phase
In the commit phase, one of two actions is performed based on the result of the vote in the prepare phase: commit the transaction or abort the transaction.
The process of committing a transaction is as follows:
A. Send the commit request
The coordinator sends a Commit request to all participants.
B. Commit the transaction
After receiving the Commit request, each participant formally commits the transaction and, once the commit is complete, releases the transaction resources it held during the entire transaction.
C. Report the commit result
Each participant sends an Ack message to the coordinator after completing the commit.
D. Complete the transaction
After receiving Ack messages from all participants, the coordinator completes the transaction.
The process of aborting a transaction is as follows:
A. Send the rollback request
The coordinator sends a Rollback request to all participants.
B. Roll back the transaction
After receiving the Rollback request, each participant uses the Undo information recorded during the prepare phase to roll back the transaction. Once the rollback is complete, it releases the resources it held during the entire transaction.
C. Report the rollback result
Each participant sends an Ack message to the coordinator after completing the rollback.
D. Complete the abort
After receiving Ack messages from all participants, the coordinator completes the abort of the transaction.
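Putting both phases together, the coordinator's flow can be sketched as below. This is a simplified, single-process illustration: `Participant`, its `will_succeed` flag, and `two_phase_commit` are hypothetical names, calls are plain method calls rather than network messages, and failures between the phases are not modeled.

```python
class Participant:
    """Hypothetical participant whose local outcome is fixed up front."""

    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "init"

    def prepare(self):
        # Phase 1: execute locally and vote Yes (True) or No (False).
        self.state = "prepared" if self.will_succeed else "failed"
        return self.will_succeed

    def commit(self):
        self.state = "committed"
        return "ack"

    def rollback(self):
        self.state = "rolled_back"
        return "ack"


def two_phase_commit(participants):
    # Phase 1: ask every participant and collect all votes.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only on a unanimous Yes, otherwise roll back everywhere.
    if all(votes):
        acks = [p.commit() for p in participants]
        outcome = "committed"
    else:
        acks = [p.rollback() for p in participants]
        outcome = "rolled_back"

    # The coordinator finishes only after every participant has acknowledged.
    if acks.count("ack") != len(participants):
        raise RuntimeError("missing acknowledgement")
    return outcome
```

Note that a rollback request goes to every participant, including those that voted Yes, since they are still holding prepared changes.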
3.2. Advantages and disadvantages of two-phase commit
- Advantages: the principle is simple and the protocol is easy to implement.
- Disadvantages: synchronous blocking, a single point of failure, data inconsistency, and poor fault tolerance.
Synchronous blocking
During two-phase commit, all participating nodes wait for responses from other nodes and cannot perform other operations. This synchronous blocking greatly limits the performance of the distributed system.
Single point of failure
The coordinator is critical throughout two-phase commit. If the coordinator fails during the commit phase, the whole process stops working. Worse, the participants are left holding locked transaction resources and cannot finish the transaction.
Data inconsistency
Suppose that while the coordinator is sending Commit requests to the participants, a partial network failure occurs, or the coordinator itself crashes before all Commit requests have been sent, so that only some participants receive the Commit request. Those participants commit while the others do not, which leads to serious data inconsistency.
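A small simulation makes the failure mode concrete. The function `commit_phase_with_crash` and its `crash_after` parameter are hypothetical; the sketch simply assumes the coordinator sends Commit requests one participant at a time and dies after a given number of them.

```python
def commit_phase_with_crash(states, crash_after):
    """Send Commit to participants in order; the coordinator crashes after
    `crash_after` requests have been delivered.

    `states` maps participant name -> state, all expected to be "prepared".
    Returns the resulting states without modifying the input.
    """
    result = dict(states)
    for sent, name in enumerate(result):
        if sent == crash_after:
            break  # coordinator crashes; remaining requests are never sent
        result[name] = "committed"
    return result


prepared = {"a": "prepared", "b": "prepared", "c": "prepared"}
after_crash = commit_phase_with_crash(prepared, crash_after=1)
# after_crash == {"a": "committed", "b": "prepared", "c": "prepared"}
```

Participant `a` has made its changes permanent, while `b` and `c` are still waiting in the prepared state with their resources locked.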
Poor fault tolerance
If, in the prepare (inquiry) phase of two-phase commit, the coordinator fails to obtain a response from every participant, it can only rely on its own timeout mechanism to decide whether to abort the transaction. Clearly, this strategy is too conservative. In other words, the two-phase commit protocol has no well-designed fault tolerance mechanism, and the failure of any single node causes the whole transaction to fail.
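The timeout-based strategy can be sketched with the standard library's thread pool. The function `collect_votes`, the shape of `prepare_calls` (a mapping from participant name to a zero-argument callable returning `True` or `False`), and the single shared deadline are assumptions of this sketch rather than part of the protocol definition.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def collect_votes(prepare_calls, timeout):
    """Ask every participant to prepare; decide "commit" only if all answer
    Yes before the shared deadline, otherwise "rollback"."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(prepare_calls)))
    futures = {name: pool.submit(call) for name, call in prepare_calls.items()}
    deadline = time.monotonic() + timeout
    decision = "commit"
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            if not future.result(timeout=remaining):
                decision = "rollback"   # explicit No vote
        except FutureTimeout:
            decision = "rollback"       # no answer in time: assume the worst
        except Exception:
            decision = "rollback"       # participant raised an error
    pool.shutdown(wait=False)           # do not block on slow participants
    return decision
```

A participant that is merely slow is treated exactly like one that voted No, which is why a single unresponsive node is enough to fail the whole transaction.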
Summary
The next article introduces the 3PC protocol, which addresses the synchronous blocking and single point of failure problems of the 2PC protocol.
Related links
- Distributed theory (I) – CAP theorem
- Distributed theory (II) – BASE theory
- Distributed theory (III) – 2PC protocol
- Distributed theory (IV) – 3PC protocol
- Distributed theory (V) – Consistency algorithm Paxos
- Distributed theory (VI) – Consistency protocol Raft