In a distributed system, when a transaction spans multiple nodes, a Coordinator is introduced to schedule the execution logic of all participating nodes, which are called Cohorts, in order to preserve the ACID properties of the transaction. The Coordinator schedules Cohort activity and decides whether the transaction is actually committed. The 2PC and 3PC protocols are derived from this idea.

1. 2PC (Two-Phase Commit)

2PC (Two-Phase Commit) is an algorithm designed to preserve the atomicity and consistency of a transaction across all nodes in a distributed system. The protocol is mainly used in relational databases for distributed transaction processing. The Coordinator drives a unified commit or rollback of the transaction, which helps keep distributed data consistent.

The two phases of 2PC

Phase 1 (Commit-request phase)

The Cohorts vote on whether the transaction can proceed to commit:

  1. The Coordinator node asks all Cohort nodes whether they can commit and waits for a response from each node.
  2. Each Cohort node executes the transaction operations up to the point at which the query was issued, and writes Undo and Redo information to its log.
  3. Each Cohort node responds to the query from the Coordinator node. If the transaction operations on the Cohort node succeeded, it returns a “Yes” message; if they failed, it returns a “No” message.
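
The voting steps above can be sketched in Python. This is only a minimal in-memory illustration, not a real database protocol: the `Cohort` class, its `prepare` method, the `can_execute` flag, and the `collect_votes` helper are all hypothetical names introduced here, and the Undo/Redo logs are plain lists.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    """Hypothetical in-memory participant; a real one would persist its logs."""
    name: str
    can_execute: bool = True  # assumption: stands in for local success or failure
    undo_log: list = field(default_factory=list)
    redo_log: list = field(default_factory=list)

    def prepare(self, key, old_value, new_value):
        """Run the transaction up to the commit point and vote Yes or No."""
        if not self.can_execute:
            return "No"
        # Record how to roll back (Undo) and how to re-apply (Redo).
        self.undo_log.append((key, old_value))
        self.redo_log.append((key, new_value))
        return "Yes"


def collect_votes(cohorts, key, old_value, new_value):
    """Coordinator side of phase 1: ask every Cohort and gather the votes."""
    return {c.name: c.prepare(key, old_value, new_value) for c in cohorts}


votes = collect_votes(
    [Cohort("a"), Cohort("b", can_execute=False)], "balance", 100, 80
)
print(votes)  # {'a': 'Yes', 'b': 'No'}
```

A single “No” in the collected votes is enough to send the transaction down the interruption path in phase 2.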

Phase 2 (Commit-execution phase)

The Coordinator decides whether the transaction can be committed based on the responses from the Cohorts. There are two cases: transaction commit and transaction interruption.

Transaction commit:

If the Coordinator node receives “Yes” messages from all Cohort nodes:

  1. The Coordinator node sends a “Commit” request to all Cohort nodes.
  2. After receiving the Commit request, each Cohort node formally commits the transaction and releases the resources it held during the transaction.
  3. Each Cohort node sends an “Ack” message to the Coordinator node to confirm completion.
  4. The Coordinator node completes the transaction after receiving “Ack” messages from all Cohort nodes.

Transaction interruption:

If any Cohort node responded “No” in phase 1, or the Coordinator node did not receive responses from all Cohort nodes before the phase 1 query timed out:

  1. The Coordinator node sends a “Rollback” request to all Cohort nodes.
  2. After receiving the “Rollback” request, each Cohort node rolls back using the previously written Undo information and releases the resources it held during the transaction.
  3. Each Cohort node sends an “Ack” message to the Coordinator node to confirm that the rollback is complete.
  4. The Coordinator node cancels the transaction after receiving “Ack” messages from all Cohort nodes.
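
The two branches of phase 2 can be sketched as follows. This is a toy model under stated assumptions: `decide`, `Cohort.handle`, and `run_phase_two` are hypothetical names, each Cohort holds a single tentative value, and a vote that is missing from the dictionary stands for a response that did not arrive before the timeout.

```python
def decide(votes, expected):
    """Return "commit" only if every expected Cohort voted Yes."""
    if all(votes.get(name) == "Yes" for name in expected):
        return "commit"
    return "rollback"


class Cohort:
    """Hypothetical participant holding one tentative write."""

    def __init__(self, name, old_value, new_value):
        self.name = name
        self.value = new_value  # tentative result produced during phase 1
        self.undo = old_value   # Undo information logged during phase 1
        self.locked = True      # resources held since phase 1

    def handle(self, request):
        if request == "rollback":
            self.value = self.undo  # restore from the Undo information
        self.locked = False         # resources are released in both cases
        return "Ack"


def run_phase_two(cohorts, votes):
    request = decide(votes, [c.name for c in cohorts])
    acks = [c.handle(request) for c in cohorts]
    # The transaction is finished once every Cohort has acknowledged.
    return request, acks.count("Ack") == len(cohorts)


ok = [Cohort("a", 100, 80), Cohort("b", 50, 70)]
print(run_phase_two(ok, {"a": "Yes", "b": "Yes"}), ok[0].value)  # ('commit', True) 80

late = [Cohort("a", 100, 80), Cohort("b", 50, 70)]
print(run_phase_two(late, {"a": "Yes"}), late[0].value)          # ('rollback', True) 100
```

In the second call Cohort "b" never answered, so the Coordinator treats the round as failed and every Cohort restores its old value.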

Disadvantages

  1. Synchronous blocking: All Cohort nodes block while the transaction is executing. A Cohort cannot carry out other work while it waits for the responses of the other participants.

  2. Single point of failure: Because everything depends on the Coordinator, the Cohorts stay blocked if it fails. In particular, if the Coordinator fails in phase 2, all Cohorts remain in a state where transaction resources are locked and cannot complete the transaction.

  3. Inconsistent data: In phase 2, a local network fault or a Coordinator failure may occur while the Coordinator is sending commit requests to the Cohorts, so that only some of the Cohorts receive the request. Those Cohorts commit, while the Cohorts that did not receive the request cannot commit, and the distributed system as a whole ends up with inconsistent data.

  4. No complete fault tolerance: The failure of any node causes the entire transaction to fail. If a Cohort goes down or times out, the Coordinator has to rely on its own mechanism (such as a timeout) to decide how to proceed. There is also a case that 2PC cannot resolve: the Coordinator goes down right after sending a commit message, and the only Cohort that received the message goes down as well. Even if a new Coordinator is chosen through an election protocol, the state of the transaction is uncertain. No one knows whether the transaction has been committed.
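
Disadvantages 3 and 4 can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: Cohort state is a string in a dictionary, and `send_commits` and `recover_outcome` are hypothetical helpers, not part of any real 2PC implementation.

```python
def send_commits(cohort_states, crash_after):
    """Deliver "commit" in order; the Coordinator crashes after `crash_after` sends."""
    states = dict(cohort_states)
    for sent, name in enumerate(list(states)):
        if sent == crash_after:
            break  # the remaining Cohorts never receive the commit request
        states[name] = "committed"
    return states


def recover_outcome(reachable_states):
    """What a newly elected Coordinator can infer from the Cohorts it can reach."""
    seen = set(reachable_states.values())
    if "committed" in seen:
        return "commit"
    if "rolled back" in seen:
        return "rollback"
    return "unknown"  # every reachable Cohort is still only prepared


prepared = {"a": "prepared", "b": "prepared", "c": "prepared"}

# Disadvantage 3: the Coordinator fails after reaching only Cohort "a".
after_crash = send_commits(prepared, crash_after=1)
print(after_crash)  # {'a': 'committed', 'b': 'prepared', 'c': 'prepared'}

# Disadvantage 4: "a" is down too, so only "b" and "c" can be queried.
survivors = {name: s for name, s in after_crash.items() if name != "a"}
print(recover_outcome(survivors))  # unknown
```

The surviving Cohorts look exactly the same whether the decision was commit or rollback, which is why the new Coordinator cannot safely choose either.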

2. 3PC (Three-Phase Commit)

3PC (Three-Phase Commit) was proposed to address the defects of 2PC: synchronous blocking, single point of failure, split brain, and an overly conservative fault-tolerance mechanism. It divides transaction processing into three phases: CanCommit, PreCommit, and doCommit. The three phases are described below.

Phase 1: CanCommit

  1. The Coordinator node sends a CanCommit request to all Cohort nodes, asking whether they can commit, and waits for a response from each node.
  2. After receiving the CanCommit request, a Cohort returns a Yes response and enters the standby state if it believes it can carry out the transaction; otherwise it returns a No response.
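
A minimal sketch of the CanCommit round is shown below. The `Cohort` class, its `healthy` flag, and `can_commit_phase` are hypothetical names introduced for illustration; `healthy` is an assumed stand-in for whatever check a real participant would run.

```python
class Cohort:
    """Hypothetical participant; `healthy` is an assumed stand-in for a real check."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.state = "initial"

    def can_commit(self):
        # Only a feasibility check: nothing is executed or logged yet.
        if self.healthy:
            self.state = "standby"
            return "Yes"
        return "No"


def can_commit_phase(cohorts):
    """Ask every Cohort; proceed only if all of them answered Yes."""
    responses = [c.can_commit() for c in cohorts]
    return all(r == "Yes" for r in responses)


print(can_commit_phase([Cohort("a"), Cohort("b")]))                 # True
print(can_commit_phase([Cohort("a"), Cohort("b", healthy=False)]))  # False
```

Unlike phase 1 of 2PC, no Undo or Redo information is written at this point, so a No answer costs very little.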


Phase 2: PreCommit

The Coordinator decides whether to continue with PreCommit based on the responses of the Cohorts. Depending on the responses, there are two possibilities: perform the commit or interrupt the transaction.

Perform commit:

If the Coordinator receives Yes responses from all Cohorts, the transaction is pre-executed:

  1. Sending pre-commit requests: The Coordinator sends PreCommit requests to the Cohorts and enters the Prepared phase.
  2. Transaction pre-commit: After receiving the PreCommit request, each Cohort performs the transaction operations and records undo and redo information in the transaction log.
  3. Response feedback: If a Cohort has performed the transaction operations successfully, it returns an ACK response and waits for the final instruction.

Interrupt transaction:

If the Coordinator receives a No response from any Cohort, or does not receive a response from some Cohort before the timeout, the transaction is interrupted:

  1. Sending interrupt requests: The Coordinator sends abort requests to all Cohorts.
  2. Interrupt transaction: A Cohort interrupts the transaction after receiving the abort request from the Coordinator (or after timing out without receiving any request from the Coordinator).
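
The PreCommit decision can be sketched like this. It is a simplified model under stated assumptions: Cohorts are plain dictionaries built by the hypothetical `new_cohort` helper, `pre_commit_phase` is also a hypothetical name, and a Cohort missing from `responses` stands for a CanCommit answer that timed out.

```python
def new_cohort(name, value, pending):
    """Hypothetical Cohort record: current value plus the value the transaction wants."""
    return {"name": name, "value": value, "pending": pending,
            "undo_log": [], "redo_log": [], "state": "standby"}


def pre_commit_phase(responses, cohorts):
    """Coordinator side of phase 2; `responses` maps Cohort name -> "Yes"/"No"."""
    if all(responses.get(c["name"]) == "Yes" for c in cohorts):
        acks = []
        for c in cohorts:
            c["undo_log"].append(c["value"])    # how to roll back
            c["redo_log"].append(c["pending"])  # how to re-apply
            c["value"] = c["pending"]           # tentative execution
            c["state"] = "pre-committed"
            acks.append(c["name"])              # ACK, then wait for phase 3
        return "PreCommit", acks
    for c in cohorts:
        c["state"] = "aborted"
    return "abort", []


good = [new_cohort("a", 100, 80), new_cohort("b", 50, 70)]
print(pre_commit_phase({"a": "Yes", "b": "Yes"}, good))  # ('PreCommit', ['a', 'b'])

slow = [new_cohort("a", 100, 80), new_cohort("b", 50, 70)]
print(pre_commit_phase({"a": "Yes"}, slow))              # ('abort', [])
```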


Phase 3: DoCommit

The actual transaction commit happens in this phase, which again has two cases: perform the commit or interrupt the transaction.

Perform commit:

  1. Sending commit requests: If the Coordinator receives ACK responses from the Cohorts, it moves from the pre-commit state to the commit state and sends doCommit requests to all Cohorts.
  2. Transaction commit: After receiving the doCommit request, each Cohort formally commits the transaction and then releases all transaction resources.
  3. Response feedback: After committing the transaction, each Cohort sends an ACK response to the Coordinator.
  4. Complete transaction: The Coordinator completes the transaction after receiving ACK responses from all Cohorts.

Interrupt transaction:

If the Coordinator is working normally and receives a No response from a Cohort, or does not receive an ACK response (the Cohort may have replied with something other than an ACK, or the response may have timed out), the transaction is interrupted:

  1. Sending interrupt requests: The Coordinator sends abort requests to all Cohorts.
  2. Transaction rollback: After receiving the abort request, each Cohort uses the recorded Undo information to roll back the transaction, and then releases the resources it held during the transaction.
  3. Feedback on rollback result: Each Cohort sends an ACK message to the Coordinator after completing the rollback.
  4. Interrupt transaction: The Coordinator interrupts the transaction after receiving ACK messages from all Cohorts.
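
The final phase can be sketched in the same style. This is an assumption-laden illustration: `pre_committed` and `do_commit_phase` are hypothetical names, each Cohort is a dictionary holding its tentative `value` and the logged `undo` value, and `acks` is the set of Cohorts whose PreCommit ACK reached the Coordinator in time.

```python
def pre_committed(name, undo, value):
    """Hypothetical Cohort record as it looks after a successful PreCommit."""
    return {"name": name, "undo": undo, "value": value,
            "state": "pre-committed", "locked": True}


def do_commit_phase(acks, cohorts):
    """Coordinator side of phase 3: doCommit if every Cohort ACKed, otherwise abort."""
    request = "doCommit" if acks == {c["name"] for c in cohorts} else "abort"
    replies = []
    for c in cohorts:
        if request == "abort":
            c["value"] = c["undo"]  # roll back with the recorded Undo information
        c["state"] = "committed" if request == "doCommit" else "aborted"
        c["locked"] = False         # resources are released in both cases
        replies.append("ACK")
    # The round is finished once every Cohort has replied.
    return request, len(replies) == len(cohorts)


done = [pre_committed("a", 100, 80), pre_committed("b", 50, 70)]
print(do_commit_phase({"a", "b"}, done), done[0]["value"])   # ('doCommit', True) 80

undone = [pre_committed("a", 100, 80), pre_committed("b", 50, 70)]
print(do_commit_phase({"a"}, undone), undone[0]["value"])    # ('abort', True) 100
```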

Advantages and disadvantages

  • Advantages: The period during which Cohorts are blocked is reduced, and the system can still reach agreement after a single point of failure.
  • Disadvantages: 3PC introduces the PreCommit phase. If a network partition occurs after PreCommit and the Coordinator cannot communicate with some Cohorts, those Cohorts will still commit the transaction after timing out, which leads to data inconsistency.
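
Both points can be seen in a small state table for a Cohort. The table below is a hypothetical sketch rather than a specification: the state and event names are chosen here for illustration, and the two `timeout` rows encode the behavior described above (a Cohort gives up if no PreCommit arrives, and commits if no final instruction arrives).

```python
# Hypothetical Cohort state table: (current state, event) -> next state.
TRANSITIONS = {
    ("initial", "CanCommit"): "standby",
    ("standby", "PreCommit"): "pre-committed",
    ("standby", "abort"): "aborted",
    ("standby", "timeout"): "aborted",          # no PreCommit arrived: give up
    ("pre-committed", "doCommit"): "committed",
    ("pre-committed", "abort"): "aborted",
    ("pre-committed", "timeout"): "committed",  # no final instruction: commit anyway
}


def step(state, event):
    """Apply one event; unknown pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Both Cohorts have ACKed PreCommit. The Coordinator then decides to abort,
# but a network partition cuts off Cohort "b".
a = step("pre-committed", "abort")    # "a" receives the abort request
b = step("pre-committed", "timeout")  # "b" hears nothing and times out
print(a, b)  # aborted committed
```

The timeouts are what keep a Cohort from blocking forever, and the same timeouts are what let the two Cohorts end up in different final states.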

3. Summary

Both two-phase commit and three-phase commit address distributed data consistency to varying degrees and are widely used, but neither solves the problem completely. Another consistency protocol, the Paxos algorithm, addresses both the indefinite-waiting problem and the “split brain” problem. For an accessible explanation of how Paxos works, see the article “Easy to understand Paxos algorithm – Consistency algorithm based on message passing”.

Follow the public account [Ccww technology blog], where original technical articles are published first.