The CAP theorem for distributed systems was proposed by Professor Eric Brewer of the University of California, Berkeley. It states that a distributed system cannot satisfy Consistency, Availability, and Partition tolerance all at the same time.

  • C: Consistency

Distributed systems often keep multiple copies of the same data. Consistency describes whether those copies agree with one another in content and organization.

  • A: Availability

Availability describes the system's ability to serve its users: it returns the desired result within a time the user can tolerate.

  • P: Partition tolerance

A distributed system usually consists of multiple nodes. Because the network is unreliable, a communication failure can split the nodes of a cluster into several smaller, isolated groups; this is a network partition. Partition tolerance requires that the system can still provide consistent and available service when a network partition occurs.

For a distributed system we always assume that the network is unreliable, so partition tolerance is its most basic requirement, and the design effort goes mainly into finding a balance between availability and consistency. That does not mean, however, that a network partition has to be built into every design scenario as a given, which would reduce the design to an either/or choice between consistency and availability.

In fact, Eric Brewer pointed out in 2012 that the common reading of CAP, that consistency, availability, and partition tolerance cannot all be satisfied at once, is somewhat misleading as guidance for practical system design. The traditional understanding is that a distributed system must satisfy P and then trade off C against A. This is one-sided. In real networks partitions are relatively unlikely, especially as network environments keep improving and many systems even run over dedicated lines. When no partition is present, both A and C should be considered.

In addition, consistency, availability, and partition tolerance should each be evaluated over a range rather than as all-or-nothing properties. Availability is the simplest example: a system should be considered unavailable only when the proportion of requests that time out reaches a certain level, not as soon as a single timeout occurs.

Finally, a distributed system is generally large and complex. We should evaluate and design each subsystem at a finer granularity, rather than treat the system as a whole that must satisfy P and trade off A against C. Some subsystems may need to satisfy all three as far as possible.
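To make the idea of measuring availability as a proportion concrete, here is a minimal sketch. The function names, the 500 ms timeout, and the 99% target are all assumptions chosen for illustration, not values from any particular system.

```python
def availability(latencies_ms, timeout_ms):
    """Return the fraction of requests answered within timeout_ms."""
    if not latencies_ms:
        raise ValueError("need at least one request")
    answered_in_time = sum(1 for t in latencies_ms if t <= timeout_ms)
    return answered_in_time / len(latencies_ms)


def is_available(latencies_ms, timeout_ms, target=0.99):
    """Treat the service as available if enough requests met the timeout."""
    return availability(latencies_ms, timeout_ms) >= target


# One slow request out of 200 still meets a 99% target; five do not.
sample_ok = [20] * 199 + [900]
sample_bad = [20] * 195 + [900] * 5

print(is_available(sample_ok, timeout_ms=500))   # True
print(is_available(sample_bad, timeout_ms=500))  # False
```

A single timeout does not flip the service to "unavailable"; only the ratio over a window of requests does.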

Getting a distributed cluster to always provide consistent, available service to the outside world has always been a challenging and interesting task. Set availability aside and take consistency alone. In a relational database we usually rely on transactions to guarantee data consistency. When the data volume grows beyond what a single database can bear, we have to split the database horizontally into separate databases and tables (sharding) and build a distributed database cluster. This spreads the load of one database across several and greatly improves storage and response capacity, but the split also brings many limitations in how we use the database, such as globally unique primary keys, cross-table joins, and data aggregation. Another tricky problem is that single-database transactions become distributed transactions.
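The effect of splitting on queries can be seen in a toy example. The sketch below is only an in-memory stand-in for a sharded database; `ShardedStore`, the CRC32-based routing, and the `order-N` keys are assumptions made up for illustration.

```python
import zlib


class ShardedStore:
    """Toy in-memory stand-in for a database split across several shards."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        # A stable hash sends the same key to the same shard every time.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, amount):
        self.shards[self.shard_for(key)][key] = amount

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self.shards[self.shard_for(key)].get(key)

    def total(self):
        # An aggregate has to visit every shard and merge the partial sums.
        return sum(sum(shard.values()) for shard in self.shards)


store = ShardedStore(shard_count=4)
for i in range(10):
    store.put(f"order-{i}", 10 * i)

print(store.get("order-3"))  # 30
print(store.total())         # 450
```

Single-key reads stay cheap, while anything that spans keys (aggregation here, joins and transactions in a real database) now has to coordinate several shards.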

Implementing distributed transactions is not difficult: the two-phase commit (2PC: Two-Phase Commit) and three-phase commit (3PC: Three-Phase Commit) protocols both offer an approach. However, it is almost impossible (at least for now) to guarantee strong data consistency while still providing an available service, so many distributed systems avoid strong data consistency.

Two-phase commit protocol (2PC: Two-Phase Commit)

The goal of the two-phase commit protocol is to ensure data consistency in distributed systems. Many distributed systems use this protocol to support distributed transactions (the feature is provided, but not necessarily used). As the name implies, the protocol splits a distributed transaction into two phases: the voting phase and the transaction commit phase. To keep the whole database cluster running normally, the protocol designates a single "coordinator" that coordinates the operation of the entire cluster. To simplify the description, we call each node in the database cluster a "participant". The three-phase commit protocol uses the same two terms, "coordinator" and "participant".

Phase 1: Voting phase

The main purpose of this phase is to find out whether each participant in the database cluster can execute the transaction normally. The specific steps are as follows:

  1. The coordinator sends a transaction execution request to all participants and waits for them to report the execution result.
  2. On receiving the request, each participant executes the transaction without committing it and writes the transaction log.
  3. Each participant reports its execution status to the coordinator, then blocks and waits for the coordinator's next instruction.

Phase 2: Transaction commit phase

After the coordinator's inquiry in phase 1, each participant has reported how its transaction executed. There are three possible outcomes:

  1. All participants report that the transaction executed normally.
  2. One or more participants report that the transaction failed.
  3. The coordinator times out while waiting for responses.

In the first case, the coordinator notifies all participants to commit the transaction as follows:

  1. The coordinator sends a COMMIT notification to each participant, requesting that the transaction be committed.
  2. After receiving the transaction commit notification, the participant performs the COMMIT operation and then releases the occupied resources.
  3. The participant returns the transaction COMMIT result information to the coordinator.

In the second and third cases, the coordinator concludes that some participant cannot execute the transaction successfully. To keep the data of the whole cluster consistent, it sends a transaction rollback notification to every participant. The specific steps are as follows:

  1. The coordinator sends a transaction rollback notification to each participant requesting that the transaction be rolled back.
  2. After receiving the transaction rollback notification, the participant performs the ROLLBACK operation and releases the occupied resources.
  3. The participant returns transaction ROLLBACK result information to the coordinator.
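The two phases described above can be sketched in a few lines. This is a simplified, single-process illustration and not a production protocol implementation: the `Participant` class, its `vote` parameter, and the `two_phase_commit` function are hypothetical names, and a timeout is modeled simply as the vote value `"timeout"`.

```python
class Participant:
    """Hypothetical in-memory participant; a real one would be a database node."""

    def __init__(self, name, vote="yes"):
        # vote is "yes", "no", or "timeout" (no reply before the deadline).
        self.name = name
        self.vote = vote
        self.state = "idle"

    def prepare(self):
        # Phase 1: execute the transaction without committing, then report.
        self.state = "prepared" if self.vote == "yes" else "not_prepared"
        return self.vote

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only on a unanimous "yes"; a "no" or a timeout
    # from any participant causes a rollback everywhere.
    if all(v == "yes" for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled_back"


print(two_phase_commit([Participant("db1"), Participant("db2")]))
# committed
print(two_phase_commit([Participant("db1"), Participant("db2", vote="no")]))
# rolled_back
print(two_phase_commit([Participant("db1"), Participant("db2", vote="timeout")]))
# rolled_back
```

The sketch leaves out the parts that make the real protocol hard: durable logs, network messages that can be lost, and a coordinator that can crash between the two phases.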

The two-phase commit protocol addresses the strong consistency of data in a distributed database. Its principle is simple and it is easy to implement, but its shortcomings are also obvious, including the following:

  • Single point of failure

The coordinator plays a central role throughout the two-phase commit process. Once the coordinator server goes down, the normal operation of the entire database cluster is affected. For example, if the coordinator fails in the second phase and cannot send the commit or rollback notification, the participants stay blocked indefinitely and the entire database cluster becomes unavailable.

  • Synchronous blocking

During two-phase commit, all participants must follow the coordinator's unified scheduling. While they wait, they are blocked and cannot perform any other operation, which is extremely inefficient.

  • Data inconsistency

Although the two-phase commit protocol is designed for strong consistency of distributed data, data inconsistency is still possible. For example, suppose that in the second phase the coordinator sends the COMMIT notification, but because of network problems only some of the participants receive it and perform the commit. The remaining participants stay blocked because they were never notified, and the data becomes inconsistent.
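The partial-delivery scenario can be illustrated with a small simulation. The `deliver_commit` function and the node names `db1` to `db3` are made up for this example; it assumes every participant already voted yes in phase 1.

```python
def deliver_commit(participants, reachable):
    """Send COMMIT, but only participants named in `reachable` receive it."""
    states = {}
    for name in participants:
        # Everyone is prepared; only reachable nodes learn the decision.
        if name in reachable:
            states[name] = "committed"
        else:
            states[name] = "prepared (blocked)"
    return states


states = deliver_commit(["db1", "db2", "db3"], reachable={"db1", "db2"})
for name, state in states.items():
    print(name, state)
# db1 committed
# db2 committed
# db3 prepared (blocked)
```

After the partial delivery, `db1` and `db2` expose the new data while `db3` still holds the uncommitted transaction, so the nodes disagree.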

Three-phase commit protocol (3PC: Three-Phase Commit)

To address the problems of two-phase commit, the three-phase commit protocol introduces a preliminary inquiry phase and a timeout policy, which reduce the time the whole cluster spends blocked and improve system performance. The three phases of three-phase commit are CAN_COMMIT, PRE_COMMIT, and DO_COMMIT.

Phase 1: CAN_COMMIT

In this phase, the coordinator asks each participant whether it can execute the transaction normally, and each participant replies with an estimate based on its own state. Compared with actually executing the transaction, this step is lightweight. The specific steps are as follows:

  1. The coordinator sends a transaction inquiry to each participant, asking whether the transaction can be executed, and waits for a response.
  2. Each participant returns an estimate based on its own state. If it expects to be able to execute the transaction normally, it returns a positive message and enters the prepared state; otherwise it returns a negative message.

Phase 2: PRE_COMMIT

In this phase, the coordinator acts on the results of the inquiry in the first phase. There are three possible results:

  1. All participants return a positive message.
  2. One or more participants return a negative message.
  3. The coordinator times out while waiting for responses.

In the first case, the coordinator sends a transaction execution request to all participants as follows:

  1. The coordinator sends a transaction execution notification to all participants.
  2. On being notified, each participant executes the transaction but does not commit it.
  3. Each participant returns the execution result to the coordinator.

During these steps, a participant that times out while waiting interrupts the transaction. In the second and third cases, the coordinator concludes that the transaction cannot execute normally and sends an abort notification to every participant, asking it to leave the prepared state, as follows:

  1. The coordinator sends abort notifications to all transaction participants
  2. When the participant is notified, the transaction is interrupted
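As a rough sketch of the first two phases, the code below models the coordinator's decision. `Node`, its `answer` parameter, and `run_first_two_phases` are hypothetical names; an answer of `None` stands in for a participant that does not reply before the coordinator's timeout.

```python
class Node:
    """Hypothetical participant used only for this illustration."""

    def __init__(self, answer=True):
        # answer: True (can execute), False (cannot), None (no reply in time).
        self.answer = answer
        self.state = "idle"

    def can_commit(self):
        # Lightweight estimate; nothing is executed yet.
        if self.answer is True:
            self.state = "prepared"
        return self.answer

    def pre_commit(self):
        # Execute the transaction but do not commit it.
        self.state = "pre_committed"

    def abort(self):
        self.state = "aborted"


def run_first_two_phases(nodes):
    # Phase 1 (CAN_COMMIT): ask every participant for its estimate.
    answers = [n.can_commit() for n in nodes]

    # Phase 2 (PRE_COMMIT): proceed only on a unanimous yes; a negative
    # answer or a missing reply makes the coordinator abort everywhere.
    if all(a is True for a in answers):
        for n in nodes:
            n.pre_commit()
        return "pre_committed"
    for n in nodes:
        n.abort()
    return "aborted"


print(run_first_two_phases([Node(), Node()]))             # pre_committed
print(run_first_two_phases([Node(), Node(answer=False)])) # aborted
print(run_first_two_phases([Node(), Node(answer=None)]))  # aborted
```

Because the inquiry executes nothing, an abort at this point is cheap: no participant has to undo work.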

Phase 3: DO_COMMIT

If the transaction was not interrupted in the second phase, the coordinator now decides whether to commit or roll back based on the execution results returned by the participants. There are three cases:

  1. All participants executed the transaction normally.
  2. One or more participants failed to execute the transaction.
  3. The coordinator times out while waiting for responses.

In the first case, the coordinator initiates a transaction commit request to each participant as follows:

  1. The coordinator sends a transaction commit notification to all participants
  2. After receiving the notification, all participants perform the COMMIT operation and release the occupied resources
  3. Participants report transaction commit results to the coordinator

In the second and third cases, the coordinator considers that the transaction cannot execute properly and sends a transaction rollback request to each participant as follows:

  1. The coordinator sends transaction ROLLBACK notifications to all participants
  2. After being notified, all participants perform rollback and release the occupied resources
  3. Participants report transaction rollback results to the coordinator

In this phase, if a participant cannot receive the commit or rollback request because of a coordinator failure or a network problem, it does not block indefinitely as in two-phase commit. Instead, it waits until the timeout expires and then goes ahead and commits. Compared with two-phase commit, synchronous blocking is reduced, but data inconsistency is still unavoidable.

In a distributed database, insisting on strong data consistency leaves essentially no availability to speak of. This is why many distributed databases provide cross-database transactions that are little more than decoration. In practice we more often aim for weak consistency or eventual consistency; giving up availability for the sake of strong consistency is not advisable.
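The participant-side timeout can be sketched with a standard-library queue acting as the participant's inbox. `await_final_decision` and the message strings `"COMMIT"` and `"ROLLBACK"` are assumptions made for this example.

```python
import queue


def await_final_decision(inbox, timeout_s):
    """Participant side of phase 3, entered after PRE_COMMIT succeeded."""
    try:
        message = inbox.get(timeout=timeout_s)
    except queue.Empty:
        # No word from the coordinator: commit by default instead of blocking.
        return "committed"
    return "committed" if message == "COMMIT" else "rolled_back"


# The coordinator decides to roll back, but only node A receives the message.
inbox_a = queue.Queue()
inbox_a.put("ROLLBACK")
inbox_b = queue.Queue()  # the message to node B is lost

print(await_final_decision(inbox_a, timeout_s=0.05))  # rolled_back
print(await_final_decision(inbox_b, timeout_s=0.05))  # committed
```

Node B no longer blocks, but it ends up committed while node A rolled back, which is exactly the inconsistency that remains possible.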

Source: my.oschina.net/wangzhencha…