First shot in the distributed series: Distributed consistency!

Preface

Articles in this series will be regularly synced to my personal website!

If you got something out of this, please give it a thumbs up and show your support. Thank you!

The next post:

  • Distributed algorithms in the distributed family

Background

In the Internet era, complex systems and their data are often split apart through microservice transformation in order to respond to requirements quickly and improve system throughput.

In this context, consistency refers to the weak consistency among distributed, service-oriented systems, covering both application-system consistency and data consistency.

| Consistency level | Description |
| --- | --- |
| Strong consistency | Whatever is written into the system is exactly what is read back out. Best user experience, but implementations often have a large impact on system performance. |
| Weak consistency | After data is written successfully, the system guarantees neither that the data can be read immediately nor that it becomes consistent within any particular time bound (for example, within seconds). |
| Final consistency | A special case of weak consistency: the system guarantees that the data becomes consistent within a certain period of time. It is the most highly regarded of the weak consistency models, and the data consistency model of choice in large distributed systems. |

An everyday example of consistency:

When a bank processes a transfer, it deducts the amount from your account and adds it to someone else's.

If deducting from your account succeeds but adding to the other account fails, you lose the money.

Conversely, if deducting from your account fails but adding to the other account succeeds, the bank loses the money and has to cover it.

The rest of this article walks through distributed consistency by introducing the theory first and then practical schemes.

Basic theory

ACID

Database management systems (DBMSs) guarantee that transactions writing or updating data are correct and reliable through four properties: Atomicity, Consistency, Isolation, and Durability.

In database systems, a transaction is a complete logical unit of work made up of a series of database operations.

  • For example, a bank transfer consists of two database operations, deducting the amount from the source account and adding it to the target account; together they form one complete, indivisible logical unit.

Such a unit is called a transaction, and it has the ACID properties.

| ACID property | Description |
| --- | --- |
| Atomicity | All operations contained in a transaction either all succeed, or all fail and roll back. |
| Consistency | A transaction must move the database from one consistent state to another consistent state. |
| Isolation | When multiple users access the database concurrently, for example operating on the same table, the transaction the database opens for each user must not be disturbed by the operations of other transactions; concurrent transactions must be isolated from one another. |
| Durability | Once a transaction is committed, its changes to the data are permanent; even if the database system fails, the committed changes are not lost. |
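As a concrete illustration, here is a minimal JDBC sketch of the transfer transaction above. The `accounts` table, the column names, and the connection URL are assumptions made for the example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransferExample {
    // Transfer `amount` from one account to another inside a single transaction,
    // so that the two updates either both commit or both roll back (atomicity).
    public static void transfer(long fromId, long toId, long amount) throws SQLException {
        // Hypothetical JDBC URL and credentials; replace with your real database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/bank", "user", "pass")) {
            conn.setAutoCommit(false); // start a transaction
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, amount);
                debit.setLong(2, fromId);
                debit.setLong(3, amount);
                if (debit.executeUpdate() != 1) {
                    throw new SQLException("insufficient balance or unknown account");
                }
                credit.setLong(1, amount);
                credit.setLong(2, toId);
                credit.executeUpdate();
                conn.commit();   // both updates become durable together
            } catch (SQLException e) {
                conn.rollback(); // neither update takes effect
                throw e;
            }
        }
    }
}
```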

CAP theory

Consistency

In a distributed environment, consistency is the property that data stays consistent across multiple replicas.

Under the consistency requirement, after a system in a consistent state performs an update operation, its data should remain in a consistent state.

Availability

Availability means that the service the system provides must remain continuously usable, and every operation request must return a result within a bounded time.

  • "Bounded time" means the system must respond to a user's request within a specified period; beyond that period, the system is considered unavailable

  • "Return a result" is another very important aspect of availability, requiring the system to return a normal response after processing a user's request

Partition Tolerance

When a distributed system encounters any network partition failure, it must still be able to guarantee consistent and available service to the outside, unless the entire network environment fails.

The reality

The CAP theorem proves that any distributed system can satisfy at most two of the three properties at the same time, never all three.

| Choice | Description |
| --- | --- |
| CA | Give up partition tolerance and strengthen consistency and availability; in practice this is the choice of the traditional single-node database. |
| AP | Give up consistency (meaning strong consistency here) in favor of partition tolerance and availability; this is the design choice of many distributed systems, such as many NoSQL systems. |
| CP | Give up availability in pursuit of consistency and partition tolerance; essentially never chosen, because a network problem would then make the entire system directly unavailable. |

If C and A are satisfied, can P be satisfied?

To satisfy C, the data on all servers must be identical, which means data must be synchronized. Does synchronization take time? Definitely, and the more machines there are, the longer it takes. But we also want to satisfy A, which means the synchronization time must stay short, so we cannot have too many machines; in other words, we cannot have meaningful P.

If C and P are satisfied, can A be satisfied?

Satisfying P requires many servers; suppose there are 1,000 of them. Satisfying C at the same time means the data on every machine must be identical, so synchronization takes a long time. In that case we certainly cannot guarantee that a user accessing any server sees the latest data; to get the latest data, the user has to wait until synchronization finishes everywhere. But A requires that we get the data we want within a short time. The two requirements contradict each other, so A cannot be satisfied.

For a distributed system, network problems are an unavoidable fact, so partition tolerance becomes a problem every distributed system must face and solve.

Therefore, effort is usually spent on finding a balance between C (consistency) and A (availability) based on the characteristics of the business.

BASE theory

BASE is an acronym for Basically Available, Soft state, and Eventually consistent.

BASE theory is the result of trading off consistency against availability in CAP. It comes from summarizing the distributed practice of large-scale Internet systems, and it evolved gradually on the basis of the CAP theorem.

The core idea of BASE theory is as follows:

  • Even if strong consistency cannot be achieved, each application can adopt appropriate methods to achieve final consistency according to its own service characteristics.

Let’s look at the three elements of BASE:

Basically Available

Basic availability means that a distributed system allows for a partial loss of availability in the event of an unpredictable failure (note that this is by no means equivalent to the system being unavailable).

For example:

  • Loss in response time: normally an online search engine returns results to users within 0.5 seconds, but because of a fault the response time grows by 1 to 2 seconds

  • Loss of system functions: under normal circumstances, consumers on an e-commerce site can complete almost every order, but during a big promotion, shopping behavior surges; to protect the stability of the shopping system, some consumers may be directed to a degraded page

Soft State

Soft state means the system allows data to exist in an intermediate state and considers that this intermediate state does not affect overall availability; that is, delays are allowed while data copies are synchronized between different nodes.

Eventually Consistent

Final consistency emphasizes that all copies of the data eventually reach a consistent state after a period of synchronization.

The essence of final consistency is therefore that the system needs to guarantee that the data is eventually consistent, rather than guaranteeing strong consistency of the system's data in real time.

In general, BASE theory is oriented toward large, highly available, scalable distributed systems and stands opposite the traditional transactional ACID properties. It is completely different from ACID's strong consistency model: it gains availability by sacrificing strong consistency, allowing data to be inconsistent for a period of time before finally reaching a consistent state.

However, in real distributed scenarios, different services and components have different requirements for data consistency, so ACID properties and BASE theory are often combined when designing a distributed system architecture.

Distributed consistency protocols

Two-phase Commit Protocol (2PC)

Phase I (voting phase):

  1. The coordinator node asks all participant nodes whether they can commit, and waits for each participant's response
  2. Each participant node executes all transaction operations up to the point of the query and writes Undo and Redo information to its log (note: on success, each participant has effectively already executed the transaction)
  3. Each participant node responds to the coordinator's query; if its transaction actually succeeded, it returns an "agree" message
  4. If the participant node's transaction actually failed, it returns an "abort" message

Phase II (commit phase):

When the coordinator node receives an "agree" message from every participant node:

  1. The coordinator node issues a "formally commit" request to all participant nodes
  2. Each participant node formally completes the operation and releases the resources held during the entire transaction
  3. Each participant node sends a "done" message to the coordinator node
  4. The coordinator node completes the transaction after receiving a "done" message from all participant nodes
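The control flow of the protocol can be sketched as follows; the `Participant` interface and the `prepare`/`commit`/`abort` names are hypothetical, not a real library API.

```java
import java.util.List;

// Hypothetical participant interface for the sketch.
interface Participant {
    boolean prepare(); // phase 1: execute the transaction, write undo/redo logs, vote
    void commit();     // phase 2: make the changes permanent, release resources
    void abort();      // phase 2: roll back using the undo log, release resources
}

class TwoPhaseCoordinator {
    // Returns true if the distributed transaction committed on all participants.
    boolean execute(List<Participant> participants) {
        // Phase 1 (voting): ask every participant to prepare and collect the votes.
        boolean allAgreed = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allAgreed = false;
                break;
            }
        }
        // Phase 2 (commit/abort): act on the collected votes. For simplicity the
        // abort is sent to every participant; aborting is assumed to be safe even
        // for participants that never prepared.
        for (Participant p : participants) {
            if (allAgreed) {
                p.commit();
            } else {
                p.abort();
            }
        }
        return allAgreed;
    }
}
```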

Existing problems:

Synchronous blocking of resources

During the commit process, all participating servers block: if another thread wants to access a resource in the critical section, it must wait until local execution completes and the resource held by the transaction is released.

The two-phase commit algorithm therefore also reduces the efficiency of concurrent execution.

The single point problem

In addition, a single point problem can occur. The single point problem, also called single-point server failure, means that if the coordinating server of the distributed cluster fails, the entire cluster cannot run the two-phase commit algorithm for lack of a coordinator.

The single point problem is also the biggest disadvantage of two-phase commit, so the algorithm is usually improved to meet system stability requirements.

Data inconsistency during the commit phase

Once the vote shows that the servers in the cluster can carry out the transaction, the coordinating server sends a COMMIT request to the servers processing the transaction.

If during this process one or more of those servers suffers a network failure and cannot receive the commit request, those servers cannot complete the final data change, leaving the data inconsistent across the distributed cluster.

Three-phase Commit Protocol (3PC)

Three-phase commit is in fact an optimization and improvement built on top of the two-phase algorithm.

To solve problems such as synchronous blocking in the two-phase protocol, the three-phase commit protocol introduces a timeout mechanism in both the coordinator and the participants, and splits the first phase of two-phase commit in two: first ask, then lock the resources, and only at the end commit.

| Phase | Description |
| --- | --- |
| CanCommit | Transaction query: the coordinator sends a CanCommit request containing the transaction contents to all participants, asks whether the transaction commit can be performed, and waits for each participant's response. On receiving the request, a participant normally answers Yes and enters the prepared state if it believes the transaction can execute successfully, and No otherwise. |
| PreCommit | If in the query phase every participant answered that it can perform the operation, the coordinator sends a pre-execute request to the participants. Each participant writes its redo and undo logs and executes the operation, but does not commit it. If in the query phase any participant answered that the operation cannot be performed, the coordinator sends an abort request instead. |
| DoCommit | If every participant reported success in the preparation phase, that is, resources were reserved and the operation executed successfully, the coordinator sends the commit instruction to the participants; each participant commits the resource-changing transaction and releases the locked resources. If any participant reported failure, that is, resource reservation or execution failed, the coordinator issues an abort command; each participant cancels the change, applies its undo log, and releases the locked resources. |

Notes

Once phase 3 is reached, if the coordinator fails, or the network between the coordinator and a participant fails, the participant cannot receive the DoCommit or Abort request from the coordinator in time. In that case the participant waits until its timeout expires and then goes ahead and commits the transaction.
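A sketch of that participant-side timeout behavior; the message inbox and the 5-second timeout are both assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ThreePhaseParticipant {
    // Decision messages ("DoCommit" / "Abort") arriving from the coordinator
    // are pushed into this inbox by a hypothetical transport layer.
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();

    // Called after this participant has answered Yes to PreCommit.
    // If neither DoCommit nor Abort arrives before the timeout, the participant
    // assumes the transaction was decided to commit and commits locally.
    void awaitDecisionAfterPreCommit() throws InterruptedException {
        String msg = inbox.poll(5, TimeUnit.SECONDS); // timeout value is an assumption
        if (msg == null || msg.equals("DoCommit")) {
            commitLocally();    // timeout defaults to commit
        } else {                // "Abort"
            rollbackLocally();
        }
    }

    private void commitLocally()   { /* apply the redo log, release locked resources */ }
    private void rollbackLocally() { /* apply the undo log, release locked resources */ }
}
```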

Problems with the three-phase commit protocol

If the network partitions after the participants receive the PreCommit message, the coordinator and some participants can no longer communicate normally, yet those participants will still commit the transaction, resulting in inconsistent data.

TCC

Both 2PC and 3PC have a large-grained resource locking problem.

Imagine a scenario where a user buys 1,000 yuan of goods on an e-commerce website, pays 800 yuan from their balance, and pays the remaining 200 yuan with a red envelope.

Let’s look at the flow in 2PC:

Prepare phase:

  • The order system inserts an order record, without committing

  • The balance system deducts 800 yuan, locks the record, writes redo and undo logs, and does not commit

  • The red envelope system deducts 200 yuan, locks the record, writes redo and undo logs, and does not commit

Commit phase:

  • The order system commits the order record

  • The balance system commits and releases its lock

  • The red envelope system commits and releases its lock

Why is this large-grained resource locking?

In the prepare phase, after the database deducts 800 yuan from the user's balance, the record is locked to maintain isolation; until the transaction commits, no other transaction can access that record.

In fact we only need to reserve 800 yuan of the balance; we do not need to lock the user's entire balance. This is a limitation of 2PC and 3PC, because they are resource-layer protocols and do not provide more flexible resource-locking operations.

TCC came into being to solve this problem. TCC is essentially also a two-phase commit protocol, but unlike the two-phase protocol in JTA it is a service-layer protocol, so developers are free to control the granularity of resource locking according to the business.

Let’s take a look at the operation of the TCC protocol.

TCC divides the transaction submission process into three stages, Try-Confirm-Cancel (which is what TCC stands for):

  • Try: complete the business checks and reserve the business resources

  • Confirm: perform the business operation using the reserved resources (must be idempotent)

  • Cancel: cancel the business operation and release the reserved resources (must be idempotent)

The process is as follows (a coordinator sketch follows the list):

  1. The transaction initiator sends a transaction request to the transaction coordinator, and the coordinator calls the try method of every transaction participant to reserve resources. At this point no real business is executed; resources are merely reserved for the business to be executed later. This completes phase one.
  2. If the coordinator finds that the resources reserved by some participant's try method are insufficient, it calls the participants' cancel methods to roll back the reserved resources. Note that cancel must be idempotent, because the call may fail and be retried (for example, a participant received the request, but due to network problems the coordinator never received the acknowledgment).
  3. If the coordinator finds that every participant's try method returned OK, it invokes all participants' confirm methods, which perform the concrete business operations without checking resources again.
  4. If the coordinator finds that every participant's confirm method returned OK, the distributed transaction ends.
  5. If the coordinator finds that some participants' confirm methods failed, or it received no acknowledgment due to network problems, it retries. If the failure persists after a certain number of retries, the common fallback is transaction compensation.
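Here is a minimal sketch of that flow; the `TccParticipant` interface and its method names are illustrative, not taken from any specific framework.

```java
import java.util.List;

// Hypothetical service-level TCC participant; the method names follow the
// Try-Confirm-Cancel convention described above.
interface TccParticipant {
    boolean tryReserve(); // check the business rules and reserve resources
    void confirm();       // use the reserved resources (must be idempotent)
    void cancel();        // release the reserved resources (must be idempotent)
}

class TccCoordinator {
    boolean execute(List<TccParticipant> participants) {
        // Phase 1: reserve resources on every participant.
        for (TccParticipant p : participants) {
            if (!p.tryReserve()) {
                // Roll back every reservation; cancel is idempotent, so calling it
                // even on participants that never reserved is safe.
                participants.forEach(TccParticipant::cancel);
                return false;
            }
        }
        // Phase 2: all reservations succeeded, perform the business operations.
        // A real coordinator would retry failed confirms and fall back to
        // transaction compensation after enough retries.
        participants.forEach(TccParticipant::confirm);
        return true;
    }
}
```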

Now look at the TCC flow in the payment scenario:

Try operations

  • TryX: the order system creates an order awaiting payment

  • TryY: freeze 200 yuan of the account's red envelope

  • TryZ: freeze 800 yuan of the fund account

Confirm operations

  • ConfirmX: the order is updated to paid successfully

  • ConfirmY: deduct the 200 yuan from the account's red envelope

  • ConfirmZ: deduct the 800 yuan from the fund account

Cancel operations

  • CancelX: order processing failed; the frozen funds and red envelope are returned, and the order payment fails

  • CancelY: freezing the red envelope failed; the frozen account balance is returned, and the order payment fails

  • CancelZ: freezing the balance failed; the frozen red envelope is returned, and the order payment fails

As you can see, we use a freeze instead of the original account lock (in practice the freeze can be implemented as a database decrement plus a log), so that between the freeze and the transaction commit, other transactions can still use the rest of the account balance, improving concurrency.
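One possible way to implement such a freeze, assuming the account table carries a `frozen` column (all table and column names here are illustrative):

```java
class FreezeSql {
    // Try phase: move 800 yuan from the available balance into a frozen amount with
    // one conditional update, instead of holding a row lock until the transaction ends.
    static final String TRY_FREEZE =
        "UPDATE account SET balance = balance - 800, frozen = frozen + 800 " +
        "WHERE user_id = ? AND balance >= 800";

    // Confirm phase: actually consume the frozen amount.
    static final String CONFIRM =
        "UPDATE account SET frozen = frozen - 800 WHERE user_id = ? AND frozen >= 800";

    // Cancel phase: return the frozen amount to the available balance.
    static final String CANCEL =
        "UPDATE account SET balance = balance + 800, frozen = frozen - 800 " +
        "WHERE user_id = ? AND frozen >= 800";
}
```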

To summarize, TCC differs from the two-phase commit protocol in the following major ways:

  • 2PC sits at the resource layer; TCC sits at the service layer.

  • 2PC interfaces are implemented by third-party vendors; TCC interfaces are implemented by the developers themselves.

  • TCC allows more flexible control over the granularity of resource locking.

  • TCC is very intrusive for applications: every branch of the business logic must implement the try, confirm, and cancel operations, so the intrusion into the application, and the cost, are high.

For example, suppose your order service originally had a single interface:

```java
// change the order state
orderClient.updateStatus();
```

It has to be split into three interfaces:

```java
orderClient.tryUpdateStatus();
orderClient.confirmUpdateStatus();
orderClient.cancelUpdateStatus();
```

Currently there are several open-source implementations of TCC:

  • tcc-transaction

  • ByteTCC

  • spring-cloud-rest-tcc

Final consistency patterns

Cache consistency pattern

A common core requirement of high-concurrency systems is serving reads at the scale of hundreds of millions. A relational database is clearly not the best solution for high-concurrency reads; the classic Internet approach is to use a cache.

Common caches divide into local caches and distributed caches. Unless the performance requirements are extreme, a distributed cache is preferred. A local cache can be used where the requirements on real-time, cross-node consistency are low, such as certain personnel configuration: normal business flow is unaffected even if different machines briefly hold different configurations.

The database and the cache only need to maintain weak consistency, not strong consistency.
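A minimal cache-aside sketch of that weak consistency, with hypothetical `Cache` and `UserDao` interfaces standing in for Redis (or a local cache) and the database:

```java
import java.util.Optional;

// Hypothetical stand-ins for a distributed cache (e.g. Redis) and the database layer.
interface Cache {
    Optional<String> get(String key);
    void put(String key, String value, int ttlSeconds);
    void delete(String key);
}

interface UserDao {
    String loadProfile(String userId);
    void saveProfile(String userId, String profile);
}

class UserProfileService {
    private final Cache cache;
    private final UserDao dao;

    UserProfileService(Cache cache, UserDao dao) {
        this.cache = cache;
        this.dao = dao;
    }

    // Read path (cache-aside): try the cache first, fall back to the database,
    // then repopulate the cache. The TTL bounds how long stale data can live.
    String getProfile(String userId) {
        return cache.get("profile:" + userId).orElseGet(() -> {
            String profile = dao.loadProfile(userId);
            cache.put("profile:" + userId, profile, 300);
            return profile;
        });
    }

    // Write path: update the database, then invalidate the cache entry.
    // Readers may briefly see stale data, which is exactly the weak consistency
    // between database and cache that the text accepts.
    void updateProfile(String userId, String profile) {
        dao.saveProfile(userId, profile);
        cache.delete("profile:" + userId);
    }
}
```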

Query mode

Each service operation needs to provide a query interface that exposes the execution status of the operation to the outside.

Callers of the operation can use this interface to learn the execution state and then take different actions based on that state.
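In code, the query interface can be as small as the following sketch (all names are hypothetical):

```java
// Hypothetical status values for an order operation.
enum OrderStatus { PENDING, SUCCESS, FAILED, UNKNOWN }

interface OrderQueryService {
    // Callers poll this interface and branch on the returned state:
    // retry or wait on PENDING, continue on SUCCESS, compensate or alert on FAILED.
    OrderStatus queryStatus(String orderId);
}
```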

Here’s an example:

An RD (developer) receives the alert message in the group chat and queries the order's status to determine whether the system is faulty and needs manual repair

Compensation mode

If the overall operation is in an abnormal state, we need to correct the faulty sub-operations: re-execute the unfinished sub-operations, or cancel the completed ones, so that the whole distributed system is restored to consistency. Any effort that makes the system eventually consistent is called compensation. In order of preference (a sketch follows the list):

  1. Automatic recovery: depending on how the inconsistency arose, the program automatically restores consistency by continuing the unfinished operations or rolling back the completed ones
  2. Notify operations: if the program cannot recover automatically but inconsistent scenarios were anticipated in the design, an operations function can be provided so that operations staff compensate manually
  3. Notify engineering: if, unfortunately, the system cannot recover automatically and no operations function exists, the problem must be solved by technical means such as database changes or code changes, which is the worst scenario
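A minimal sketch of the first two levels, with hypothetical `SubOperation` and `notifyOperations` names:

```java
import java.util.List;

// Minimal automatic-compensation sketch: re-execute unfinished sub-operations a
// bounded number of times, then fall back to notifying operations staff.
interface SubOperation {
    boolean isDone();
    void retry(); // re-execute the sub-operation; must be idempotent
}

class Compensator {
    private static final int MAX_RETRIES = 3; // retry budget is an assumption

    void compensate(List<SubOperation> ops) {
        for (SubOperation op : ops) {
            int attempts = 0;
            while (!op.isDone() && attempts++ < MAX_RETRIES) {
                op.retry(); // level 1: automatic recovery
            }
            if (!op.isDone()) {
                notifyOperations(op); // level 2: hand over to manual compensation
            }
        }
    }

    private void notifyOperations(SubOperation op) {
        /* alert the operations team / on-call RD */
    }
}
```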

Here’s an example:

After observing that an order has been generated, the system automatically pushes the message to the warehouse system again and generates the inbound order, with retries. If the system cannot recover automatically, an RD has to step in to locate and fix the problem

Asynchronous assurance mode

The asynchronous mode is a typical case of the compensation model, often applied where consumers' demands on response time are not high. We usually take this kind of operation out of the main flow, process it asynchronously, and afterward inform the user of the result through a notification system. The biggest benefit of this scheme is that it can shave the peaks off high-concurrency traffic, for example: logistics and distribution in an e-commerce system, or billing and posting in a payment system.

In practice, the asynchronous assurance mode is implemented by encapsulating and persisting the asynchronous operations to be executed, and then compensating for unfinished tasks with a periodic salvage job, as sketched below. As long as the scheduling system is robust enough, every task will eventually be executed successfully.
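A minimal salvage-job sketch; the `TaskStore` interface and the 60-second interval are assumptions made for the example:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical persistence of pending asynchronous tasks.
interface TaskStore {
    List<String> loadUnfinishedTaskIds();
    boolean executeAndMarkDone(String taskId); // must be idempotent
}

class SalvageJob {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final TaskStore store;

    SalvageJob(TaskStore store) {
        this.store = store;
    }

    // Periodically pick up persisted tasks that have not finished and re-run them.
    // Because every task is persisted first and execution is idempotent, each
    // task is eventually executed successfully as long as this job keeps running.
    void start() {
        scheduler.scheduleWithFixedDelay(() -> {
            for (String taskId : store.loadUnfinishedTaskIds()) {
                store.executeAndMarkDone(taskId);
            }
        }, 0, 60, TimeUnit.SECONDS); // 60-second interval is an assumption
    }
}
```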

Here’s an example:

The procurement system synchronously writes a log for each budget release and consumption, and uses asynchronous, scheduled task retries to guarantee that releases and consumption eventually succeed

Periodic check mode

Outside the main flow of the system's operations we run a verification step: operation states can be checked asynchronously and in batches, and when inconsistent operations are found they are compensated; the compensation itself is the same as in the compensation mode.

A key to periodic verification is having an ID that is unique end-to-end across the distributed system, which can be produced by any common unique-ID generation scheme.

Here’s an example:

The reconciliation system on the finance side periodically verifies that settlement data and business document data are consistent

Reliable message pattern

For asynchronous operations, message queues can decouple the caller from the callee, improve the system's response speed, and shave traffic peaks.

With message queues, we need dedicated facilities to guarantee reliable message delivery and idempotent message handlers.

Reliable delivery of messages

Before sending a message, persist it to the database with its status marked as pending; then send it, and change the status to sent if the send succeeded. A scheduled task periodically fetches unsent messages from the database and resends them. A sketch of this local-message-table variant follows.

If a third-party message manager is used, a pre-message is sent to the message manager before the real send; the manager persists the message to the database and marks it as waiting to send, and after the message is successfully sent it is marked as sent. A scheduled task periodically fetches messages that have stayed unsent beyond a threshold, asks the business system whether the message should still be sent, and sets the message status according to the answer.
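A minimal sketch of the local-message-table flow; `MessageDao` and `MessageQueueClient` are illustrative stand-ins, not a real library API:

```java
import java.util.List;

// A message persisted before sending, so it can be salvaged later.
class PendingMessage {
    final long id;
    final String payload;
    PendingMessage(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

interface MessageDao {
    long insertPending(String payload);                      // persist with status = PENDING
    void markSent(long id);                                  // status = SENT
    List<PendingMessage> loadPendingOlderThan(int seconds);  // stuck messages
}

interface MessageQueueClient {
    void send(String payload);
}

class ReliableProducer {
    private final MessageDao dao;
    private final MessageQueueClient mq;

    ReliableProducer(MessageDao dao, MessageQueueClient mq) {
        this.dao = dao;
        this.mq = mq;
    }

    void publish(String payload) {
        long id = dao.insertPending(payload); // 1. persist before sending
        mq.send(payload);                     // 2. hand the message to the broker
        dao.markSent(id);                     // 3. mark as sent on success
    }

    // Run from a scheduled task: resend anything still PENDING after 60 seconds.
    // This can deliver duplicates, which is why consumers must be idempotent.
    void resendStuck() {
        for (PendingMessage m : dao.loadPendingOlderThan(60)) {
            mq.send(m.payload);
            dao.markSent(m.id);
        }
    }
}
```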

Idempotency of message handlers

To guarantee that a message is definitely delivered, a retry mechanism is needed; with retries, duplicate messages are inevitable, so we must handle the duplicates (the sketch after this list shows the optimistic-lock variant):

  • Use a unique index on a database table to reject duplicate requests

  • Use distributed middleware such as Redis for deduplication

  • Use a finite state machine: once the state machine is already in the next state, a change belonging to the previous state can no longer apply, which guarantees the state machine's idempotency

  • Use optimistic locking to prevent duplicates by making data updates conditional. This is also a reasonable use of optimistic locking in system design: implemented through a version number or other condition, it ensures that updates apply correctly under concurrency without major problems
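A sketch of the optimistic-lock and state-machine techniques combined into one conditional update; the `orders` table and its column names are illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class IdempotentOrderHandler {
    // Returns true only for the first delivery that actually changes the state.
    // The state machine only moves PENDING -> PAID once, so a duplicate message
    // that arrives later simply updates zero rows.
    boolean markPaid(Connection conn, String orderId, int expectedVersion) throws SQLException {
        String sql = "UPDATE orders SET status = 'PAID', version = version + 1 " +
                     "WHERE order_id = ? AND status = 'PENDING' AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, orderId);
            ps.setInt(2, expectedVersion);
            return ps.executeUpdate() == 1; // 0 rows => duplicate or concurrent update
        }
    }
}
```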

Here’s an example:

  1. For the HTTP interface that saves documents, the front end attaches a unique ID on submission, and Redis-based deduplication prevents building the same order twice
  2. The order system integrates with the inbound/outbound warehouse system through asynchronous MQ interaction; retries are driven by the order's intermediate state

Reference

  • Book: Distributed Services Architecture: Principles, Design, and Practice