Old Liu is a second-year graduate student who is about to start looking for a job. He writes this blog partly to review and summarize the key points of big data development, and partly in the hope of helping fellow students who are teaching themselves programming. Since Old Liu taught himself big data development, the blog is bound to have some shortcomings; he welcomes criticism and corrections so that everyone can improve together!

Today, let's talk about data consistency in distributed systems. The story has to start from how server architecture and deployment evolved. This article is on the longer side, so please read patiently; the good parts are worth the wait!

1. The background

1.1. Centralized services

The first thing I want to talk about is centralized services. What is centralized? It’s all done by one server.

A centralized system consists of one or more main computers forming the central node. Data is stored centrally on this node, all of the system's business logic runs on it, and every function of the system is handled by it.

In other words, in a centralized system each client is only responsible for data input and output, while data storage and processing are handed over entirely to the host.

The advantages of centralized service:

  1. Simple structure

  2. Simple deployment

  3. Simple project architecture

But its disadvantages are also obvious:

  1. Mainframes are very expensive to develop and maintain

  2. Mainframes are very expensive

  3. It is a single point of failure: once the host goes down, all services stop

  4. The performance scaling of mainframes is limited by Moore’s Law

What is Moore’s Law?

Moore’s Law was proposed by Gordon Moore, one of the founders of Intel: at constant price, the number of components that fit on an integrated circuit doubles roughly every 18-24 months, and so does its performance. In other words, the amount of computing power a dollar can buy more than doubles every 18-24 months. From: Baidu Baike

Moore’s Law tells us that vertical scaling has a theoretical limit, so we can only consider horizontal scaling, which in theory has no such limit!

Since vertical scaling is limited, we turn to horizontal scaling, and that is where distributed systems come in!

1.2. Distributed services

Distributed means using more ordinary computers (as opposed to expensive mainframes) to provide services as a cluster. The more computers there are, the more CPU, memory, and storage resources are available, and the more concurrent requests the cluster can handle.

For example, an electronic mall implemented by a distributed system may be divided into multiple applications to provide different functions respectively and form a distributed system to provide services externally.

As a result, the computers in a distributed system are almost unrestricted in physical location: they may sit in different racks, different machine rooms, or even different cities.

Compared with the centralized system, the distributed system has higher cost performance, stronger processing capacity, higher reliability, and good scalability.

However, while going distributed solves the high-concurrency problem of websites, it also brings other problems.

First, a distributed system necessarily depends on the network, which can affect performance and even the ability to serve at all. Second, the more servers a cluster has, the higher the probability that some server goes down. In addition, because the service is spread across a cluster, a user's request lands on only one of the machines, so data consistency issues easily arise if this is not handled properly.

1.3. Exceptions in distributed systems

1. Communication exceptions: the network becomes unavailable (messages are delayed or lost), so network communication inside the distributed system fails. This can cause data loss, inconsistent state across nodes, and out-of-order data.

2. Network partition: the network as a whole is disconnected, but each sub-network is internally healthy, so the system's network is cut into several isolated regions. These local small clusters can then leave the distributed system with inconsistent data.

3. Node fault: A server node breaks down.

4. Storage data loss: for stateful nodes, losing data means losing state; usually the state can only be recovered by reading from other nodes. Solution: use a multi-replica mechanism.

1.4. Metrics for measuring a distributed system

1. Performance: this is a thorny trade-off. A system that pursues high throughput often finds it hard to achieve low latency, and when the system's average response time is long it is also hard to raise QPS.

The throughput of a system is the amount of data it can process in a given time, usually measured as the total amount of data processed per second. Response latency is the time the system needs to complete a given function. Concurrency is the system's ability to perform a function for many requests at the same time, usually measured in QPS.

2. Availability: the system's ability to keep providing correct service in the face of various exceptions. Availability is an important metric of a distributed system; it measures robustness and reflects fault tolerance.

3. Scalability: a distributed system can improve its performance (throughput, latency, concurrency), storage capacity, and computing power by growing the size of the cluster.

4. Consistency: to improve availability, distributed systems inevitably use replicas, which brings the problem of replica consistency.

For example, a piece of data in a distributed system exists on multiple different nodes. If those nodes store different versions of the data, then when multiple clients access it, the first client may read A while the second client reads B. The two clients see different results, which means consistency is poor.

To sum up, a well-designed distributed system should have these characteristics: high throughput, low response latency, high concurrency, high availability, high scalability, and good consistency. But not every property can be satisfied at once; several of them conflict with each other, and we have to find ways to work around that!

The real complication in distributed scenarios is data consistency!

1.5. Understanding consistency

There are many kinds of consistency. Here are three that Old Liu knows.

Strong consistency: once a write completes, any read can see the latest data. In other words, as long as the client has written a result, it will get the latest data whenever it reads. This is hard to achieve in distributed scenarios; the Paxos algorithm, the Quorum mechanism, the ZAB protocol, and so on, discussed later, are ways of achieving it!

Weak consistency: reading the latest data is not guaranteed; a read may return old data.

Eventual consistency: intermediate states are not guaranteed; the only guarantee is that after some period of time the data in the system ends up correct. It is also the most widely used consistency model in high-concurrency scenarios.
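To make the difference concrete, here is a tiny, purely illustrative Java sketch (the ConsistencyDemo class, its primary/backup maps, and the 100 ms replication delay are all made up for this example): a write lands on the primary immediately and is copied to the backup a moment later, so reading the backup right after the write may return stale data (weak/eventual consistency), while a system that always returned the latest write everywhere would be strongly consistent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical two-replica store, used only to illustrate the consistency levels above.
public class ConsistencyDemo {
    private final Map<String, String> primary = new ConcurrentHashMap<>();
    private final Map<String, String> backup  = new ConcurrentHashMap<>();
    private final ScheduledExecutorService replicator = Executors.newSingleThreadScheduledExecutor();

    // A write hits the primary immediately; the backup is updated asynchronously a bit later.
    void write(String key, String value) {
        primary.put(key, value);
        replicator.schedule(() -> { backup.put(key, value); }, 100, TimeUnit.MILLISECONDS);
    }

    // Reading the backup may return stale data: weak / eventual consistency.
    String readFromBackup(String key) { return backup.get(key); }

    // Reading the primary always sees the latest write: what strong consistency promises everywhere.
    String readFromPrimary(String key) { return primary.get(key); }

    public static void main(String[] args) throws InterruptedException {
        ConsistencyDemo store = new ConsistencyDemo();
        store.write("user", "Old Liu");
        System.out.println("primary right after write: " + store.readFromPrimary("user")); // "Old Liu" (latest)
        System.out.println("backup right after write : " + store.readFromBackup("user"));  // likely null (stale)
        Thread.sleep(300);                                                                 // give replication time to finish
        System.out.println("backup after replication : " + store.readFromBackup("user")); // "Old Liu" (eventually consistent)
        store.replicator.shutdown();
    }
}
```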

1.6. The role of distributed consistency

With all this talk about distributed consistency, what is it actually for?

1. To improve system availability, a multi-replica mechanism is generally used, which is exactly what gives rise to the distributed consistency problem. The replicas improve availability and prevent the whole system from becoming unavailable because a single node fails.

2. It improves overall system performance: data is distributed across multiple nodes in the cluster, and all of them can serve users.

After all this build-up, can you guess what Old Liu is leading to?

All of this leads to the problem of data consistency in distributed systems! The approaches used to solve it are as follows:

Distributed transactions (2PC and 3PC)
Distributed consistency algorithms (Paxos, ZAB)
The Quorum (NWR) mechanism
CAP and BASE theory

2. Distributed transactions

In a distributed system, each node knows whether its own transaction succeeded, but it has no way of knowing whether the transactions on other nodes succeeded. This can leave the nodes of a distributed system in inconsistent states. So when a transaction needs to span server nodes while still guaranteeing the ACID properties, a coordinator role must be introduced, and all the other nodes that execute the transaction are called participants.

In real life, there are two typical commit modes for distributed transactions: 2PC and 3PC.

2.1. The 2PC commit process

The 2PC flow looks like this:

(Diagram: the two-phase commit flow between the coordinator and the participants.)

Suppose I ask A to do one thing and B to do another, and in a distributed transaction both must either succeed together or fail together. So how do we keep the data consistent?

2PC has two phases:

Phase 1: each participant executes the transaction but does not commit it.

Phase 2: once the coordinator has received positive feedback from every participant in phase 1 (the transaction executed successfully), it issues a command for all participants to commit the transaction.
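To make the flow easier to picture, here is a minimal Java sketch; the Participant interface and TwoPhaseCommitCoordinator class are hypothetical names used for illustration, not a real framework API.

```java
import java.util.List;

// Hypothetical 2PC participant: executes the transaction locally but does not commit in phase 1.
interface Participant {
    boolean prepare();   // phase 1: execute the transaction, hold locks, vote yes/no
    void commit();       // phase 2: make the change permanent
    void rollback();     // phase 2: undo the change
}

// Hypothetical 2PC coordinator.
class TwoPhaseCommitCoordinator {
    boolean runTransaction(List<Participant> participants) {
        // Phase 1: collect votes; any "no" aborts the whole transaction.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) { allPrepared = false; break; }
        }
        // Phase 2: commit everywhere or roll back everywhere.
        for (Participant p : participants) {
            if (allPrepared) p.commit(); else p.rollback();
        }
        return allPrepared;
    }
}
```

Notice how a single slow or unreachable participant stalls the whole phase-1 loop; that is exactly the blocking problem discussed next.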

2.2. 2PC problems

Looking at 2PC’s two commit phases and diagrams, experienced people will see the problem at a glance.

1 Blocking problem

The coordinator sends commands to the participants over the network, so different participants receive them at different times and with different delays. For example, participant A receives the command quickly, while participant B has network problems and receives it much later. Participant A processes it and sends its feedback right away, while participant B only responds after a long time, so the coordinator ends up waiting a particularly long time. This is a typical blocking problem: it wastes resources and hurts performance!

2 No fault-tolerance mechanism: the coordinator is a single point of failure

The transaction coordinator is the heart of the distributed transaction. As the diagram above shows, if the coordinator fails, the participants never receive the commit/rollback notification, so they are left stranded in an intermediate state and the transaction can never complete.

3 Data is inconsistent

In the second phase, if a partial network failure occurs and one participant receives the commit command while another does not, the data on the two nodes ends up inconsistent.

2.3. 3PC

3PC is an improved version of two-phase commit. It splits the two-phase commit protocol's "commit transaction request" into three phases: CanCommit, PreCommit, and DoCommit.

Besides adding the CanCommit phase on top of 2PC, 3PC also introduces a timeout mechanism: if a participant does not receive a commit/rollback instruction from the coordinator within the specified time, it automatically performs a local commit, which addresses the coordinator's single point of failure.

2.4. The 3PC execution process

Phase 1: CanCommit phase

In this first, preparation stage, each participant is asked whether it is able to carry out the transaction operation. The timeout mechanism applies here too: a participant that receives no instruction from the coordinator within a certain period of time will automatically commit.

Phase 2: PreCommit phase

1. If every participant returns agreement, the coordinator sends a pre-commit request to all participants and the protocol enters the pre-commit stage.
2. After receiving the pre-commit request, each participant executes the transaction operation.
3. After executing its local transaction, the participant sends an ACK to the coordinator indicating that it is ready to commit, and waits for the coordinator's next instruction.
4. If the coordinator receives a rejection in response to the pre-commit request, or times out, it performs the interrupt-transaction operation and notifies every participant to interrupt the transaction.
5. After receiving the interrupt-transaction message, or after waiting until the timeout expires, a participant will proactively interrupt the transaction / commit directly.

Phase 3: DoCommit phase

1. After receiving ACKs from all participants, the coordinator moves from the pre-commit state to the commit state and sends a commit request to every participant.
2. Each participant receives the commit request, formally commits the transaction (commit), and reports the result (Y/N) back to the coordinator.
3. The coordinator receives all the feedback messages and the distributed transaction is complete.
4. If the coordinator receives no feedback because of a timeout, it sends the interrupt-transaction instruction.
5. On receiving the interrupt-transaction instruction, participants use the transaction log to roll back.
6. The participants report the rollback result, and the coordinator either receives that feedback or times out, completing the interrupted transaction.
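Here is a rough sketch of the three phases plus the participant-side timeout, again using hypothetical class names (ThreePhaseParticipant, ThreePhaseCommitCoordinator, TimeoutParticipantRunner) rather than any real library:

```java
import java.util.List;
import java.util.concurrent.*;

// Hypothetical 3PC participant.
interface ThreePhaseParticipant {
    boolean canCommit();   // phase 1: "are you able to run this transaction?"
    boolean preCommit();   // phase 2: execute the transaction locally, return an ACK
    void doCommit();       // phase 3: make it permanent
    void abort();          // interrupt the transaction at any phase
}

// Hypothetical 3PC coordinator.
class ThreePhaseCommitCoordinator {
    boolean runTransaction(List<ThreePhaseParticipant> participants) {
        // Phase 1: CanCommit - just a question, nothing is executed yet.
        for (ThreePhaseParticipant p : participants) {
            if (!p.canCommit()) { participants.forEach(ThreePhaseParticipant::abort); return false; }
        }
        // Phase 2: PreCommit - every participant executes its local transaction and ACKs.
        for (ThreePhaseParticipant p : participants) {
            if (!p.preCommit()) { participants.forEach(ThreePhaseParticipant::abort); return false; }
        }
        // Phase 3: DoCommit - all ACKs received, tell everyone to commit for real.
        participants.forEach(ThreePhaseParticipant::doCommit);
        return true;
    }
}

// Participant-side timeout rule: after pre-committing, commit by default if no instruction arrives in time.
class TimeoutParticipantRunner {
    void awaitDecision(ThreePhaseParticipant self, BlockingQueue<String> coordinatorMessages)
            throws InterruptedException {
        String decision = coordinatorMessages.poll(5, TimeUnit.SECONDS);         // wait for doCommit/abort
        if (decision == null || decision.equals("doCommit")) self.doCommit();    // timeout => default commit
        else self.abort();
    }
}
```

The timeout default-commit in TimeoutParticipantRunner is precisely what can cause the inconsistency described in the next section if the coordinator actually wanted a rollback.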

2.5. 3PC problems

3PC can also end up with inconsistent data. In the third phase, suppose the participants are told to roll back the transaction, but one participant does not receive the instruction within the specified time; it then commits by default, and the data becomes inconsistent. Such inconsistencies are especially likely between phases 2 and 3 because of network problems.

3. Distributed consistency algorithm

Building on the ideas of 2PC and 3PC, excellent developers have implemented distributed consistency algorithms. Here Old Liu will only roughly cover the concepts behind the Paxos algorithm and the ZAB protocol. If you want to understand Paxos and ZAB in detail, wait until Old Liu finishes looking for a job; he plans to write a dedicated article on the ZooKeeper source code.

3.1. Paxos algorithm

The Paxos algorithm is usually described with a story set in ancient Greece. In Paxos there are three kinds of roles:

Proposer: proposes a proposal.
Acceptor: accepts or rejects a proposal; a proposal is chosen (approved) once more than half of the Acceptors accept it.
Learner: learns the proposal that has been chosen.

Mapping to the ZooKeeper cluster:

leader: initiates proposals, like the president (the single point of failure is solved by the leader election mechanism)
follower: participates in the voting, like the NPC deputies
the rest of the nodes: passively accept the result, like all the people in the country

And one particularly famous institution: the parliamentary system

A decision is adopted as long as the majority agrees.

To summarize the Paxos idea: all transaction requests must be coordinated and processed by a single, globally unique server, called the leader server; the remaining servers become follower servers.

The leader server converts a client's transaction request into a transaction proposal and distributes the proposal to all follower servers in the cluster. The leader then waits for the followers' feedback; once more than half of the followers have responded correctly, the leader sends a commit message to all of them, asking them to commit the previous proposal.
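Here is a minimal sketch of that "wait for more than half, then commit" rule; the Leader and Follower types below are made-up illustrations, not ZooKeeper's actual classes.

```java
import java.util.List;

// Hypothetical follower in a ZAB/Paxos-style cluster.
interface Follower {
    boolean acceptProposal(long zxid, String txn);  // returns true as a positive ACK
    void commit(long zxid);                         // apply the proposal permanently
}

// Hypothetical leader: a proposal is committed once a majority of the cluster has ACKed it.
class Leader {
    boolean broadcast(long zxid, String txn, List<Follower> cluster) {
        int acks = 1;  // the leader itself counts toward the majority
        for (Follower f : cluster) {
            if (f.acceptProposal(zxid, txn)) acks++;
        }
        int clusterSize = cluster.size() + 1;          // followers plus the leader
        if (acks > clusterSize / 2) {                  // strictly more than half responded correctly
            cluster.forEach(f -> f.commit(zxid));      // tell every follower to commit
            return true;
        }
        return false;                                  // no quorum, the proposal is not committed
    }
}
```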

3.2. The ZAB protocol

ZooKeeper's underlying working mechanism is implemented by ZAB, which provides two main functions: crash recovery and message broadcast.

The two important guarantees the ZAB protocol makes to ensure data consistency are:

1. The ZAB protocol needs to ensure that transactions that have been committed on the Leader server are eventually committed by all servers.

2. The ZAB protocol needs to ensure that transactions that are only proposed on the Leader server are discarded.

To solve the single point of failure, there is the leader election algorithm. If the election algorithm can ensure that the newly elected leader holds the transaction proposal with the highest transaction number (ZXID) among all machines in the cluster, then the new leader is guaranteed to already have every committed proposal.

Because every transaction is assigned a number when it is executed, the highest transaction number represents the most recent transaction and therefore the most recent data. This, in essence, is how ZooKeeper implements distributed data consistency with the ZAB protocol!
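In ZooKeeper the transaction number is the 64-bit ZXID, whose high 32 bits are the leader epoch and whose low 32 bits are a counter within that epoch. Below is a simplified, hypothetical sketch of how a vote can prefer the server with the newest data; it is not ZooKeeper's real FastLeaderElection code.

```java
// Simplified sketch of a leader-election vote that prefers the server with the newest data.
class Vote {
    final long serverId;
    final long zxid;   // 64-bit: high 32 bits = leader epoch, low 32 bits = counter within that epoch

    Vote(long serverId, long zxid) { this.serverId = serverId; this.zxid = zxid; }

    // Prefer the higher ZXID (newer data); break ties with the larger server id,
    // similar to how ZooKeeper's election breaks ties.
    boolean beats(Vote other) {
        if (this.zxid != other.zxid) return this.zxid > other.zxid;
        return this.serverId > other.serverId;
    }

    // Pack an epoch and a counter into one 64-bit ZXID.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }
}
```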

4. The pigeonhole principle

Simple statement: if there are n cages and n + 1 pigeons, and every pigeon is put into some cage, then at least one cage contains at least 2 pigeons.
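A tiny, purely illustrative check of the principle: drop n + 1 pigeons into n cages at random and the most crowded cage always ends up with at least 2.

```java
import java.util.Random;

// Illustrative check of the pigeonhole principle: n cages, n + 1 pigeons.
public class PigeonholeDemo {
    public static void main(String[] args) {
        int n = 10;
        int[] cages = new int[n];
        Random random = new Random();
        for (int pigeon = 0; pigeon < n + 1; pigeon++) {
            cages[random.nextInt(n)]++;        // drop each pigeon into a random cage
        }
        int maxInOneCage = 0;
        for (int count : cages) maxInOneCage = Math.max(maxInOneCage, count);
        System.out.println("Most crowded cage holds " + maxInOneCage + " pigeons");  // always >= 2
    }
}
```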


5. Quorum NWR mechanism

Quorum NWR: the Quorum mechanism is commonly used in distributed scenarios to keep data safe and to implement eventual-consistency voting algorithms in distributed environments. Its main idea comes from the pigeonhole principle. Its biggest advantage is that it can not only achieve strong consistency but also let you customize the consistency level!

N: the total number of replicas (nodes)

W: the number of replicas a write must succeed on

R: the number of replicas a read must access

When W+R>N, it is guaranteed to read the latest data, i.e., strong consistency! Why do you say that?


Imagine there are four boxes and three of them have something inside. How can we guarantee that we grab a box with something in it? Take at least 2 boxes, and one of them is sure to contain something!

The Quorum mechanism uses exactly this principle: as long as W + R > N is guaranteed, the latest data can always be read, and the consistency level can be tuned by constraining the number of replicas read and written, up to strong consistency!

Let's now discuss the following three cases, on the premise that N is fixed and does not change!

W = 1, R = N, Write Once Read All

In a distributed environment, a write only has to land on one replica, i.e., only one box has something in it, so to read the latest data (to be sure of getting the box with something in it) you must read all N nodes and take the newest version. Writes are efficient, but reads are inefficient. Consistency is high, but partition fault tolerance and availability are poor.

W = N, R = 1, Read Only Write All

In a distributed environment, a write only succeeds after all nodes have been synchronized, so reading any single node returns the latest data. Reads are efficient but writes are inefficient: reads stay highly available, but a write must reach every node, so it is harder to complete whenever any node fails or the network partitions.

W = Q, R = Q where Q = N/2 + 1 

Write to just over half of the nodes and read from just over half of the nodes, which balances read and write performance and suits most applications. For example, with N=3, W=2, R=2, partition tolerance, availability, and consistency all reach a reasonable balance.

This third case is exactly what ZooKeeper uses!
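The rule behind all three cases fits in a few lines. This is just a sketch of the W + R > N check (the QuorumConfig class is made up for illustration), not a real storage engine.

```java
// Sketch of the Quorum NWR rule: W + R > N guarantees every read overlaps the newest write.
class QuorumConfig {
    final int n, w, r;

    QuorumConfig(int n, int w, int r) { this.n = n; this.w = w; this.r = r; }

    boolean isStronglyConsistent() { return w + r > n; }

    public static void main(String[] args) {
        System.out.println(new QuorumConfig(4, 1, 4).isStronglyConsistent()); // true  - Write Once Read All
        System.out.println(new QuorumConfig(4, 4, 1).isStronglyConsistent()); // true  - Read Only Write All
        System.out.println(new QuorumConfig(3, 2, 2).isStronglyConsistent()); // true  - majority quorum, ZooKeeper-style
        System.out.println(new QuorumConfig(3, 1, 1).isStronglyConsistent()); // false - a read may miss the latest write
    }
}
```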

6. CAP theory

From what we saw above, when you go for strong consistency it is hard to also achieve high availability; the two pull against each other. CAP theory formalizes this: a distributed system cannot satisfy the C, A, and P requirements at the same time.

C: Consistency

Multiple copies of data are consistent in a distributed environment

A: Availability

The services provided by the system must always be available and always return results within a limited time for each operation requested by the user

P: Partition Tolerance

When a distributed system encounters any network partition failure, it still needs to be able to provide consistent and available services

Since a distributed system cannot meet C, A, and P at the same time, how do we choose?

With CAP we can only pick 2 of the 3. In a distributed system, partition tolerance (P) is indispensable, so when a network problem occurs there are really only two options: return an error or block and wait, i.e., sacrifice either availability or consistency.

Stand-alone software does not need to consider P, so it is naturally of the CA type, such as a single-node MySQL.

Distributed software such as HBase and Redis must consider P, and since it cannot have both A and C in full, it has to strike a balance between them: keep the service basically available and make the data eventually consistent. That is where BASE theory comes in.
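Here is a hedged sketch of that A-versus-C choice during a partition; the Replica class is invented for illustration and is not the API of HBase, Redis, or any real system. A CP-style read refuses to answer when its copy might be stale, while an AP-style read answers anyway with whatever it has locally.

```java
import java.util.Optional;

// Hypothetical replica used only to illustrate the CP vs AP choice during a network partition.
class Replica {
    private String localCopy = "old value";
    private boolean partitioned = true;   // pretend the link to the leader is currently down

    // CP style: refuse to answer rather than risk returning stale data (sacrifices availability).
    Optional<String> readConsistent() {
        return partitioned ? Optional.empty() : Optional.of(localCopy);
    }

    // AP style: always answer, even though the value might be stale (sacrifices consistency).
    String readAvailable() {
        return localCopy;
    }
}
```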

7. BASE theory

In most cases we do not strictly need strong consistency; some businesses can tolerate a certain amount of delay before the data converges. So, to also take efficiency into account, the eventual-consistency theory BASE was developed. Its core idea: even when strong consistency cannot be achieved, each application can adopt a suitable approach, based on its own business characteristics, to reach eventual consistency.

In a word: do not go to extremes. BASE is the result of balancing C and A in CAP theory.

BASE does not aim for strong consistency but for eventual consistency; not for high availability but for basic availability.

Basically Available: when a failure occurs, the distributed system remains basically available. For example, during Taobao's Double 11, in order to protect system stability, placing orders works normally while other edge services may be temporarily unavailable.

Eventually Consistent: after a certain amount of time, all replicas of the data in the system reach a consistent state.

In the future, when you develop distributed systems, you can decide whether to pursue high availability or consistency depending on your business!

8. To summarize

Well, that is roughly it for the data consistency problem of distributed systems. Old Liu mainly talked about the background of distributed system consistency and how it is achieved. Although his current level may not be as good as yours, Old Liu still hopes to become better and help more partners who are teaching themselves programming.

If you have related questions, reach Old Liu through the official account "Hardworking Old Liu" for a good chat. And if you feel this helped you, feel free to like and follow to show your support!