This article begins with what is generally considered the most important aspect of distributed systems: data consistency.

 

I. Analysis of data consistency problems

 

1
Why a distributed system?

 

Anything that keeps being used and developed must have value, and distributed systems are no exception. In my view, the two main purposes of a distributed system are "fast" and "massive". "Fast" can be broken into two aspects:

 

  • The system processes work quickly

  • Development proceeds quickly (shorter schedules)

 

The essence of both points is the same: an action or a task is split into two or more parts carried out at the same time, so the overall time is shortened. For example, a task that would take one person two minutes can be split between two people, and ideally it gets done in one minute.

 

Of course, the second aspect can be worked around to some extent, but the first cannot, because no program or computer has infinite performance. If one did, distributed systems would not be as common as they are now (many problems that money can solve are not really problems).

 

"Massive" exists because there are no infinite hard drives, so we have to spread data across different drives to meet demand. Those drives may sit in different hosts, different machine rooms, different regions (and perhaps, in the future, on different planets).

 

2
Side effects of distributed systems

 

Everything is a unity of opposites, with two sides. While distributed systems bring the benefits above, they also bring what is generally considered the biggest problem in the industry: data consistency.

 

Systems exist for people to use, and the concepts that make up the usage scenarios are what we call the business. The business is the core, and for a system, the business is ultimately built on data. A system can be slow, it can go down, it can be complicated, and users can live with that. The one thing they cannot stand is data problems: data errors, data inconsistencies, and so on.

 

Distributed means divide and conquer, and cooperation: each participant is responsible for one part of the whole.

 

Life is full of such examples. Take throwing a party: some people prepare the food, some the drinks, some the decorations. Everyone can work at the same time, but if any part fails, or clashes with the theme of the party, the whole thing is a failure. (For some reason this conjures an image of a press conference with cheers and a goblet of erguotou.)

 

Here is another example, from an e-commerce program:

 

The four operations here can run in any order as far as the goal is concerned; what matters is that they all succeed or all fail, and if any one of them ends up inconsistent with the rest, there is a problem. This is essentially similar to communication between people, except that a program does not have to respond to everything. When something is broken into 100 parts it gets scary: from a probability standpoint, the odds of reaching consensus are on the order of 2/5050.
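
The "all succeed or all fail" requirement can be sketched in a few lines. The four operation names below are hypothetical stand-ins (the original article's figure is not reproduced here); real systems implement this with transactions or saga-style compensation.

```python
# A minimal sketch of "all succeed or all fail" with compensating undo steps.
# Operation names are illustrative, not from any real system.

def run_all_or_nothing(operations, compensations):
    """Run operations in order; if any fails, undo the completed ones."""
    done = []
    try:
        for name, op in operations:
            op()
            done.append(name)
    except Exception:
        for name in reversed(done):      # roll back in reverse order
            compensations[name]()
        return False, done
    return True, done

state = {"stock": 10, "balance": 100, "orders": []}

def grant_points():
    raise RuntimeError("node down")      # simulate one participant failing

ops = [
    ("deduct_stock",   lambda: state.update(stock=state["stock"] - 1)),
    ("create_order",   lambda: state["orders"].append("order-1")),
    ("charge_balance", lambda: state.update(balance=state["balance"] - 30)),
    ("grant_points",   grant_points),
]
undo = {
    "deduct_stock":   lambda: state.update(stock=state["stock"] + 1),
    "create_order":   lambda: state["orders"].pop(),
    "charge_balance": lambda: state.update(balance=state["balance"] + 30),
}

ok, completed = run_all_or_nothing(ops, undo)
print(ok, state)   # False -- and state is back where it started
```

If the rollback itself can fail (a crashed node cannot be compensated), the problem becomes exactly the consensus problem discussed later in this article.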

 

The program example here is not rigorous; in a real distributed system, consistency issues are more complicated than this, because there are "read" operations in addition to "write" operations. This will be explained in more detail later.

 

3
Causes of data inconsistency

 

So what causes data inconsistency?

 

One cause is programming errors (code written incorrectly). This is easy to understand, and the solution is easy to imagine: test more, and verify the code behaves as expected. Unit testing, interface testing, automated testing, integration testing, and so on all exist to push the number of bugs as close to zero as is cost-effective, which is also what makes the role of "test engineer" valuable.

 

However, even if there really were no bugs, inconsistencies would still occur, because software runs on top of hardware, and hardware has a say too. For most of us, our control over hardware is much weaker than our control over software.

 

Among hardware issues, the most serious is the network. The network is a far larger and more complex organism than any single machine, and the uncertainty gets worse as local and wide area networks grow. Each host is just a tiny connection point in a larger network, and the more links it carries, the more likely something goes wrong.

 

Some readers may ask: hard disks, power supplies, and other components can also fail, so why are network problems the most serious?

 

Think of the hard drive and power supply as parts of your own body, like hands and feet, while the network is a communication channel between people, like a phone call. Even if neither side hangs up deliberately, many things can interrupt the call: the other party's intentions, poor signal, even interception by a third party. We can all recognize that the probability of a phone call going wrong is much higher than that of our hands and feet failing.

 

In practice, the network's characteristics lead to frequently encountered problems such as delay, packet loss, and reordering. To solve them, many theories and solutions have been developed in the decades since the Internet first appeared in 1969 (when the US military connected four universities under the ARPA project), and these will be reviewed in subsequent articles. In this part, we will first analyze what consistency actually is.
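
The three faults just listed can be modeled in a few lines. This is a purely illustrative toy, not any real network stack: each message is independently lost or delayed, and differing delays scramble the arrival order.

```python
import random

# Toy model of an unreliable channel: delay, packet loss, and reordering.

def unreliable_send(messages, loss_rate=0.2, rng=None):
    """Return the messages that 'arrive', in arrival order; some are lost."""
    rng = rng or random.Random(0)    # fixed seed for a repeatable demo
    arrived = []
    for i, msg in enumerate(messages):
        if rng.random() < loss_rate:
            continue                 # packet loss: the message never arrives
        delay = rng.random()         # random network delay, arbitrary units
        arrived.append((delay, i, msg))
    arrived.sort()                   # earlier-arriving messages come first
    return [msg for _, _, msg in arrived]   # arrival order != send order

received = unreliable_send(["m1", "m2", "m3", "m4", "m5"])
print(received)   # a subset of the messages, possibly out of order
```

Every consensus algorithm discussed later is, at bottom, a strategy for agreeing on data despite a channel like this one.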

 

4
A closer look at consistency

 

What does it mean to be consistent? Put simply: the same piece of information is exactly the same at any place at any time.

 

Take a football match, for example: whether we watch the ball pass from player A to player B on the pitch or in front of the TV, the information is the same. Strictly speaking, though, this is not truly consistent, because the TV signal must travel via satellite, the Internet, and so on, so we see it later than the people in the stadium. Even a person at the scene sees the event with a theoretical lag depending on where they sit, but light is so fast that within a few hundred meters the delay is far too small to notice.

 

It follows that, once the time dimension is considered, there is no such thing as true consistency.

 

In distributed systems there is no need to achieve true consistency anyway. The closer a system gets to it, the more it degenerates into a single unit that can only do one thing at a time, completely losing the "fast" advantage that is one of the two purposes of a distributed system. Hence there are many variations of consistency suited to different scenarios. For ease of understanding, let's go from the least strict to the most strict.

 

Most of the time, in order to be as "fast" as possible, systems adopt schemes collectively called eventual consistency: inconsistency is tolerated under certain conditions, local consistency is ensured first, and global consistency is then reached through a series of state-synchronization steps. Eventual consistency has many implementable branches; here are some common ones:

 

  • Causal consistency: only operations with a cause-and-effect relationship must be ordered. For example, replies in WeChat Moments: "Have you eaten yet?" must appear before "Yes".

  • Read-your-writes consistency: the name looks awkward, but it is easy to explain. If you post a reply in Moments, other friends need not see it immediately, but you must see it right away; otherwise, where did your reply go?

  • Session consistency: a chat with one person can be treated as a session. A chat has some causal ordering, but most of it is simply logical sequence. Say you split one statement into three messages: "First...", "Then...", "Finally...". Without session consistency they might arrive as: "Finally...", "First...", "Then...".
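
Read-your-writes is often implemented by pinning a user's reads to the replica that took their write until replication catches up. The sketch below is illustrative; the class names are hypothetical, and real systems typically carry this state in a session token.

```python
# Illustrative read-your-writes: a writer always reads their own replica.

class Replica:
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

class Session:
    """Pins each user's reads to the replica that last served their write."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.pinned = {}   # user -> index of the replica they last wrote to

    def write(self, user, key, value):
        idx = hash(user) % len(self.replicas)
        self.replicas[idx].write(key, value)  # replication to others is async
        self.pinned[user] = idx

    def read(self, user, key):
        # The writer reads their own replica; other users may see stale data.
        idx = self.pinned.get(user, 0)
        return self.replicas[idx].read(key)

cluster = Session([Replica(), Replica(), Replica()])
cluster.write("alice", "comment", "Have you eaten yet?")
print(cluster.read("alice", "comment"))   # alice always sees her own reply
```

Other users reading a lagging replica may briefly see nothing at all, which eventual consistency permits.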

 

Stricter than eventual consistency is sequential (global-order) consistency [1], which ensures that all processes see the same global execution order, and that each process's own operations appear in that order exactly as they actually occurred.

Note: references [1-6] are listed at the end of the article

 

In the football match above, if what actually happened was that ① Messi passed to Ronaldo and ② Ronaldo passed back to Messi, then everyone sees the two events in that order. Even if some spectators have already seen an event that we have not, it does not matter: the sequence of events is the same for the whole world.

 

Stricter still, linearizability [2] adds a real-time requirement on top of sequential consistency. In the Messi and Ronaldo analogy: Messi passes to Ronaldo, and the whole stadium "pauses" until every viewer has received that message before Ronaldo may make the next pass back. Pausing requires a God's-eye view (a global clock). This is about as far as reality allows, and the most famous system meeting this requirement is Google's Spanner.

 

To summarize the levels of consistency from least to most strict: eventual consistency (including the causal, read-your-writes, and session variants), then sequential consistency, then linearizability.

 

II. Achieving data consistency through consensus

 

Part I analyzed the data consistency problem; so how do we resolve the inconsistencies that failures cause? Through consensus. This part therefore focuses on consensus.

 

1
What is "consensus", and why is it needed?

 

The consistency problem is really a "result", and it is essentially caused by data redundancy. With no redundancy there would be no consistency problem.

 

The subsystems of a distributed system can cooperate because they hold the same redundant data as a "token": why would I work with you if I didn't even recognize you? So when this "token" changes, you must inform me, or I will no longer recognize you. The process by which all parties agree on a change to this "token" is called "consensus". In short:

 

Consistency is the result; consensus is the process, or the means, of achieving that result.

 

In distributed systems, data redundancy goes beyond this scenario: the larger a system, the less it can tolerate a subsystem failure snowballing through the butterfly effect, so high availability is usually required. If Xiaoming No. 1 goes down, thousands of other Xiaoming X's remain at their posts, ideally 24 hours a day.

 

The essence of high availability is storing multiple copies of the same data while providing a single service externally. For example, every Xiaoming X holds a copy of the "Massage Fingering White Paper", so whenever one takes leave, any other Xiaoming X can provide the same massage service. But when the "white paper" changes, everyone must be notified, because it is the whole basis and source of the service. This is why the data redundancy problem is especially prominent in a high-availability cluster.

 

In fact, consensus would be easy if every node in a distributed system could guarantee instantaneous responses and fault-free operation. It would be like people within earshot: as long as a voice carries through steady air, the message reaches the relevant people and the response is almost "instantaneous".

 

However, as noted above, such a system exists only in the imagination. In reality there are delayed responses, network interruptions, node failures, and even malicious nodes deliberately sabotaging the system. This gave rise to the classic "Byzantine Generals Problem" [3].

 

2
The Byzantine Generals Problem

 

We generally divide the "Byzantine Generals Problem" into two cases:

 

  • Byzantine faults: errors resulting from malicious responses, i.e. forged information.

  • Non-Byzantine faults: errors caused by the absence of a response (crashes, lost messages).

 

The heart of the problem is this:

 

How can a change be resolved in a distributed network so that the execution result is consistent, recognized by all participants, and the information is determined and irrefutable?

 

For example: how do we make sure every Xiaoming X receives "Massage Fingering White Paper II", and nothing else, and destroys the original?

 

This problem has spawned many "consensus" algorithms: Byzantine Fault Tolerance (BFT) for Byzantine faults, and Crash Fault Tolerance (CFT) for non-Byzantine faults. The names make clear that the essential work is "fault tolerance".

 

Some readers may not feel the importance of "fault tolerance" strongly in daily work: doesn't every system have the occasional bug or abnormal data? But in spaceflight, one small mistake can doom an entire launch, at enormous cost.

 

If you want an in-depth understanding of the Byzantine Generals Problem, you can consult the materials yourself; I won't expand on it here. The paper just cited is listed at the end of the article.

 

Byzantine faults are not usually considered in everyday software development, but they are a must for blockchain projects. Non-Byzantine fault tolerance, however, appears in mainstream distributed databases, such as the Raft algorithm used by both TiDB and CockroachDB. Although we all write code daily, understanding database internals is not required of everyone. But once high availability at the application level is involved, non-Byzantine faults are at least a hurdle that must be faced.

 

3
BFT-class algorithms

 

BFT algorithms come in two branches: "deterministic" and "probabilistic".

 

Let's talk about the deterministic branch first:

 

In these algorithms, once a result is agreed upon it is irreversible; the consensus is the final result. The representative work is the PBFT (Practical Byzantine Fault Tolerance) algorithm [4], made better known by the endorsement of China's central bank (its blockchain-based digital note trading platform). The algorithm works as follows:

 


 

Using a military analogy for the standard PBFT message-flow diagram: node C can be thought of as the commander-in-chief (the client), node 0 as the commander (the primary), and nodes 1, 2, and 3 as division commanders (the backups). Note that the third division commander has defected. The whole process goes like this:

 

  • "Request": the commander-in-chief gives the commander the order: "Attack!"

  • "Pre-prepare": the commander broadcasts the order to the three division commanders.

  • "Prepare": each division commander who receives and agrees with the order sends "Received" to the commander and the other two division commanders.

  • "Commit": once a division commander has collected "Received" from 2f participants (the commander does not send Prepare), he sends "Ready to attack at any time" to the commander and the other two division commanders. (f is the tolerable number of Byzantine nodes.)

  • "Reply": once a division commander has collected 2f + 1 "Ready to attack" messages, he can conclude that the commander-in-chief's order has reached "ready" status among the relevant divisions, so he attacks!
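
The five phases above boil down to two counting thresholds, which can be sketched as follows. This is illustrative only: message passing, request digests, view changes, and signatures are all omitted; only PBFT's 2f / 2f + 1 rules remain.

```python
# Counting rules of PBFT with n = 3f + 1 nodes and f = 1 defector.

F = 1                        # tolerable number of Byzantine nodes
N = 3 * F + 1                # 4 nodes: the commander (primary) + 3 divisions

def prepared(prepare_count):
    """A division is 'prepared' after collecting 2f matching "Received"
    (Prepare) messages; the commander sends Pre-prepare, not Prepare."""
    return prepare_count >= 2 * F

def committed(commit_count):
    """A division attacks after collecting 2f + 1 matching "Ready" (Commit)
    messages, its own included."""
    return commit_count >= 2 * F + 1

# With the third division commander silent, each loyal division still gets
# its own Prepare plus one from the other loyal division = 2 = 2f ...
assert prepared(2)
# ... and Commits from itself, the other loyal division, and the commander
# = 3 = 2f + 1, so the loyal divisions attack in unison.
assert committed(3)
print("consensus reached despite", F, "traitor")
```

The thresholds explain why n must be at least 3f + 1: with fewer nodes, the loyal majority could not outvote f traitors while also tolerating f silent crashes.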

 

There is much more to PBFT; I won't expand further here. Interested readers can refer to the paper at the end of the article.

 

Now for the "probabilistic" branch:

 

Here the consensus result is temporary: with the passage of time (or some kind of reinforcement), the probability of the result being overturned shrinks until it becomes final in practice. The representative work is the PoW (Proof of Work) algorithm; Bitcoin, once worth as much as $20,000 per coin, is based on it. As a simple analogy, think of "cultivating immortality" in Chinese fantasy (the real algorithm is more complicated than this):

 

  • You cultivate diligently, and more than half of the immortals recognize your attainment and agree that you become an immortal.

  • You then become an immortal, and take part in judging whether later candidates may become "immortals".

  • To subvert this through bribery, the larger the group, the higher the cost of bribery. The fewer participants willing to be bribed, the lower the probability of a wrong judgment, and the more credible the final result.

 

The simplified formula for the probability of misjudgment is 0.5^n, where n is the number of confirmations. With n = 6 the probability is 1.5625%; with n = 10 it is 0.09765625%, falling exponentially.
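
The numbers above follow directly from the article's simplified formula (Nakamoto's paper models the attacker more carefully; this halving rule is the simplification used here):

```python
# The article's simplified rule: each extra confirmation halves the chance
# of a wrong judgment, i.e. 0.5 ** n.

def misjudge_probability(confirmations: int) -> float:
    return 0.5 ** confirmations

for n in (6, 10):
    print(n, "confirmations ->", misjudge_probability(n) * 100, "% misjudged")
# 6 confirmations -> 1.5625 %
# 10 confirmations -> 0.09765625 %
```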

 

It is worth noting that the "deterministic" and "probabilistic" branches have different thresholds for non-cooperating nodes: the former tolerates at most 1/3, the latter anything less than 1/2.

 

4
CFT-class algorithms

 

As mentioned above, CFT algorithms solve consensus in distributed systems that have faults but no malicious nodes (messages may be lost or duplicated, but never forged). Leslie Lamport, author of "The Byzantine Generals Problem", posed a related "Paxos problem" in another paper [5], told through a story:

 

On the Greek island of Paxos, legislators voted on laws in the parliament chamber and exchanged information via slips of paper passed by waiters. Each legislator recorded the passed laws in his own ledger. The problem is that both the legislators and the waiters are unreliable: they may leave the chamber at any time for various reasons, and new legislators may enter at any time to vote on laws.

 

By what means can the voting process proceed normally, with "laws" passed without conflict?

— Baidu Encyclopedia

 

The key objects here map onto our systems as follows:

 

  • Parliament chamber = distributed system

  • Legislator = a program

  • Waiter = RPC channel

  • Ledger = database

  • Law = a change operation

 

Leslie Lamport himself proposed the Paxos algorithm to solve this problem in the same paper [5]. The key to the algorithm lies in the following three rules:

 

  • Every "change" carries a unique sequence number identifying how new it is;

  • A legislator may only accept changes newer than the ones he already knows;

  • Any two passed "changes" must involve at least one "legislator" in common (any two quorums overlap).
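
The three rules can be sketched as a single-decree Paxos acceptor. This is illustrative only: the proposer side, retries, and learners are omitted, and the names are not taken from any particular implementation.

```python
# The three rules above, in single-decree Paxos acceptor form.

class Acceptor:
    def __init__(self):
        self.promised = 0        # highest proposal number promised so far
        self.accepted = None     # (number, value) last accepted, if any

    def prepare(self, n):
        """Rule 2: promise only proposals newer than any seen so far."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted   # report any prior acceptance
        return False, None

    def accept(self, n, value):
        if n >= self.promised:           # still the newest proposal we know
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def majority(n_acceptors):
    """Rule 3: any two majorities share a member, so two conflicting
    'laws' cannot both be chosen."""
    return n_acceptors // 2 + 1

# Rule 1: proposal numbers are unique and ordered (here simply 1, then 0).
cluster = [Acceptor() for _ in range(3)]
quorum = majority(len(cluster))

promises = sum(1 for a in cluster if a.prepare(1)[0])
accepts = sum(1 for a in cluster if a.accept(1, "law-A"))
assert promises >= quorum and accepts >= quorum   # "law-A" is chosen

# A stale proposal numbered 0 is rejected everywhere (rule 2).
assert not any(a.accept(0, "law-B") for a in cluster)
print("chosen:", cluster[0].accepted)
```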

 

These three points are only the most essential parts of ensuring consistency; there is much more. Interested readers can refer to the paper in the references at the end.

 

Paxos is a leaderless algorithm and quite complex to implement, so many variants were created to simplify it. The most famous is "Raft" [6], which came out in 2013. Raft is a leader-based algorithm that ensures consensus through two processes:

 

  • At any time there is at most one live leader, who is responsible for synchronizing data to the followers;

  • If the leader "loses contact", each follower can become a candidate, and the one with the newest term wins the comparison and becomes the new leader. The term is an increasing counter maintained internally by each node.

 

Although followers vote on a first-come, first-served basis, multiple candidates with the same term can still receive the same number of votes (the "split vote" problem), in which case a new round of voting is held until an outcome is determined. Because Raft uses randomized election timeouts, and the network itself is unstable, the probability of hitting the same tie again is greatly reduced.
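
The role of the randomized timeout can be sketched in a few lines. The 150-300 ms range follows the common Raft guideline; everything else here (the tie window, the candidate count) is an illustrative assumption, not Raft's actual protocol logic.

```python
import random

# Why randomized election timeouts make repeated split votes unlikely:
# candidates rarely time out at the same instant, so one usually solicits
# votes first and wins.

def election_timeouts(n_candidates, lo=150.0, hi=300.0, rng=None):
    """Each candidate draws a timeout uniformly from [lo, hi] milliseconds."""
    rng = rng or random.Random(42)   # fixed seed: deterministic illustration
    return [rng.uniform(lo, hi) for _ in range(n_candidates)]

def split_vote(timeouts, tie_window=1.0):
    """Treat it as a split vote when the two earliest timeouts fire within
    the same tiny window and thus solicit votes 'simultaneously'."""
    t = sorted(timeouts)
    return (t[1] - t[0]) < tie_window

timeouts = election_timeouts(3)
print("split vote this round?", split_vote(timeouts))
```

Even when a tie does occur, each new round re-randomizes the timeouts, so the probability of repeated ties shrinks geometrically.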

 

The complete process is more complicated; there is an excellent Raft animation recommended for anyone interested: http://thesecretlivesofdata.com/raft/.

 

As an aside, the "ZAB" (Zookeeper Atomic Broadcast) algorithm used by Zookeeper is also a CFT-class algorithm, based on the Fast Paxos algorithm.

 

5
Conclusion

  

Looking back, we find that stricter consistency requires more rounds of mutual confirmation, which hurts performance, as PBFT and Paxos show. But that is the nature of distributed systems: you need balance everywhere, and finding the best fit matters most.

 

Having covered the "consensus" problem at the data level, next time we will discuss "distributed transactions", focusing on the well-known CAP and BASE theories.

 

Finally, if you want to become a data consistency expert and wonder whether there is a shortcut: the shortcut is to read the papers of Leslie Lamport, collected at http://www.lamport.org/.

 

 

  • [1] How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, Leslie Lamport,1979.

    http://research.microsoft.com/en-us/um/people/lamport/pubs/multi.pdf

  • [2] Linearizability: A Correctness Condition for Concurrent Objects, Maurice P. Herlihy, Jeannette M. Wing, 1990.

    http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf

  • [3] The Byzantine Generals Problem, ACM Transactions on Programming Languages and Systems, Leslie Lamport, 1982.

    https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Byzantine-Generals-Problem.pdf

  • [4] Practical Byzantine Fault Tolerance, Miguel Castro&Barbara Liskov, 1999.

    http://pmg.csail.mit.edu/papers/osdi99.pdf

  • [5] Leslie Lamport, The Part-time Parliament, 1998.

    https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Part-Time-Parliament.pdf

  • [6] “In Search of an Understandable Consensus Algorithm”, Diego Ongaro&John Ousterhout, 2013.

    https://raft.github.io/raft.pdf

 

 

The author:sail

Source: Cross Boundary Architect Subscription Number (ID: Zachary_ZF)
