Overview of Byzantium

First of all, what is Byzantium?

Byzantium was the Byzantine Empire, also known as the Eastern Roman Empire, with its capital in what is now Istanbul, Turkey. As you can see in the picture above, the Byzantine Empire stretched across Asia, Europe, and Africa, and encircled the Mediterranean.

Question of the Byzantine general

In the era of cold weapons, armies led by hundreds of generals were often stationed in different parts of such vast territories for conquest and counter-conquest, as shown above.

  • The realistic problem that the armies of The Byzantine Empire faced was how to realize the unified dispatch and message interaction of all the armies in the vast territory. In that era of cold weapons, the transmission of messages and instructions could only rely on the communication soldiers to carry out manual transmission.
  • And so is because it is for the message, if communication soldiers in en route from diseased, weather, geographical and other reasons for failing to send messages, or defect, deliberately not to messaging or will be false tamper with orders to the unified command, the command from the generals will be very different, action is not unified guarantee, Therefore, the channel through which the message is transmitted is unreliable, which leads to the problem of consistency in the execution of the action.

So this is the Byzantine general problem.

The Byzantine general problem, in essence, is a problem of consistency between different individuals (generals) in a distributed environment (vast territory) receiving messages (orders) through unreliable channels (communication soldiers) and taking consistent actions.

“The Byzantine General Problem” in distributed Domain

In fact, the Byzantine general problem is a common problem existing in reality. As long as the scene and conditions are established, it can be regarded as the Byzantine general problem in a certain field, and a realistic subset of a wide range of problems.

We put this question to the computer and the Internet, distributed is common in Internet service deployment plan, due to hardware error, network congestion or disconnect and malicious attack, computer and network may appear unpredictable behavior, this scenario and Byzantine generals problems and conditions are similar.

According to this, Leslie Lamport et al. proposed the famous Byzantine Failures to discuss the difficulty of trying to achieve consistency through message passing over unreliable channels with message loss in distributed environment.

Leslie Lamport is also the author of the famous Paxos consistency algorithm. In this series of distributed introduction, Raft consistency algorithm was designed to improve Paxos. Here’s a quick look at the concept correlation clue trail, and if you’re interested, read on.

Here’s a comparison between the Byzantine Empire and distributed services in real life:

scenario source channel The message Unreliable cause
The Byzantine Empire Central <—-> General

General <—-> General
Communication soldiers The command Weather, geography, war, mutiny
Distributed service Service node <—-> Service node network message Damage, timeout, interruption, malicious attack

Core analysis of the problem

Core problem

The core problem of Byzantine general problem in distributed environment/service is how to reach consensus among nodes distributed in the network in the absence of trusted central nodes and trusted channels. This is the first issue to be addressed in distributed services, and it is the foundation and fundamental guarantee for all other issues.

Problem analysis

Here the attempt to analyze the essence of the problem requires the analysis of the root from the phenomenon to the essence, and I personally think that there is the first oneDistributed environmentThe objective reality of service clusteringNodes scatteredWhen we make full use of and enjoy the advantages brought by distribution, we also need to face and solve the unified coordination and command of massive nodesConsistency problem, and whether it is the unified control center to issue instructions or nodes to negotiate with each other must go throughMessage interactionIn order to realize communication, each other can understand the latest state information of the other party and the external environment, and the minimum possible carrying factor of carrying interaction isinformationSo we try to analyze from the root!

According to the information theory proposed by Claude Elwood Shannon, we generally divide information into three parts: source, destination and channel. Information body shuttles between source and destination as carrier through channel.

Based on the above concepts, we extend it toDistributed environmentEach service node is when it sends a messagesourceRole, if receiving messagesHe knowsRole,channelIs a two-way channel built in front of a node.

Constitute a role role There may be problems
source Communication initiator Generating body of information Cannot trace the source or exception
He knows Communication receiver Receiving body Forgery, anomaly
channel The communication channel Conveying body Hijacked, bugged, interrupted
The message body Communication carrier Load body Tamper with, lose

Therefore, we analyze that this problem is characterized by node unreliability, channel unreliability and information unreliability. Therefore, targeted solutions should be carried out according to the characteristics of these problems to ensure the achievement of consistency.

Problem solving

Solution: Verbal agreement

The implementation process

Oral agreement, also known asOral Message. A method that satisfies the following three conditions is calledOral agreement:

  • Every message sent can be delivered correctly (channel absolute trust)
  • The receiver of the message knows who sent the message (the source of the message is known, but the previous source is unknown, i.e., not traceable)
  • Be able to see missing messages

The implementation of the scheme is as follows:

  • [step-1] Each node receives the Commander Command command-1
  • [STEP-2]Each node receivesCommanderAfter the message is sent to theThe other nodesEventually, each node receives a set of messages from the other nodes{Commander messages, Commander messages passed by other nodes}To determine how to proceed next based on the message set.
node Collection of received messages
A { CommandCommander sent, the CommandB transfer, the CommandC transfer }
B { CommandCommander sent, the CommandA passing, the CommandC transfer }
C { CommandCommander sent, the CommandA passing, the CommandB transfer }

Unreliable scenario

Commander is not reliable

CommanderWhen it is unreliable, the information set received by each node is

node Collection of received messages Of judgment Node Execution Result Distributed consistency
A { command-0Commander sentAnd the command – 1B transferAnd the command – 1C transfer } Instruction inconsistency Does not perform consistent
B { command-1Commander sent, the command – 0A passingAnd the command – 1C transfer } Instruction inconsistency Does not perform consistent
C { command-1Commander sent, the command – 0A passingAnd the command – 1B transfer } Instruction inconsistency Does not perform consistent

The Node is not reliable

A few are unreliable

A few nodes are unreliablewhenAWhen it is unreliable, the information set received by each node is

node Collection of received messages Of judgment Node Execution Result Distributed consistency
A { command-1Commander sentAnd the command – 1B transferAnd the command – 1C transfer } Unreliable node Run the forgery command command-0 Most consistent
B { command-1Commander sent, the command – 0A passingAnd the command – 1C transfer } Instruction inconsistency Does not perform Most consistent
C { command-1Commander sent, the command – 0A passingAnd the command – 1B transfer } Instruction inconsistency Does not perform Most consistent
Most are unreliable

Most nodes are unreliablewhenA, CWhen it is unreliable, the information set received by each node is

node Collection of received messages Of judgment Node Execution Result Distributed consistency
A { command-1Commander sentAnd the command – 1B transfer, the command – 0C transfer } Unreliable node Run the forgery command command-0 Majority inconsistency
B { command-1Commander sent, the command – 0A passing, the command – 0C transfer } Instruction inconsistency Does not perform Majority inconsistency
C { command-1Commander sent, the command – 0A passingAnd the command – 1B transfer } Unreliable node Run the forgery command command-0 Majority inconsistency

To sum up, we can see that whether Commander commands or Node nodes execute commands, when there are a large number of unreliable nodes, the consistency in the distributed system will be damaged and cannot be guaranteed. If n represents the total number of all nodes (including Commander and Node), and M represents the number of unreliable nodes, then they must meet the quantitative relationship such as N ≧3m+1 to ensure the consistency of the distributed system. If you are interested, you can derive the relationship by yourself. The following is an example:

Total nodes (n) Commander number (1) The Node number (n – 1) Maximum node number of unreliability (m)
4 1 3 1
7 1 6 2
10 1 9 3
. . . .

insufficient

  • The message cannot be traced and the correctness of the message transmitted by other nodes cannot be determined
  • When most unreliable nodes exist, no agreement can be reached

Solution: Written agreement

In order to further solve the deficiency of oral agreement, the following improvements are made:

  • In order to ensure the correctness and traceability of messages transmitted by other nodes, signature information is added to node communication
  • When there is most unreliable nodes cannot achieve consistency, but this is an open question, cannot be achieved according to the deduced formula to arrange the total number of nodes to ensure fault tolerance, and messaging and signature information way can ensure good messaging traceable, accuracy, which can solve any scene unreliable nodes exist

The implementation process

A written agreement, also known asSigned agreement. The implementation of the scheme is as follows:

  • [step-1] Each node receives Commander Command command-1 with the signature
  • [STEP-2]Each node receivesCommanderAfter the message is sent, the message is carried outThe signatureAnd then distribute it toThe other nodesEventually, each node receives a string from the other nodesSignature informationMessage set of{Commander Signature messages, Commander messages signed by other nodes}To determine how to proceed next based on the message set.
node Collection of received messages (all with signed information)
A { CommandCommander signature, the CommandB signature, the CommandC signature }
B { CommandCommander signature, the CommandA signature, the CommandC signature }
C { CommandCommander signature, the CommandA signature, the CommandB signature }

The characteristics of

Based on the characteristics of the oral protocol, the signature protocol stipulates that each message can only be copied and then sent to other nodes with its own signature. The following features are added:

  • [1] Node signatures cannot be forged, and content changes can be detected
  • [2] Any node can recognize Commander’s signature, and unreliable nodes can forge Commander’s signature

insufficient

  • Although all messages are signed, the sender may not be trusted. It is possible that node A uses the signature of node C to notify node B, so A centralized signature management rule or institution is needed to verify the authenticity of each sender

The essence of a written agreement is the introduction of a signature mechanism, which makes all messages traceable to their source. One third of the requirements in the oral agreement are eliminated, and the reliable nodes can be reached as long as the written agreement is used.

Solution: Proof of work

PoW algorithms (Proof of Work) are widely used in unreliable distributed environments where malicious attacks and tampering exist. Blockchain technology should be widely used.

Satoshi Nakamoto in the Bitcoin white paper, through the Bitcoin protocol gives the ultimate solution as follows:

  • A proof of work mechanism is introduced so that only the first node to complete the required calculation can disseminate information, thus ensuring that there is only one proposal at a time.

  • Asymmetric encryption algorithm is introduced to provide signature technology support for information transmission, so as to ensure the privacy of message transmission, and can not be repudiated or tampered with.

The implementation of this part of the scheme is discussed separately in the following chapters.

The Byzantine general problem was proposed by Leslie Lamport and others, but it was Satoshi Nakamoto who really solved it.

Paxos algorithm proposed by Lambert, Raft algorithm and Zab protocol are all distributed consistency protocols based on the premise that messages will not be tampered or forged in a reliable channel environment. Consistency cannot be guaranteed in a distributed environment with the risk of malicious tampering such as unreliable channels.

conclusion

The Byzantine general problem does not care about the right or wrong results, but only the consistency of node execution.The existingDistributed consistency protocols and algorithmsIt can be divided into two main categories:

  • Non – Byzantine fault – tolerant algorithm

Namely, Crash Fault Tolerance algorithm (CFT) solves the consensus problem in the scenario where there are faults in distributed systems but no malicious attacks. That is, messages may be lost and repeated in this scenario, but there is no scenario in which messages are tampered with or forged. It is used for distributed systems in LAN scenarios, such as distributed databases. Common algorithms in this category include Paxos, Raft, Zab, etc.

  • Byzantine fault tolerance algorithm

It can solve the consensus problem in the scenario of both faults and malicious attacks in distributed systems. Generally used for distributed systems in Internet scenarios, such as in the blockchain technology of digital currency. Common algorithms belonging to this category include PBFT algorithm and PoW algorithm.

reference

Baike.baidu.com/item/%E6%8B… The Byzantine problem

Baike.baidu.com/item/%E4%BF… Information theory

zhuanlan.zhihu.com/p/44024734

www.cnblogs.com/aspirant/p/…

www.cnblogs.com/aspirant/p/…

www.8btc.com/article/703…

www.jianshu.com/p/81aadcea1…