Author: Dali Intelligent Server Team
Introduction
Distributed consistency is a perennial problem: how to ensure data consistency across distributed nodes, or how to reach agreement on a proposal. Our team has been studying distributed consistency algorithms. This article focuses on classic consensus algorithms for decentralized scenarios and introduces them in depth, aiming to deepen understanding of the distributed consistency problem and to offer new approaches for similar scenarios.
The problem of distributed consistency is familiar to many engineers and appears in many frameworks and middleware. Classic algorithms such as Paxos, Raft, and ZAB have been applied and implemented in many technical frameworks.
When we speak of a distributed system, we usually assume a homogeneous structure within a controlled network, often even within the same machine room, so the distributed protocols we encounter daily are confined to such homogeneous setups. From the outside, such a distributed system is still a centralized system: the backend of Alipay is centralized, and so is the backend of Douyin.
When we extend the scope to the whole network, with no constraints on network structure, geographical distribution, or physical conditions, the distributed consistency problem changes so much that our existing traditional protocols cannot work at all.
Decentralization means the nodes are equal, but the complexity of the network environment, and of people, leads to distrust between nodes. In this scenario, the consistency protocol becomes far more challenging.
Challenges
There are two main types of challenges to the distributed consistency problem:
Crash Fault
- Nodes can go down at any time and then recover at any time
- The node hardware may fail
- The network can go down at any time
- Network messages may be lost or delayed
- Messages may be out of order
- The network may be partitioned, forming multiple isolated sub-networks
These faults all stem from the fact that the physical hardware in a distributed system is unreliable and unstable; they are risks that any system design must account for.
Byzantine Fault
The errors above are caused by objective factors: nodes either do not work at all, or work exactly as expected. A node cannot actively do the wrong thing. To put it simply, if you ask me to send message A, I either fail to send it, or I send exactly A; I will not change it to message B. In a decentralized setting, however, malice is entirely possible: anyone can act maliciously, and anyone can spread false information. Once someone misbehaves, how does the system keep working correctly and keep its data consistent? That is the big challenge. This is the Byzantine fault.
The name Byzantine comes from an old story. In short, n generals plan to attack a castle together. Each general can choose to attack or retreat, but all must act in concert to succeed. The generals are too far apart to communicate directly, so messages must be carried by couriers. But the couriers are unreliable: they may take too long to deliver a message, may never deliver it, or may even deliberately alter it. The generals themselves are also unreliable: there may be traitors among them who do not follow the agreed plan. In this story, the couriers represent the unreliable channels of a distributed network, and the generals represent unreliable nodes.
Pictures from the Internet
In a centralized environment, where the rooms, machines, and networks are all inside the company, the environment is safe and reliable, and we only need to worry about crash faults, which protocols like Paxos and Raft have already addressed. In decentralized scenarios, in networks full of risk and uncertainty, Byzantine errors must be solved in addition to crash faults, and that requires new mechanisms. You have to answer the questions: why should I trust you, and how do I verify you? All of this means the design must account for both machines and people.
The solution
PBFT (Practical Byzantine Fault Tolerance)
Pictures from the Internet
Suppose there are three generals A, B, and C who vote with each other before deciding on a plan. Suppose general C is a traitor and sends different messages to A and B. In this case, A sees attack : retreat = 1 : 2, so A retreats. In fact, with three generals of whom one is a traitor, it is impossible to always agree on a plan of action, as detailed in the paper The Byzantine Generals Problem. The paper also gives a more general conclusion: if there are m traitors, at least 3m+1 generals are needed to reach an agreed course of action (that is, no more than 1/3 may be traitors).
Pictures from the Internet
With four generals, call the one who gives the order the general and the rest adjutants. Careful analysis shows the situation is now different: whether the general is a traitor or any one adjutant is a traitor, an agreed course of action is eventually reached.
Concept definition
Before introducing the PBFT algorithm, we define and explain some terms.
Signature
The premise of the PBFT algorithm is that cryptography ensures messages transmitted between nodes cannot be tampered with. Common cryptographic signature algorithms include RSA, DSA, ElGamal, and ECDSA (elliptic curve). In fact, any decentralized system must use cryptographic algorithms to keep messages tamper-proof.
Each node stores its own private key and attaches its public key to the messages it sends; the receiving party verifies the validity of a message against that public key.
View
The consensus process of all nodes runs within one view, and a PBFT system runs by constantly switching views. Put simply, each consensus round corresponds to a height in a blockchain system; each height is one view switch.
The primary node
For each view, one node must initiate the consensus; that node is the primary node.
The selection of the primary node is effectively random and changes every round. A simple selection algorithm:

primary node p = view number v mod total number of nodes n
Every node can become the primary node; the remaining nodes act as secondary nodes.
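The rotation rule above can be sketched in a few lines of Go; the node count and view numbers are made-up values for illustration:

```go
package main

import "fmt"

// primary returns the index of the primary node for view v
// among n nodes: p = v mod n.
func primary(v, n uint64) uint64 {
	return v % n
}

func main() {
	n := uint64(4) // 4 nodes tolerate 1 Byzantine node (n >= 3m+1)
	for v := uint64(0); v < 6; v++ {
		fmt.Printf("view %d -> primary %d\n", v, primary(v, n))
	}
	// Views 0..5 map to primaries 0,1,2,3,0,1: every node gets a turn.
}
```

Because v advances every round, a misbehaving primary is rotated out automatically in the next view.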
The algorithm process
Request
The client sends a request to the primary node. This extends naturally to several clients sending requests to the primary node at the same time.
The message content is <d, h, s, pk>, corresponding to the following struct:
```go
type Request struct {
	Data   []byte // message contents
	Hash   []byte // message digest
	Sign   []byte // message signature
	PubKey []byte // public key; the client's address (unique identifier) can be derived from it
}
```
Pre-Prepare
The primary node collects requests from clients and checks each request as follows:
- The message digest is correct: the content matches the digest
- The signature is correct: the message was really signed by the client and has not been tampered with
- Invalid requests are discarded
- All valid request messages are assigned a sequence number n (their ordering), with v the current view number. The primary then generates and signs a pre-prepare message with the following structure:
```go
type PrePrepare struct {
	SortedReqs []Request // ordered valid requests
	Hash       []byte    // digest of this message
	Sign       []byte    // signature of this message
	PubKey     []byte    // public key; the sender's address (unique identifier) can be derived from it
	V          int64     // view number, i.e. the round number; in many blockchain systems also called Height
}
```
- The PrePrepare message is actually a Proposal.
Prepare
After receiving the PrePrepare message from the primary node, the secondary node performs the following operations:
- The message digest is correct: the content matches the digest
- The signature is correct
- The view number v is correct
- v is used to confirm that the sender really is the primary node of this view, preventing a malicious node from impersonating the primary. Within a given view, the primary is fixed.
- The ordering of the requests is correct
- If everything checks out, the secondary node generates a Prepare message, signs it, and sends it to all nodes (both primary and secondaries)
- The PrePrepare message is then recorded in the local ledger as a key-value entry with status "uncommitted"
```go
type Prepare struct {
	PrePrepare PrePrepare // the pre-prepare message
	Hash       []byte     // digest of this message
	Sign       []byte     // signature of this message
	PubKey     []byte     // public key; the sender's address (unique identifier) can be derived from it
}
```
Commit
Every node, primary and secondary, receives Prepare messages from the other nodes and performs the following checks on each:
- The message digest is correct: the content matches the digest
- The signature is correct
- The view number v is correct
- The corresponding PrePrepare message is stored locally (looked up by PrePrepareHash)
- Invalid messages are discarded. If a node collects 2n/3+1 valid Prepare messages within a certain period, it sends a Commit message to the other nodes.
```go
type Commit struct {
	PrePrepareHash []byte // hash of the PrePrepare message sent by the primary node
	Hash           []byte // digest of this message
	Sign           []byte // signature of this message
	PubKey         []byte // public key; the sender's address (unique identifier) can be derived from it
}
```
Reply
After receiving a Commit message, every node, primary or secondary, performs the following checks:
- The message digest is correct: the content matches the digest
- The signature is correct
- The corresponding PrePrepare message exists locally
- Invalid messages are discarded. If a node collects Commit messages from more than 2n/3+1 nodes, most nodes on the network have reached consensus, and the PrePrepare message is marked as committed. If the original request carries a logical operation, it is executed and the final result is returned to the client. The client, in turn, considers the request successful only after receiving more than half of the replies.
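The quorum counting that drives the Prepare and Commit phases can be sketched as follows. This is a minimal illustration assuming votes are already verified and de-duplicated by sender; it is not a full PBFT implementation, and the hash strings are made up:

```go
package main

import "fmt"

// quorum returns the number of matching messages a node must collect
// before advancing a phase: 2n/3+1 of the n nodes, as described above.
func quorum(n int) int {
	return 2*n/3 + 1
}

// tally counts votes per PrePrepare hash and reports whether any
// proposal has reached quorum. In a real node the map key would be
// the PrePrepare digest and each sender would count at most once.
func tally(votes map[string]int, n int) (string, bool) {
	for h, c := range votes {
		if c >= quorum(n) {
			return h, true
		}
	}
	return "", false
}

func main() {
	n := 4 // 3m+1 nodes with m=1
	votes := map[string]int{"0xabc": 3, "0xdef": 1}
	h, ok := tally(votes, n)
	fmt.Println(h, ok) // 0xabc true: 3 >= quorum(4) = 2*4/3+1 = 3
}
```

With n = 4 the quorum is 3, so one faulty or malicious node can never block or fork consensus on its own.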
Exception handling
In the process described above, each round must collect feedback from more than 2/3 of the nodes before consensus is reached, so the collection step needs a time window or timeout mechanism. If enough feedback is not collected within the window, the node considers the consensus round failed.
In a complex network environment, it is likely that not every node receives feedback from 2/3 of the nodes. If consensus is nevertheless reached in this view, a node that missed it can bring its local data in line with the consenting nodes through data synchronization and fork handling.
In fact, the algorithm described above guarantees that a majority of nodes (more than 2/3) can agree on the data of a view. The minority (fewer than 1/3) may fail to reach agreement with the others in this view, or even fail to perceive the other nodes at all. These nodes later catch up through data synchronization and fork handling.
Limitations and application scenarios
From the analysis of the algorithm flow, the entire consensus process involves three rounds: Pre-Prepare (the proposal), Prepare, and Commit. In the first round, the primary sends the proposal message to the secondary nodes, with complexity O(n) in the number of nodes. In the second and third rounds, all nodes communicate pairwise, with complexity O(n^2).
- Nodes must be pairwise connected
- Message complexity is O(n^2)
- As the number of nodes grows, PBFT's efficiency drops sharply, to the point of not working at all
- How long should the timeout window be? Too long, and consensus takes a long time and performance suffers; too short, and consensus may never complete. A balance must be struck based on network conditions and node count
- Within one view, consensus may fail if the primary node misbehaves:
- For example, the primary sends its PrePrepare message only after most of the timeout has expired, leaving too little time to complete collection in the next round
- Or the primary sends different messages to different secondary nodes
- However, as views advance, once the current view times out, another node takes over as primary in the next view
Compared with Raft and Paxos, PBFT makes weaker assumptions about the network environment: as long as no more than 1/3 of the nodes in the network are malicious, the whole system is safe. But in an open network, how many nodes are there? What fraction misbehave? How can that be limited? These questions have no certain answers, which makes it impossible for PBFT to run on a completely open network.
In fact, PBFT works well only on a permissioned network, where every node must pass a permission check before joining. Permission auditing can control the number of nodes and their qualifications, and even track misbehaving nodes, thereby securing the system. PBFT has many applications in consortium chains such as Fabric and Quorum.
POW (Proof of Work)
Completely decentralized
PBFT cannot run on a completely open network, which means a PBFT system is not a completely decentralized system. A completely decentralized system has no entry threshold: anyone can join or leave at will, and the number of nodes is uncontrolled and unpredictable. In such a decentralized system, is it still possible to reach agreement in finite time?
Before answering that question, one might ask: do we need such networks at all? Why must the system be completely decentralized? Is this a bogus requirement? This question goes to the heart of why public blockchains exist, and answers differ. Let us go back to the original purpose of distributed systems:
- Strong fault tolerance
- Naturally no single point of failure
- Strong read scalability
- Data cannot be tampered with
For point 4, consider a question: with our money in the bank, are our balances immutable? Of course we cannot tamper with the data ourselves; it is stored on the bank's internal servers, which we cannot access. But setting aside legal and ethical factors, is the data technically immutable for the bank? I am sure you have your own answer. Yet we believe, or are forced to believe, that banks do not change our balances for no reason. Over the course of economic development, banks have earned a kind of credibility, an endorsement that makes us trust them, because if we did not trust banks, none of our business would be possible. This is forced trust. Is there a mechanism by which we no longer have to trust a third party, and a simple proof suffices to make our transactions credible? Not "trust me", but "verify me".
In a completely open decentralized system, anyone can participate freely and anyone can read the data, but no one can tamper with it; everyone is equal, there is no central authority, and yet the system is extremely secure. This is decentralization, a controversial word in the blockchain field that has been discussed and studied more and more in recent years.
Consensus mechanism
Concept definition
Transaction
A transaction is simply a request to write data to the POW system; it can be a transfer in a blockchain, or a contract call.
Data blocks
Data is organized into blocks. The system periodically generates exactly one block; each consensus round produces one block.
```go
type BlockHeader struct {
	Number     uint64 // block number
	PreHash    []byte // hash of the previous block
	Hash       []byte // hash of this block
	Difficulty []byte // PoW difficulty
	Nonce      uint64 // random salt found by the PoW search
	Proposer   []byte // proposer of this block
	MerkleRoot []byte // Merkle tree root hash
}

type Block struct {
	Header *BlockHeader   // block header information
	Trans  []*Transaction // transaction information
}
```
Each block has a block Hash, which is a summary of all the information in the block and uniquely represents a block of data.
Data Chain
All blocks are connected by a hash pointing to the previous block, forming a single chain of data, known as a blockchain. At system initialization, a genesis block is generated automatically as the starting point of the chain.
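The hash linking described above can be sketched in Go; the block fields and the ad-hoc serialization here are simplified assumptions for illustration:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// A minimal block for illustration: each block records the hash of
// its predecessor, which is what links the chain together.
type Block struct {
	Number  uint64
	PreHash [32]byte
	Data    []byte
}

// Hash digests the block's fields (a toy serialization, not a real one).
func (b *Block) Hash() [32]byte {
	h := sha256.New()
	fmt.Fprintf(h, "%d|%x|%s", b.Number, b.PreHash, b.Data)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// validChain checks that every block points at the hash of the previous
// one; changing any historical block breaks every link after it.
func validChain(chain []*Block) bool {
	for i := 1; i < len(chain); i++ {
		if chain[i].PreHash != chain[i-1].Hash() {
			return false
		}
	}
	return true
}

func main() {
	genesis := &Block{Number: 0, Data: []byte("genesis")}
	b1 := &Block{Number: 1, PreHash: genesis.Hash(), Data: []byte("tx1")}
	chain := []*Block{genesis, b1}
	fmt.Println(validChain(chain)) // true

	genesis.Data = []byte("tampered") // rewriting history...
	fmt.Println(validChain(chain))    // false: the link to b1 is broken
}
```

This is why the data is "immutable": tampering with one block invalidates the hash pointers of every block after it.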
Difficulty
The essence of PoW is that CPUs keep computing until a legal solution is found. The network's total hash power changes with the number of nodes and with CPU hardware. To keep the system stable, a PoW proof must not become trivially easy when hash power rises, nor take unboundedly long when hash power falls, so a mechanism is needed to absorb changes in network hash power. Difficulty represents the amount of work, i.e. CPU computation, required to generate a block. It is a dynamic value: after a certain period, it is updated by a fixed algorithm to match the current hash power of the network.
Consensus process
A POW system is completely decentralized: all nodes are peers, so every node does the same thing.
Constructing the block header
```go
type BlockHeader struct {
	Number     uint64 // fixed value: previous block number + 1
	PreHash    []byte // fixed value: hash of the previous block
	Difficulty []byte // fixed value: current PoW difficulty
	Proposer   []byte // fixed value: this node's address
	Hash       []byte // to be determined: hash of this block
	Nonce      uint64 // to be determined: random salt
	MerkleRoot []byte // Merkle tree root hash of the data in this block
}
```
Among them:
- The first four fields are fixed values
- The Hash value is yet to be determined
- MerkleRoot represents the state of the whole system after the block's transactions complete and the data is stored, including the transaction order and the final execution results; see www.javatpoint.com/blockchain-…
- Nonce: the random number, which is the value the POW algorithm searches for
Calculating the Nonce
This is the POW computation itself: find a Nonce satisfying
sha256(BlockHeader) < target
Here target is the per-block target threshold, which is inversely proportional to difficulty: the smaller the target, the harder the problem. Difficulty and target are related by:

difficulty = difficulty_1_target / current_target

where difficulty_1_target is the system's initial target value, a constant. Bitcoin's initial target is 0x00000000FFFF0000000000000000000000000000000000000000000000000000, and Bitcoin's current target is vastly smaller, approximately 0x0000000000000000000143b5… (about 19 leading zero hex digits at the time of writing).
SHA-256 is irreversible; in other words, the only way to find a valid solution is to brute-force the Nonce space. If the whole Nonce space is traversed without a valid solution, the transaction order is changed (which changes MerkleRoot) and the Nonce space is traversed again.
The target is adjusted periodically to adapt to changes in total network hash power. Bitcoin adjusts the target every 2,016 blocks (approximately 14 days) by the formula:

new_target = current_target × (time taken by the last 2,016 blocks / 20,160 minutes)
Block broadcast
After a valid Nonce is found, the Nonce and the resulting hash are filled into the BlockHeader, and the new block is broadcast to the other nodes over the P2P network. Receiving nodes verify the block cheaply, checking that the nonce and hash satisfy the current difficulty; if the block is valid, they immediately start generating the next block on top of it.
Forks
As the consensus process above shows, each node works independently. In theory, multiple nodes may find a legal solution at the same time and each broadcast its own block. Multiple different blocks then circulate on the network, and because of the uncertainty of network transmission, different nodes may accept different blocks, making the data across the system inconsistent. This phenomenon is called a fork.
As shown in the figure, due to network delay, nodes in China and the United States may have two different blocks.
From the chain view:
The chain forks, and a convergence mechanism is needed so that ultimately there is only one agreed-upon data chain (consistency).
The common POW mechanism is the longest-chain rule: accept the longest chain and discard the shorter ones. Economically, the longest chain has accumulated the most work and is the hardest to overturn.
Probabilistic final consistency
The longest-chain rule solves convergence when forks occur. But in theory a decentralized system still cannot achieve guaranteed final consistency, for two main reasons:
- In theory, two chains could keep growing at the same rate, never producing a winner.
- Nor is the data on a chain ever finally irrefutable: if a supercomputer computed a longer chain, the existing chain would be displaced, rolling back data that had already been processed.
But the odds of both are very, very low. In practice, forks of more than two branches are extremely rare. As for the first case: after more than a decade of Bitcoin's development, the current difficulty is so high that the probability of two legal solutions appearing at the same time is very low. Many Bitcoin exchanges use 6 blocks (about 1 hour) as the confirmation time: after a transaction has been processed for about an hour, it is highly likely that network-wide consistency has been reached and the transaction cannot be overturned.
Therefore, in a decentralized system there is no deterministic final consistency, only probabilistic final consistency.
Limitations
The POW algorithm is very simple, so simple that only brute force calculation can get the final solution.
To remain secure, POW must keep the block interval roughly constant, meaning finding a solution must always take about the same time. As hash power grows, the computation becomes ever harder and consumes ever more CPU and electricity. Bitcoin's annual electricity consumption has been estimated to match that of the entire country of New Zealand, and that electricity produces nothing beyond Bitcoin itself.
Data production is slow. Bitcoin takes 10 minutes to create a block, each block is 2 MB, and it holds a very limited amount of data. Bitcoin currently processes around seven transactions per second, worlds apart from the tens of millions of TPS handled on today's Internet.
With the advent of specialized mining chips such as ASICs and FPGAs, it is no longer feasible for a personal PC to mine Bitcoin. Hash power therefore easily concentrates in a few industry giants, which reintroduces centralization.
Other algorithms
Many POW-like systems have tried to address POW's power consumption and low performance, for example Ethereum reducing the CPU cost and raising the memory cost while keeping the same level of security, but the underlying problem remains. Hence many other algorithms have emerged.
POW is essentially all nodes competing for the right to produce a block, with only one winner. It is simple and crude, but fair and safe. If a mechanism could directly select the node that gets the block right, consumption would be far lower and performance far higher. The core requirement is that this mechanism must also be fair and secure.
POS (Proof of Stake)
A typical mechanism: each consensus round randomly selects one node from all nodes as the producer, and that node alone performs the block-generation task. As long as the selection is random enough, the algorithm is fair and just. But randomness itself is a problem to be solved: the random algorithm must be secure, immutable, and unpredictable.
To prevent misbehavior, every node must put up a deposit, its Stake, in order to participate in POS. A node caught misbehaving forfeits the deposit, and honest work is incentivized: as long as the payoff for misbehaving does not exceed the deposit, nodes will work honestly.
Moreover, the selection here cannot be uniformly random: stakes differ from node to node, so the probability of being selected is made proportional to a node's Stake to ensure fairness.
It is not hard to see that both POW and POS ultimately use economics and game theory to keep the network safe and reliable.
The best-known blockchain projects, such as EOS (eos.io), Ethereum Casper (casper.com), Cardano (cardano.org), and Dfinity (dfinity.org), all adopt the POS idea but differ in implementation.
VRF (Verifiable Random Function)
A big problem with POS is that any node can compute which node will be selected in the next round; the selection result is known a priori. This exposes the selected node to targeted single-point attacks, or lets other, unselected nodes deliberately stall, increasing the risk of system instability.
A VRF (medium.com/algorand/al…) is essentially a cryptographic construction: for a fixed key pair (PK, SK) and input x, a VRF produces a unique, pseudo-random output together with a verifiable proof.
Each node has its own VRF key pair. In each consensus round, every node evaluates the VRF on the BlockHeader with its private key; if the resulting value falls within a specific threshold space, the node qualifies to produce a block, and otherwise it does not. Naturally, this threshold space must be proportional to Stake to ensure fairness.
A VRF works like a secret lottery: no one can know your draw before you publish it, and once you publish the result, anyone can easily verify that it is legal. This solves the a-priori problem of plain POS and better guarantees the security and fairness of the system.
Projects currently using VRFs include Algorand (www.algorand.com/), and Dfinity has said it will introduce this mechanism to address some of the drawbacks of POS.
Impossible triangle
Guided by the CAP theorem, centralized distributed consistency protocols trade off consistency, availability, and partition tolerance according to the scenario. Decentralized systems face an analogous "impossible triangle": decentralization, performance, and security cannot all be achieved at once.
The more decentralized the system, the more open it is and the more complex its environment, so the longer agreement takes and the lower the performance. Conversely, reaching consensus faster requires controlling network scale and complexity and raising the entry barrier, sacrificing decentralization.
Both POW and POS, as well as the decentralized consensus algorithms derived from them, are looking for the best balance for their respective usage scenarios.
Reference:
An introduction to Bitcoin that I find quite readable; interested readers can see: aquayi.gitbooks.io/master-bitc…