Blockchain is essentially a special kind of distributed database. The main differences between blockchain platforms lie in how they optimize and redesign the consensus algorithm, and the consensus layer of a blockchain or a traditional distributed database determines what applications can be built on top of it. Blockchains mostly use BFT consensus algorithms, which let nodes reach agreement even when they do not trust one another. Traditional distributed databases, by contrast, are generally built on CFT consensus algorithms, which keep data consistent when nodes may fail but never act maliciously. This article introduces the best-known consensus algorithms in both the BFT and CFT families.

Byzantine Fault Tolerance (BFT) is a fault-tolerance technique in distributed computing. It derives from the Byzantine Generals Problem, which is a model of the real world: computers and networks may behave unpredictably due to hardware errors, network congestion or interruption, and malicious attacks. Byzantine fault-tolerant techniques are designed to handle these real-world anomalies while still meeting the specification of the problem being solved.

A blockchain network environment matches the Byzantine Generals Problem model: some servers work correctly, some fail, and some misbehave. The core task of a consensus algorithm is for the normal nodes to form a consensus on the state of the network.

1

POW

Proof of Work (POW), popularly known as “mining”, is a process in which nodes compete with hardware resources for the right to record transactions. Bitcoin and Ethereum, the two best-known public chains, both use POW as their underlying consensus algorithm; Ethereum launched its improved variant in 2015.

A) POW in Bitcoin

General process:

  1. Generate the coinbase transaction and assemble it, together with all other transactions ready to be packaged into the block, into a transaction list, from which the Merkle root hash is computed;

  2. Assemble the Merkle root hash and the other related fields into the block header, and take the 80 bytes of block header data as the input to the proof of work;

  3. Continually change the random number (nonce) in the block header, perform a double SHA256 hash on the changed header, and compare the result with the current network target. If the result is smaller than the target, the puzzle is solved and the POW is complete. In other words, the goal is SHA256(SHA256(block header)) < target. Figure 1 shows the core part of Bitcoin's POW source code, located at src/rpc/mining.cpp.

Figure 1 Bitcoin POW source code ▲
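The nonce search in step 3 can be sketched in Python. This is a simplified illustration, not Bitcoin's actual C++ code: the 76-byte header prefix and the easy target used below are invented for the example.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    # Bitcoin's proof of work hashes the 80-byte block header with SHA256 twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    # Try nonces until the double-SHA256 of the header, read as a
    # little-endian integer, falls below the target.
    for nonce in range(max_nonce):
        header = header_prefix + struct.pack("<I", nonce)  # 76 + 4 = 80 bytes
        h = int.from_bytes(double_sha256(header), "little")
        if h < target:
            return nonce, h
    return None  # nonce space exhausted; real miners then change the header
```

Lowering the target makes the search exponentially harder, which is how the network adjusts difficulty.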

B) POW in Ethereum

With the emergence of ASIC mining machines, hardware purpose-built for mining, ordinary PC nodes can no longer compete in Bitcoin POW mining. Ethereum made a series of improvements to Bitcoin's consensus mechanism and redesigned its POW consensus algorithm, called Ethash.

Ethash has high requirements on memory size and bandwidth but low requirements on raw computing power. Its directed acyclic graph (DAG) dataset, which occupies a large amount of memory and changes every 30,000 blocks, limits the performance of high-power devices and reduces their computing advantage.

The general process is shown in Figure 2:

Figure 2 Ethash mining flow chart ▲

  1. For each block, a seed is calculated. The calculation of seeds depends only on the current block information;

  2. The seed is used to generate a pseudo-random dataset, called the cache. Lightweight clients need to store the cache;

  3. A roughly 1 GB dataset, the DAG, is generated from the cache. Each element of the DAG depends on a few elements in the cache, so the cache allows the element at any given position in the DAG to be computed quickly. Full mining clients need to store the DAG;

  4. Mining consists of randomly selecting elements from the DAG and brute-force enumerating a nonce such that hashing them together meets the agreed difficulty, which essentially means the hash value has a certain number of leading zeros. To validate, a verifier recomputes the DAG elements at the specified positions from the cache and checks that the hash over that set of elements is less than the required value.
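The four steps above can be modeled with a toy memory-hard scheme. The parameters, sizes, and hash choices below are drastically simplified stand-ins for Ethash's real cache and DAG construction, chosen only to show the structure: seed → cache → on-demand DAG items → mining loop.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

# Toy parameters; real Ethash uses a multi-megabyte cache and a ~1 GB DAG.
CACHE_SIZE = 64
DAG_SIZE = 256
MIX_ROUNDS = 8

def make_cache(seed: bytes) -> list:
    # Step 2: a pseudo-random cache derived from the seed.
    cache = [h(seed)]
    for _ in range(1, CACHE_SIZE):
        cache.append(h(cache[-1]))
    return cache

def dag_item(cache: list, i: int) -> bytes:
    # Step 3: each DAG element depends on a few cache elements, so a
    # light client can recompute any single item on demand.
    mix = h(cache[i % CACHE_SIZE], i.to_bytes(4, "big"))
    for _ in range(4):
        j = int.from_bytes(mix[:4], "big") % CACHE_SIZE
        mix = h(mix, cache[j])
    return mix

def hashimoto(cache: list, header: bytes, nonce: int) -> bytes:
    # Step 4: mix the header and nonce with pseudo-random DAG lookups;
    # the memory-bound lookups dominate, not raw hashing speed.
    mix = h(header, nonce.to_bytes(8, "big"))
    for _ in range(MIX_ROUNDS):
        i = int.from_bytes(mix[:4], "big") % DAG_SIZE
        mix = h(mix, dag_item(cache, i))
    return mix

def mine(cache: list, header: bytes, target: int) -> int:
    nonce = 0
    while True:
        if int.from_bytes(hashimoto(cache, header, nonce), "big") < target:
            return nonce
        nonce += 1
```

A verifier repeats `hashimoto` from the cache alone, which is exactly how light-client verification stays cheap while mining stays memory-bound.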

2

POS and DPOS

a) POS

Proof of Stake (POS) arose as a substitute algorithm because POW wastes enormous amounts of electricity and harms the environment in the process of mining. The general idea is as follows:

  1. Block production rights go to those holding more tokens: the more tokens held, the higher the probability of winning the right to record transactions;

  2. In the process of producing blocks, tokens are rewarded, which can be understood as the interest brought by holding tokens;

  3. People who hold many tokens would drive the token price down if they attacked the network, which hurts them most of all, so these miners are less willing to attack it;

  4. Producing a block only requires proving the tokens held, which needs little computing power and saves energy.
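The stake-weighted selection in point 1 can be sketched as a simple lottery. The node names and stake amounts here are invented for illustration; real POS chains combine stake with randomness derived from the chain itself.

```python
import random

def pick_validator(stakes: dict, rng: random.Random) -> str:
    # Stake-weighted lottery: selection probability is proportional to stake.
    total = sum(stakes.values())
    r = rng.uniform(0, total)
    acc = 0.0
    for node, stake in stakes.items():
        acc += stake
        if r <= acc:
            return node
    return node  # guard against floating-point rounding at the top end
```

Over many rounds, a holder with 90% of the stake wins roughly 90% of the blocks, which is the behavior described in point 1.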

Although POS saves resources to a certain extent, it introduces new problems, such as "nothing at stake": since constructing a new block has no computational cost, when the blockchain forks, users may mine on multiple branches at once to capture potentially higher profits, creating many branches and destroying consistency. In extreme cases POS also leads to centralization: the more coins one holds and the longer one holds them, the more one earns, producing a Matthew effect.

b) DPOS

DPOS (Delegated Proof of Stake) lets every token holder vote to eventually pick N super nodes to produce blocks. If a super node fails to produce blocks, it is removed and the network elects a new super node to replace it.

Taking EOS DPOS as an example, the general process is as follows:

  1. Token holders vote for candidate miners; when a transaction takes place, users include their vote in the transaction;

  2. The 21 candidates with the most votes are selected as representatives responsible for producing blocks in the next cycle;

  3. After the representatives' order is shuffled, they begin producing blocks in turn. Each representative has its own fixed time slot within which to produce and publish its block. The current slot is three seconds, meaning that under normal conditions a block is produced every three seconds;

  4. When producing a block, each representative must build on the unique longest chain at that time, not on any other branch.
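The vote tally and shuffled production schedule in steps 2 and 3 can be sketched as follows. Candidate names and vote counts are invented; real EOS voting is stake-weighted and continuous, which this sketch omits.

```python
import random

def elect_delegates(votes: dict, n: int = 21) -> list:
    # Step 2: the n candidates with the most votes become block producers.
    return sorted(votes, key=votes.get, reverse=True)[:n]

def production_order(delegates: list, rng: random.Random) -> list:
    # Step 3: shuffle the elected delegates; they then produce blocks in
    # turn, one fixed time slot (3 seconds in EOS) per delegate.
    order = list(delegates)
    rng.shuffle(order)
    return order
```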

3

PBFT

PBFT stands for Practical Byzantine Fault Tolerance. The algorithm was proposed by Miguel Castro and Barbara Liskov in 1999. It solves the inefficiency of the original Byzantine fault tolerance algorithms and is widely used in consortium-chain scenarios such as Fabric 0.6.

The general flow of the PBFT algorithm is roughly shown in Figure 3:

Figure 3 PBFT algorithm flow chart ▲

  1. The client sends a request to the primary node.

  2. The primary node broadcasts the request to the other nodes, and the nodes execute the three-phase consensus process of the PBFT algorithm.

  3. Pre-prepare phase: the primary node verifies the validity of the request message m, assigns it a sequence number in its view, and broadcasts a PRE-PREPARE message about this assignment to all other nodes.

  4. Prepare phase: the non-primary nodes validate the request message and the assigned sequence number. If a node agrees with the assignment, it broadcasts a PREPARE message to all other nodes. This phase is where all nodes agree on a global order.

  5. Commit phase: after receiving 2f+1 matching signed PREPARE messages from the cluster, a node broadcasts a COMMIT message to all other nodes. At this phase, all nodes have agreed on the order and acknowledge receipt of the request.

  6. After receiving 2f+1 COMMIT messages from the cluster, a node executes the request, persists it to disk, and returns a reply to the client.

  7. When the client receives the same reply from f+1 nodes, the consensus has completed correctly (f is the number of faulty or malicious nodes that can be tolerated out of a total of 3f+1 nodes).
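The quorum sizes that drive these phases follow directly from n = 3f + 1. A small helper makes the arithmetic explicit (the function and key names are ours, not from any PBFT implementation):

```python
def pbft_quorums(n: int) -> dict:
    # With n = 3f + 1 replicas, PBFT tolerates f Byzantine nodes.
    f = (n - 1) // 3
    return {
        "faults_tolerated": f,
        "phase_quorum": 2 * f + 1,  # matching PREPARE/COMMIT messages needed
        "client_quorum": f + 1,     # matching replies the client waits for
    }
```

For the smallest useful cluster, n = 4, this gives f = 1: a node advances on 3 matching messages and the client accepts 2 matching replies.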

CFT (Crash Fault Tolerance) is a fault-tolerance technique for non-Byzantine problems: nodes in a distributed system may fail and messages may be lost or duplicated, but no node sends incorrect messages. Reaching consensus in this setting is a classic problem in distributed systems, and Paxos is its best-known solution; it was first described by Leslie Lamport using a story set on the island of Paxos. Common CFT algorithms include the Paxos family of algorithms and the Raft algorithm, both of which are strong-consistency algorithms.

1

Paxos

The basic idea of the Paxos algorithm resembles a two-phase commit: multiple proposers first compete for the right to submit a proposal (by winning the support of a majority of acceptors). The successful proposer then sends its proposal to all nodes for confirmation, and a proposal confirmed by the majority becomes the approved final proposal.

The Paxos protocol has three roles:

A Proposer makes a Proposal, which includes a Proposal ID and a Value. Acceptors participate in the decision by responding to Proposals made by Proposers; if a Proposal is accepted by a majority of Acceptors, the Proposal is said to be approved. Learners do not take part in the decision but learn the approved value from the Acceptors.

Paxos is a two-stage communication protocol. The basic flow of Paxos algorithm is shown in Figure 4:

Figure 4 Paxos algorithm flow chart ▲

  1. A Proposer generates a globally unique Proposal ID n, for example from a timestamp plus a server ID;

  2. Proposer broadcasts a Prepare(n) request to all Acceptors;

  3. Each Acceptor compares n with minProposal (the highest proposal number it has promised). If n > minProposal, it sets minProposal = n and returns its acceptedProposal and acceptedValue;

  4. If the Proposer receives responses from a majority and any acceptedValue was returned, it adopts the acceptedValue from the response with the largest acceptedProposal number; otherwise it keeps its own value;

  5. The Proposer then proceeds to phase 2, broadcasting Accept(n, value) to all nodes;

  6. If n >= minProposal, the Acceptor sets acceptedProposal = minProposal = n and acceptedValue = value; otherwise it returns its current minProposal;

  7. After the Proposer receives responses from more than half of the nodes, if any returned value is greater than n, a newer proposal exists and the Proposer restarts from step 1; otherwise the value is agreed.
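The two phases above can be condensed into a single-decree Paxos sketch. It is deliberately simplified: calls are synchronous, no messages are lost, and retry-with-a-higher-n (step 7) is left to the caller.

```python
class Acceptor:
    def __init__(self):
        self.min_proposal = -1       # highest proposal number promised
        self.accepted_proposal = -1  # number of the last accepted proposal
        self.accepted_value = None

    def prepare(self, n: int):
        # Phase 1b: promise n, or reject if a higher number was promised.
        if n <= self.min_proposal:
            return None
        self.min_proposal = n
        return (self.accepted_proposal, self.accepted_value)

    def accept(self, n: int, value):
        # Phase 2b: accept unless a higher-numbered promise supersedes n.
        if n >= self.min_proposal:
            self.min_proposal = n
            self.accepted_proposal = n
            self.accepted_value = value
        return self.min_proposal

def propose(acceptors: list, n: int, value):
    majority = len(acceptors) // 2 + 1
    # Phase 1a: Prepare(n); collect promises.
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None  # lost the right to propose; retry with a higher n
    # Step 4: adopt the highest-numbered value already accepted, if any.
    accepted = [(ap, av) for ap, av in promises if av is not None]
    if accepted:
        value = max(accepted, key=lambda pv: pv[0])[1]
    # Phase 2a: Accept(n, value); count acknowledgements.
    acks = sum(1 for a in acceptors if a.accept(n, value) <= n)
    return value if acks >= majority else None
```

Running two proposers in sequence shows the safety property: once "x" is chosen by a majority, a later proposal with a new value still converges on "x".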

2

Raft

Frustrated by how hard Paxos is to understand, Diego Ongaro proposed the Raft algorithm in 2014, which is much easier to follow. It breaks consensus down into sub-problems: leader election, log replication, safety, log compaction, and membership change.

In Raft, the leader node handles the consistency problem. The leader receives request logs from clients and replicates them to the other nodes in the cluster. Once a log entry has been replicated to more than half of the nodes, the leader notifies the other nodes that the entry has been successfully replicated and can be committed to the Raft state machine for execution.

A node in the cluster can be in only one of three states: leader, follower, or candidate. All nodes start as followers. To become the leader, a follower must first become a candidate and then initiate an election. If it does not get enough votes, it returns to the follower state; if it wins more than half of the votes, it becomes the leader. If a node fails after becoming the leader, it automatically steps down and returns to the follower state.

Raft also introduces the concept of a term to identify outdated information in time, similar to the epoch in ZooKeeper. Term values increase monotonically, and each term has at most one leader; if information from different terms conflicts, the information with the larger term value prevails.

Raft also uses heartbeats and timeouts: the leader keeps sending heartbeat packets to the other nodes in the cluster in order to maintain its authority. If a follower receives no heartbeat within the election timeout, it considers the leader dead and enters the candidate state to run for leader.

Raft's leader election is implemented with heartbeats and randomized timeouts. Log replication is achieved through strong leadership: the leader receives the client's command, appends it to its own log, and replicates the log to the other followers. Raft's safety is ensured by the fact that only the leader can decide whether to commit.
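The election rules above (monotonic terms, one vote per term, majority wins, randomized timeouts) can be sketched as a toy model. This is not a complete Raft implementation: timing, log replication, and message loss are all omitted.

```python
import random
from enum import Enum

class State(Enum):
    FOLLOWER = 1
    CANDIDATE = 2
    LEADER = 3

class Node:
    def __init__(self, node_id: int, cluster_size: int):
        self.id = node_id
        self.cluster_size = cluster_size
        self.state = State.FOLLOWER  # every node starts as a follower
        self.term = 0
        self.voted_for = None

    def election_timeout(self, rng: random.Random) -> float:
        # Randomized timeouts (e.g. 150-300 ms) stagger candidacies
        # and make split votes unlikely.
        return rng.uniform(150, 300)

    def start_election(self) -> int:
        self.state = State.CANDIDATE
        self.term += 1  # terms increase monotonically
        self.voted_for = self.id
        return self.term

    def request_vote(self, candidate_id: int, candidate_term: int) -> bool:
        if candidate_term > self.term:
            # Saw a higher term: adopt it and revert to follower.
            self.term = candidate_term
            self.voted_for = None
            self.state = State.FOLLOWER
        # Grant at most one vote per term.
        if candidate_term == self.term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False

    def tally(self, votes: int) -> None:
        if votes > self.cluster_size // 2:
            self.state = State.LEADER
```

In a three-node cluster, a candidate that wins both other votes (plus its own) crosses the majority threshold and becomes leader for that term.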

From Paxos and Raft to PBFT, from POW to POS, along with the many Paxos variants, Raft variants, hybrid BFT algorithms, and improvements on POW, distributed consensus algorithms are constantly developing, improving, and evolving. Companies also develop their own consensus algorithms tailored to the realities of their business scenarios. Research on consensus algorithms is far from finished: from traditional distributed consensus in closed networks, to consortium chains with admission rules, to public chains whose protocols are open to everyone on the open network, we still have a long way to go.
