1. Current status of Fabric performance testing

Put simply, a blockchain is a chained data structure that links data blocks in chronological order, forming a distributed ledger that cryptography makes tamper-proof and unforgeable. Bitcoin, Ethereum, and Hyperledger are typical blockchain systems. Hyperledger Fabric is the most popular enterprise-level blockchain framework. Fabric adopts a loosely coupled design that modularizes components such as the consensus mechanism and authentication, making it easy to select the appropriate modules for each application scenario.

Performance is one of the most important concerns for Fabric users. However, no authoritative, neutral organization tests Fabric's performance according to accepted rules and publishes test reports, for the following reasons:

(1) Fabric is still under rapid development, and no detailed, neutral, and accepted testing rules have been published for it yet;

(2) The network environment (network bandwidth, disk I/O, computing resources, etc.), configuration parameters (block size, endorsement policy, number of channels, state database, etc.), and consensus algorithm (SOLO, Kafka, PBFT, etc.) all affect the evaluation results, making it difficult to build a test model that reflects Fabric's overall behavior;

(3) Fabric's transaction process is complex and differs from that of traditional databases, so traditional testing schemes and tools are not a good fit.

This article focuses on Fabric's core business. It builds a test model, measures both community-native Fabric and Huawei Cloud blockchain (which is based on Fabric), identifies the performance bottlenecks of community-native Fabric, and then applies the dynamic scaling and fast PBFT algorithm provided by Huawei blockchain to improve several key evaluation indicators.

2. Fabric transaction process analysis

Fabric transactions involve several roles, each with different responsibilities. Peer nodes are subdivided into Endorser Peers and Committer Peers, and consensus is handled by the Orderer role. The transaction process is as follows:

![](https://pic3.zhimg.com/80/v2-325375847434251484b21b6d0074c93e_720w.jpg)

Figure 1: Fabric transaction flow diagram

(1) The application client sends a transaction Proposal to the blockchain network through the SDK. The Proposal carries the identifier of the contract to be invoked, the contract method and its parameters, the client signature, and other information, and is sent to the Endorser nodes.

(2) After receiving the Proposal, an Endorser node verifies the signature and determines whether the submitter has the right to perform the operation. If verification passes, it executes the smart contract, signs the result, and sends it back to the application client.

(3) After receiving the Endorsers' responses, the application client checks whether the proposal results are consistent and whether the specified endorsement policy is satisfied. If there are not enough endorsements, processing stops; otherwise, the client packages the data into a transaction, signs it, and sends it to the Orderers.

(4) The Orderers order the received transactions through consensus, package a batch of transactions into a new block according to the block generation policy, and send it to the Committers.

(5) After receiving the block, the Committer verifies each transaction in it, checking whether the inputs and outputs the transaction depends on are consistent with the current state of the blockchain. It then appends the block to its local blockchain and updates the world state.
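Step (3) is the one point in the flow where the client itself makes a decision, so a small sketch may help. The snippet below is a minimal illustration in Python, not Fabric SDK code: the `Endorsement` type, the function name, and the simple "at least N matching endorsements" rule are assumptions made for this example, and real endorsement policies can be richer expressions over organizations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endorsement:
    """Hypothetical stand-in for a signed proposal response."""
    endorser: str   # identity of the endorsing Peer
    result: bytes   # simulated read/write set returned by that Peer


def endorsements_satisfy_policy(endorsements, required):
    """Return True if all results match and enough distinct Peers endorsed.

    Assumes a simplified "N matching endorsements" policy.
    """
    if not endorsements:
        return False
    # All endorsers must have produced the same simulation result.
    if len({e.result for e in endorsements}) != 1:
        return False
    # Count each endorser only once, even if it responded twice.
    return len({e.endorser for e in endorsements}) >= required
```

Only when this check passes would the client assemble the transaction and submit it for ordering; otherwise it stops, as described in step (3).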

From the client's perspective, completing a transaction through Fabric involves three steps (collecting endorsements, submitting for ordering, and confirming the result), whereas a traditional database read or write simply issues a request and waits for confirmation. To use a classic testing tool such as JMeter, the Fabric SDK has to be wrapped in RESTful interfaces, which complicates the evaluation. Fortunately, the Hyperledger community launched Caliper in May 2017, which lets users test specific blockchain implementations with a set of preset use cases. Caliper generates reports that include a number of blockchain performance metrics, such as TPS (transactions per second), latency, and system resource usage. The measurements in this article were produced with Caliper.

3. Build Fabric test model

Building a performance test model mainly involves two parts: first, extracting evaluation indicators from the business characteristics; second, establishing a stable and measurable business model.

3.1 Evaluation Indicators

Fabric is a typical distributed system. Each Peer in a Fabric network is deployed independently and maintains its own ledger (which serves endorsement and query requests), and the Peers synchronize state with one another through Gossip communication. Fabric must tolerate network partitions, so by the CAP theorem it cannot guarantee strong consistency and availability at the same time. Instead, Fabric ensures that all nodes ultimately agree on the world state through eventual consistency (a form of weak consistency), which is achieved through Orderer consensus and Peer validation. Our test model therefore focuses on the following indicators:

Query Throughput: the number of query requests processed per second

Consensus Throughput: the number of consensus requests processed per second

Consistency Throughput: the number of transactions whose synchronization to the ledger completes per second

Avg Latency: the average time taken to complete a transaction

Fail Rate: the percentage of transactions that fail (including timeouts)
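To make these definitions concrete, here is a minimal sketch of how throughput, average latency, and fail rate could be derived from per-transaction timestamps. The `TxRecord` type and the `summarize` function are hypothetical names for this example, and the formulas are a plausible reading of the indicators above; Caliper's exact calculation may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TxRecord:
    submitted_at: float             # seconds since the start of the test
    completed_at: Optional[float]   # None if the transaction failed or timed out


def summarize(records):
    """Compute throughput (tx/s), average latency (s), and fail rate."""
    if not records:
        raise ValueError("no transactions recorded")
    succeeded = [r for r in records if r.completed_at is not None]
    fail_rate = (len(records) - len(succeeded)) / len(records)
    if not succeeded:
        return {"throughput": 0.0, "avg_latency": None, "fail_rate": fail_rate}
    # Measure from the first submission to the last successful completion.
    duration = max(r.completed_at for r in succeeded) - min(
        r.submitted_at for r in records
    )
    throughput = len(succeeded) / duration if duration > 0 else float("inf")
    avg_latency = sum(r.completed_at - r.submitted_at for r in succeeded) / len(
        succeeded
    )
    return {
        "throughput": throughput,
        "avg_latency": avg_latency,
        "fail_rate": fail_rate,
    }
```

The same function serves all three throughput indicators; what changes is which event is recorded as `completed_at` (query response, ordering acknowledgement, or commit on the Peer).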

3.2 Business Model

When selecting business scenarios, we focus on mainstream setups, discard options that are bottlenecks in their own right, and concentrate on the core business of blockchain.

In terms of infrastructure, we use mainstream virtual machines with 8 vCPUs and 16 GB of memory for the Orderer and Peer nodes, and a virtual machine with 32 vCPUs and 64 GB of memory for the Client. The entire test runs within a stable subnet. We configure four Orderer nodes, the minimum needed for 3f + 1 fault tolerance (f = 1). We start with one Peer node and scale out to a maximum of five.
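The 3f + 1 rule behind the choice of four Orderers can be checked with a few lines of arithmetic. The helper names below are illustrative; the formulas are the standard PBFT bounds (n ≥ 3f + 1 nodes, quorums of 2f + 1).

```python
def max_byzantine_faults(n):
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def min_nodes_for_faults(f):
    """Smallest cluster that tolerates f Byzantine nodes."""
    return 3 * f + 1


def quorum_size(n):
    """Matching messages needed in a PBFT phase: 2f + 1."""
    return 2 * max_byzantine_faults(n) + 1


for n in (4, 7, 10):
    print(n, "nodes tolerate", max_byzantine_faults(n), "faulty, quorum", quorum_size(n))
```

With four Orderers, one faulty node is tolerated and each phase needs three matching messages; adding a fifth or sixth node does not raise the tolerance, which is why cluster sizes of 4, 7, 10 and so on are the natural choices.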

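In terms of configuration parameters, we use a single channel and single-organization endorsement, and choose GoLevelDB as the state database. The block-cutting policy uses the defaults (2s/4M/500T), that is, a block is cut once 2 seconds have elapsed, its size reaches 4 MB, or it contains 500 transactions.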
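The block-cutting policy is easier to reason about with a small model. The sketch below is a simplified illustration, not the Orderer's actual implementation: the `BlockCutter` class and its method names are assumptions for this example, and it reads 2s/4M/500T as a timeout, a byte limit, and a transaction-count limit, with whichever limit is hit first cutting the block.

```python
class BlockCutter:
    """Toy batcher: cut a block on timeout, byte limit, or tx-count limit."""

    def __init__(self, timeout_s=2.0, max_bytes=4 * 1024 * 1024, max_txs=500):
        self.timeout_s = timeout_s
        self.max_bytes = max_bytes
        self.max_txs = max_txs
        self.pending = []
        self.pending_bytes = 0
        self.first_arrival = None  # arrival time of the oldest pending tx

    def _cut(self):
        block, self.pending = self.pending, []
        self.pending_bytes = 0
        self.first_arrival = None
        return block

    def add(self, tx, now):
        """Queue a transaction; return the blocks (possibly none) that were cut."""
        blocks = []
        # Cut first if this transaction would push the pending batch over the byte limit.
        if self.pending and self.pending_bytes + len(tx) > self.max_bytes:
            blocks.append(self._cut())
        if not self.pending:
            self.first_arrival = now
        self.pending.append(tx)
        self.pending_bytes += len(tx)
        if len(self.pending) >= self.max_txs or self.pending_bytes >= self.max_bytes:
            blocks.append(self._cut())
        return blocks

    def tick(self, now):
        """Called periodically; cut a block if the oldest tx has waited long enough."""
        if self.pending and now - self.first_arrival >= self.timeout_s:
            return self._cut()
        return None
```

Under light load the timeout dominates, so latency is bounded below by roughly the timeout; under heavy load the count or size limit is reached first and blocks are cut back to back.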

For the consensus algorithm, Solo, Kafka, and PBFT are available. Solo is a test mode and cannot be used in production. Kafka mode provides crash fault tolerance (CFT) and depends on the performance of an external Kafka cluster. PBFT tolerates Byzantine nodes, which suits a wider range of applications but places higher demands on performance. Therefore, PBFT is selected as the consensus algorithm in this test.

For chaincode, we chose the community-provided chaincode_example02 sample, which has a small business data footprint and covers the basic use cases of reading from and writing to the ledger.

4. Measurement and tuning

4.1 Query Performance and Dynamic Scaling

A Fabric query is essentially just an endorsement request. The Peer handles it in three steps:

(1) Verify the Proposal signature;

(2) Check whether the Channel ACL is satisfied;

(3) Simulate the execution of the transaction and sign the result.

See the community's chaincode_example02 for the code.

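| Test | Name | Succ | Fail | Send Rate | Avg Latency | Query Throughput |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | query | 10000 | 0 | 962.6 TPS | 0.01 s | 962 TPS |
| 2 | query | 25000 | 0 | 2493.5 TPS | 0.07 s | 2492 TPS |
| 3 | query | 50000 | 0 | 4992.0 TPS | 6.68 s | 2503 TPS |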

Figure 2: Query performance of a single Peer for a single organization

As you can see, the read performance of a single node (8 vCPUs, 16 GB) is around 2500 TPS. Monitoring shows that CPU usage is around 70%, close to full load, while memory usage is only around 25%. This is easy to understand: the endorsement process involves a lot of verification and signing, both of which are computationally intensive. Because a blockchain is partition-tolerant in the CAP sense and every Peer holds its own copy of the ledger, we can scale out the Peers within an organization horizontally to improve performance. Huawei blockchain provides this scaling feature, so we expanded the number of Peers to 5.

![](https://pic4.zhimg.com/80/v2-0f8f5e8ff6e4590ff36fdd6dc72ca50f_720w.jpg)

Figure 3: Dynamic scaling of Huawei BCS

Run the test script again and the result is as follows:

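| Test | Name | Succ | Fail | Send Rate | Avg Latency | Query Throughput |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | query | 10000 | 0 | 971.4 TPS | 0.01 s | 971 TPS |
| 2 | query | 25000 | 0 | 2495.8 TPS | 0.01 s | 2494 TPS |
| 3 | query | 50000 | 0 | 4977.1 TPS | 0.01 s | 4974 TPS |
| 4 | query | 12000 | 0 | 11898.9 TPS | 0.06 s | 11869 TPS |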

Figure 4: Query performance of Huawei BCS with a single organization and 5 Peers

As shown, a single Peer can be dynamically scaled out to 5 Peers without interrupting service. Query throughput improves by more than 4 times, exceeding 10,000 TPS overall, while the average latency is only 0.06 s.
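The "more than 4 times" figure can be reproduced from the highest throughput reported in each table. The snippet below is simple arithmetic on those published numbers; the function names are illustrative, and the 5-Peer run may not have been pushed to saturation, so the result is best read as a lower bound on what five Peers can deliver.

```python
def speedup(baseline_tps, scaled_tps):
    """Ratio of scaled throughput to baseline throughput."""
    return scaled_tps / baseline_tps


def scaling_efficiency(baseline_tps, scaled_tps, nodes):
    """Speedup divided by node count; 1.0 would be perfectly linear scaling."""
    return speedup(baseline_tps, scaled_tps) / nodes


single_peer_tps = 2503   # highest query throughput with 1 Peer (Figure 2)
five_peer_tps = 11869    # highest query throughput with 5 Peers (Figure 4)

print(f"speedup: {speedup(single_peer_tps, five_peer_tps):.2f}x")
print(f"efficiency: {scaling_efficiency(single_peer_tps, five_peer_tps, 5):.0%}")
```

A speedup of about 4.7 on five Peers corresponds to close to linear scaling, which is consistent with queries being CPU-bound endorsement work that each Peer performs independently.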

4.2 Consensus Performance and Consensus Algorithm

The consensus algorithm is the key to improving consensus performance. Community Fabric v1.0.0-alpha2 provides PBFT (Practical Byzantine Fault Tolerance) consensus. PBFT improves on the inefficient original Byzantine fault-tolerant algorithms, reducing the algorithmic complexity from exponential to polynomial and making Byzantine fault tolerance feasible in practical systems.

Let's first test with the community's PBFT consensus:

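| Test | Name | Succ | Fail | Send Rate | Avg Latency | Consensus Throughput | Consistency Throughput |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | invoke | 1000 | 0 | 959.5 TPS | 5.53 s | 574 TPS | 518 TPS |
| 2 | invoke | 2000 | 0 | 1996.4 TPS | 14.84 s | 567 TPS | 520 TPS |
| 3 | invoke | 5000 | 0 | 4889.8 TPS | 37.90 s | 579 TPS | 503 TPS |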

Figure 5: Consensus performance of community native PBFT

As you can see, the community's native PBFT consensus performs poorly in both throughput and average latency. Huawei's PBFT algorithm has an early-stopping property: when there are no Byzantine nodes, the whole network reaches consensus quickly, so it should be faster. Let's switch to Huawei's fast PBFT consensus algorithm and test again:

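| Test | Name | Succ | Fail | Send Rate | Avg Latency | Consensus Throughput | Consistency Throughput |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | invoke | 10000 | 0 | 973.6 TPS | 1.32 s | 970 TPS | 917 TPS |
| 2 | invoke | 20000 | 0 | 1976.5 TPS | 1.24 s | 1971 TPS | 1789 TPS |
| 3 | invoke | 50000 | 0 | 4995.4 TPS | 4.21 s | 4985 TPS | 1677 TPS |
| 4 | invoke | 100000 | 0 | 11133.2 TPS | 9.91 s | 11031 TPS | 1502 TPS |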

Figure 6: Consensus performance of Huawei BCS Fast PBFT

With Huawei's fast PBFT algorithm, consensus throughput exceeds 10,000 TPS and consistency throughput peaks at close to 1800 TPS. Compared with the community-native version, the average latency is also significantly lower. Such write performance is comparable to that of a traditional single-node relational database and can meet the needs of most commercial scenarios.
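Because consensus throughput can exceed consistency throughput, ordered transactions pile up at the Peers for as long as the load is sustained. The sketch below is a deliberately simple fluid-queue model with illustrative rates, not measured values; the function names are assumptions for this example, and it ignores block batching and Caliper's exact latency definition.

```python
def backlog_after(order_rate, commit_rate, seconds):
    """Transactions ordered but not yet committed after `seconds` of steady load."""
    return max(0.0, (order_rate - commit_rate) * seconds)


def drain_time(backlog, commit_rate):
    """Seconds needed to commit the backlog once new load stops."""
    if commit_rate <= 0:
        raise ValueError("commit_rate must be positive")
    return backlog / commit_rate


# Illustrative numbers: ordering at 2500 tx/s, committing at 1800 tx/s, for 10 s.
pending = backlog_after(2500, 1800, 10)
print(f"backlog: {pending:.0f} tx, drains in {drain_time(pending, 1800):.1f} s")
```

The model shows why latency stays flat while the send rate is below the commit rate and then grows with test duration once the send rate exceeds it: the backlog, and with it the waiting time, increases linearly with time.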

4.3 About eventual consistency

While testing consensus performance, we found that once consensus throughput exceeded 2000 TPS, the Peers built up a backlog of blocks waiting to be synchronized, which increased the average latency. To understand why, let's look at the key Fabric source code (gossip/state/state.go) to see how a Peer synchronizes blocks:

![](https://pic3.zhimg.com/80/v2-90287de3f36a8c057c3610c34b5211ca_720w.jpg)

Figure 7: Gossip block synchronization flow diagram

In Fabric, ledger data is synchronized mainly by the GossipStateProvider through the Gossip protocol. Only the key steps are shown here:

(1) A goroutine, deliverPayloads, is started. When it obtains a block from the Orderer or another Peer, it calls LedgerResources.StoreBlock;

(2) LedgerResources calls the Validator to verify that each transaction satisfies the endorsement policy and that the versions in its read set are consistent with the ledger;

(3) LedgerCommittor executes the valid transactions in the block and updates the ledger state;

(4) ServiceMediator updates the channel metadata.
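The read-set version check in step (2) and the state update in step (3) can be sketched as follows. This is a simplified Python model, not Fabric's Go code: the function name and the dictionary layout are assumptions for this example, versions are plain integer counters rather than Fabric's block/transaction heights, and the endorsement policy check is omitted.

```python
def validate_and_commit(block, state):
    """Validate transactions in order and apply the writes of valid ones.

    block: list of {"reads": {key: version_or_None}, "writes": {key: value}}
    state: dict mapping key -> (version, value); updated in place
    Returns one validity flag per transaction.
    """
    flags = []
    for tx in block:
        # A transaction is valid only if every key it read still has the
        # version it saw during simulation (None means "key did not exist").
        valid = all(
            state.get(key, (None, None))[0] == version
            for key, version in tx["reads"].items()
        )
        flags.append(valid)
        if valid:
            for key, value in tx["writes"].items():
                old_version = state.get(key, (0, None))[0]
                state[key] = (old_version + 1, value)
    return flags
```

Because each transaction is checked against the state left behind by the transactions before it, the loop is inherently sequential within a block, which is relevant to the bottleneck discussed next.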

The author modified the source code to record the time spent in each of the four steps. The results show that when 40,000 transactions produce 200 blocks, step 2 (validation) takes 17 s and step 3 (writing the block and updating the index) takes 40 s. Together they account for 80% of the time spent in deliverPayloads, so they are presumed to be the bottleneck for consistency throughput. Profiling the call stacks further supports this hypothesis.
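The kind of per-step timing described above can be illustrated with a small accumulator. This is a Python analogue for illustration only, not the author's Go instrumentation; the `StepTimer` class and the step names are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulates wall-clock time per named step across many blocks."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def step(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the step raises.
            self.totals[name] += self.clock() - start

    def shares(self):
        """Fraction of the total measured time spent in each step."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: spent / total for name, spent in self.totals.items()}
```

Wrapping each step of the block-handling loop in `with timer.step("validate"):` or `with timer.step("commit"):` and printing `timer.shares()` at the end gives the same kind of breakdown as the 17 s and 40 s figures quoted above.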

![](https://pic3.zhimg.com/80/v2-400b28d48cc7fab7d7bb4e281146bf22_720w.jpg)

Figure 8: Profiling flame graph of Gossip block synchronization

The author can think of the following optimizations:

(1) Use high-speed disks (SSDs) to improve the read/write efficiency of block files;

(2) Since the Validator's verification process is computationally intensive, explore whether combining software and hardware can greatly improve verification efficiency;

(3) At present, Gossip can only process received Payload data one item at a time. Explore whether a block can be partitioned by read/write set, handed to different threads for processing, and finally merged and written to disk to improve performance (for reference, multi-channel performance is a multiple of single-channel performance).
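Idea (3) depends on finding transactions that do not touch the same keys. The sketch below shows one possible way to do that grouping with a union-find structure; it is an illustration of the idea rather than anything Fabric or Huawei BCS is known to implement, and the function name is hypothetical. It is conservative: any shared key, even one that is only read, puts two transactions in the same group.

```python
def partition_by_key_overlap(tx_keys):
    """Group transactions so that different groups share no keys.

    tx_keys: list of sets, one per transaction, holding the keys in its
             read and write sets.
    Returns lists of transaction indices; each list keeps block order.
    """
    parent = list(range(len(tx_keys)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so groups are ordered by first tx.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # key -> index of the first transaction that touched it
    for idx, keys in enumerate(tx_keys):
        for key in keys:
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = {}
    for idx in range(len(tx_keys)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())
```

Each group could then be validated on its own thread, with transactions inside a group still processed in block order, before the results are merged and written to disk. How much this helps depends on the workload: if most transactions touch the same hot keys, they collapse into a single group and nothing is gained.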

The author has learned from the media that Huawei blockchain and other product teams have invested in pre-research in this area, and looks forward to the early release of commercial products and contributions back to the community.

5. Summary

Fabric, as the most popular enterprise blockchain solution, has been used successfully in many areas. This round of testing and tuning found that community-native Fabric has a number of limitations, such as being hard to scale and performing poorly, so using it directly in a production environment is not recommended.

Huawei blockchain's scaling feature and fast PBFT algorithm can quickly improve Fabric's transaction performance. The scaling feature raises query throughput to more than 10,000 TPS (more than 4 times that of a single Peer) without interrupting service. The fast PBFT algorithm raises consensus throughput to more than 10,000 TPS (about 20 times that of community-native Fabric), which can meet the needs of most commercial scenarios.

At the same time, we found that under high concurrency the average latency of reaching eventual consistency increases, mainly because block validation and disk writes currently run sequentially and cannot make full use of multi-core resources. Fabric will be more successful in the commercial world if later community releases or commercial vendors can improve consistency throughput and reduce latency by combining hardware and software and by partitioning and merging block processing.



This article is shared from the Huawei Cloud community's "Fabric-based Performance Testing and Tuning Practice" by BCSBlockchain.