On a blockchain, all data is public, but not everyone can write code to read transactions from the chain. Most people therefore view the data through a public window: the blockchain browser (also called a block explorer).

Data on the blockchain keeps growing, so persisting and querying that data is key for a browser. This article explains the design of a blockchain browser.

1. System design

When a transaction is committed to the chain, it is stored in the on-chain ledger, but ledger data cannot be displayed directly. It must first be parsed and stored, and only then can it be presented along different dimensions. The system itself is not very complex.

All the system does is pull data from the blockchain, store it in a database, and serve it through an API.

The two key parts of the system are the block puller and the block parser.

1.1 Storage Selection

The choice of data store is very important. MySQL brings several problems: adding fields is difficult, and scaling storage capacity is inconvenient. The data must also support multi-dimensional search, which MySQL handles poorly.

Blockchains typically store data in key-value or document stores such as LevelDB, CouchDB, and TiKV. These databases have poor support for search, and search is the most important capability of a browser.

Elasticsearch (ES) is a good fit here. It is built to store large volumes of data and scales horizontally. Its near-real-time search also meets the needs of front-end data presentation.

1.2 Design of the parser

A blockchain is a chain-like structure in which each block stores the hash of the previous block, linking all blocks together.

Each block packs a batch of transactions.
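The hash linking described above can be illustrated with a toy model. In this sketch, each block is hashed as SHA-256 over a JSON encoding of its fields. That encoding is an assumption for illustration only: real chains use their own serialization and hash functions. Altering an earlier block breaks every later `prev_hash` link.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    height: int
    prev_hash: str
    transactions: list = field(default_factory=list)

    def hash(self) -> str:
        # Illustrative only: SHA-256 over a canonical JSON encoding of the block.
        payload = json.dumps(
            {"height": self.height, "prev_hash": self.prev_hash, "transactions": self.transactions},
            sort_keys=True,
        ).encode()
        return hashlib.sha256(payload).hexdigest()


def build_chain(tx_batches):
    """Pack each batch of transactions into a block linked to its predecessor."""
    chain = []
    prev_hash = "0" * 64  # the genesis block has no predecessor
    for height, txs in enumerate(tx_batches):
        block = Block(height, prev_hash, txs)
        chain.append(block)
        prev_hash = block.hash()
    return chain


def verify_chain(chain) -> bool:
    # Every block must reference the current hash of the block before it.
    return all(chain[i].prev_hash == chain[i - 1].hash() for i in range(1, len(chain)))
```

Changing a transaction in block 0 after the chain is built makes `verify_chain` return `False`, because block 1 still holds the old hash.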

When parsing a block, the parser unpacks it layer by layer until every transaction inside has been resolved. It then stores the results in ES by category. The process looks fairly complex, but it can be wrapped behind a single interface.
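The following sketch shows what such a single parsing interface might look like. Several details are assumptions, not taken from the article:

- the raw block shape;
- the category names `blocks`, `transactions` and `addresses`;
- the fields kept in each document.

The function only produces documents grouped by target index. Writing them to ES (for example with the Elasticsearch bulk API) is left out.

```python
from collections import defaultdict


def parse_block(block: dict) -> dict:
    """Unpack one raw block into documents grouped by target index.

    Hypothetical raw shape: {"height", "hash", "timestamp",
    "transactions": [{"hash", "from", "to", "value"}, ...]}.
    """
    docs = defaultdict(list)
    txs = block.get("transactions", [])
    # Block-level document: header fields plus a transaction count.
    docs["blocks"].append({
        "height": block["height"],
        "hash": block["hash"],
        "timestamp": block["timestamp"],
        "tx_count": len(txs),
    })
    for position, tx in enumerate(txs):
        # Transaction documents keep a back-reference to their block.
        docs["transactions"].append({**tx, "block_height": block["height"], "tx_index": position})
        # Address documents turn "all activity for address X" into a simple query.
        for role in ("from", "to"):
            if tx.get(role):
                docs["addresses"].append({
                    "address": tx[role],
                    "role": role,
                    "tx_hash": tx["hash"],
                    "block_height": block["height"],
                })
    return dict(docs)
```

Keeping one document per category lets each index be searched along its own dimension, which is the reason the article gives for choosing ES.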

1.3 Security considerations

Because the browser is an open system that anyone can access, its security needs special attention. It must be protected against two kinds of threats:

  • DDoS attacks
  • Crawlers scraping data from the system

At the gateway layer, firewalls can be deployed. At the API layer, IP-based rate limiting can be applied, for example by capping the number of requests from one IP address within a time window. Together these measures keep the browser as stable as possible.
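An API-layer IP limit could be implemented as a sliding-window counter. This is a minimal single-process sketch. The class name and its limits are hypothetical.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Allow at most `max_requests` per IP within a sliding `window_seconds` window."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject this request
        hits.append(now)
        return True
```

The counters live in one process's memory and are not protected by a lock. With several API instances, or a multi-threaded server, the state would need to be shared or synchronized. Enforcing the limit at the gateway is another option.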

2. Problems encountered

Once the design above is implemented, the browser is ready to use. In practice, however, a few problems came up.

2.1 How to process historical (stock) data

When the browser starts pulling, the chain may already hold a large amount of historical data, and pulling it with a single thread can take a long time. A single thread cannot pull blocks fast enough, so the puller was changed to use multiple threads.
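A multi-threaded pull over a range of heights might look like the sketch below. `fetch_block` is a hypothetical stand-in for the node RPC call, and the worker count is an arbitrary example.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(height: int) -> dict:
    # Hypothetical stand-in for a node RPC call that returns the raw block.
    return {"height": height, "transactions": []}


def pull_range(start: int, end: int, workers: int = 10) -> list:
    """Fetch heights [start, end) concurrently with a fixed-size thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so blocks come back sorted by height.
        return list(pool.map(fetch_block, range(start, end)))
```

A fixed pool size caps the number of concurrent threads. This is one way to avoid unbounded thread growth, which is the kind of crash that occurs when blocks are pulled faster than they can be parsed.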

This is much faster, but still not fast enough when the chain holds a very large amount of data.

At this point, adding more threads no longer helps. If blocks are pulled too fast, the number of parsing threads grows sharply and the program crashes.

So another way to speed up pulling was needed.

A single instance cannot be made any faster, so multiple instances can be run instead. However, this raises the problem of synchronizing state between instances.

Analysis showed that block crawling does not compute intermediate statistics, such as blocks or transactions produced per minute. The only state the instances need to share is the block height, so each instance can simply pull a different set of heights.

To avoid adding extra dependencies to the system, we chose a MySQL pessimistic lock to synchronize the block height. In MySQL this is typically `SELECT ... FOR UPDATE` inside a transaction. To acquire the lock less often, each acquisition advances the stored height by a batch, for example by 10, instead of by one.

Each instance runs several pulling threads. Rather than having every thread acquire a height separately, an instance claims as many heights as it has threads. For example, with 10 threads, the instance claims 10 heights from MySQL at once and pulls them in parallel.
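The batch-claim step can be sketched as a locked read-then-advance on a single row. This version uses Python's built-in `sqlite3` as a runnable stand-in for MySQL: `BEGIN IMMEDIATE` takes SQLite's write lock up front, playing the role that `SELECT ... FOR UPDATE` plays in MySQL. The table and column names are hypothetical. Connections must be opened with `isolation_level=None` so the code controls transactions explicitly.

```python
import sqlite3


def init_state(conn: sqlite3.Connection) -> None:
    # One row holds the next height that no instance has claimed yet.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_state (id INTEGER PRIMARY KEY, next_height INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO sync_state (id, next_height) VALUES (1, 0)")


def claim_heights(conn: sqlite3.Connection, batch_size: int) -> list:
    """Atomically reserve `batch_size` consecutive heights for this instance."""
    # Lock before reading so no other instance can claim the same range.
    # MySQL equivalent (assumed schema): BEGIN; SELECT next_height ... FOR UPDATE;
    conn.execute("BEGIN IMMEDIATE")
    try:
        (start,) = conn.execute("SELECT next_height FROM sync_state WHERE id = 1").fetchone()
        conn.execute("UPDATE sync_state SET next_height = ? WHERE id = 1", (start + batch_size,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return list(range(start, start + batch_size))
```

Each call returns a contiguous range, for example `[0, ..., 9]` for the first instance and `[10, ..., 19]` for the next. An instance with 10 threads would pass `batch_size=10` and give one height to each thread.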

With this change, pulling became much faster. It took only about 5 days to pull 50 million items of historical data in production, which is perfectly acceptable.

Once the historical data has been pulled, the system can switch back to single-instance, multi-threaded pulling to save resources.

2.2 Handling failed blocks

A block can fail to pull for various reasons, and any block that fails must be reprocessed.

This design handles failures simply. The height of a failed block is put into a failure queue, and a dedicated thread listens on that queue and reprocesses each height it receives.
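A failure-queue listener might look like the sketch below. The retry cap (`max_attempts`) and the attempt counter carried with each height are additions for illustration, not part of the article's design. Without some cap, a block that always fails would be retried forever. `process` stands for whatever pulls and parses one height.

```python
import queue
import threading


def retry_worker(failed: queue.Queue, process, stop: threading.Event, max_attempts: int = 3) -> list:
    """Reprocess (height, attempt) items from the failure queue.

    Returns the heights that still failed after `max_attempts` tries.
    """
    given_up = []
    # Keep running until asked to stop, then drain whatever is left.
    while not stop.is_set() or not failed.empty():
        try:
            height, attempt = failed.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            process(height)
        except Exception:
            if attempt + 1 < max_attempts:
                failed.put((height, attempt + 1))  # requeue for another try
            else:
                given_up.append(height)  # e.g. alert or record for manual repair
        finally:
            failed.task_done()
    return given_up
```

In a running system, this function would be started in its own thread (for example with `threading.Thread(target=retry_worker, args=(...), daemon=True)`), and the pullers would `put((height, 0))` whenever a block fails.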

3. Read and write separation

So far, however, the browser is still not perfect. Pulling and parsing are coupled, so an ES write error causes the whole block to fail, and the block must be pulled again through the retry path. This slows down overall pulling.

As the blockchain business grows, blocks are produced faster and faster, and the current pulling approach may not keep up with block production. A message queue is therefore introduced to fully decouple block pulling from block parsing.

In this way, the block pulling and parsing will not block each other, and the system will be more stable.
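The article does not name a specific message queue, so this sketch uses an in-process `queue.Queue` as a stand-in to show the producer/consumer split. The puller only publishes raw blocks. Parser workers consume them, and an ES write failure only sends that block to a parse-retry queue instead of forcing a re-pull. The names `fetch`, `index` and `retry` are hypothetical.

```python
import queue
import threading


def puller(heights, fetch, mq: queue.Queue) -> None:
    # Producer: fetch raw blocks and publish them; it never touches ES.
    for h in heights:
        mq.put(fetch(h))  # blocks if the queue is full (backpressure)


def parser_worker(mq: queue.Queue, index, retry: queue.Queue) -> None:
    # Consumer: parse and write each block; failures do not affect the puller.
    while True:
        raw = mq.get()
        try:
            if raw is None:  # sentinel: shut this worker down
                return
            try:
                index(raw)
            except Exception:
                retry.put(raw)  # retry parsing later without re-pulling
        finally:
            mq.task_done()
```

A bounded queue (`queue.Queue(maxsize=...)`) makes the puller wait when parsers fall behind. This also keeps a fast puller from overwhelming the parsing side.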

Text by Rayjun