Blockchain is all the rage at the moment, and the news media are full of stories claiming that it will create the future.

However, there are few easy-to-understand introductory articles. There is little explanation of exactly what a blockchain is and what makes it special.

Below, I will try to write the easiest-to-understand blockchain tutorial I can. After all, blockchain is not that hard, and the core concept is simple enough to explain in a few sentences. I hope that by the end of this article you will understand not only blockchain, but also what mining is and why it keeps getting harder.

To be clear, I’m not an expert in this area. Although I have been interested in blockchain for a long time, I only started studying it carefully at the beginning of this year. Corrections of any errors and inaccuracies in this article are welcome.

1. The nature of blockchain

What is blockchain? In short, it is a special kind of distributed database.

First, the blockchain’s main role is to store information. Any information that needs to be saved can be written to the blockchain and read back from it, so it is a database.

Second, anyone can set up a server, join the blockchain network and become a node. In the world of blockchain there is no central node; every node is equal and holds the entire database. You can write data to or read data from any node, because all the nodes eventually synchronize, which keeps the blockchain consistent.

2. The defining characteristic of blockchain

Distributed databases are not new; there are already products on the market. But blockchain has one revolutionary feature.

Blockchain has no administrator; it is completely decentralized. Other databases have administrators, but blockchain does not. If someone wanted to add censorship to a blockchain, it would not be possible, because the design is meant to prevent the emergence of a central authority.

It is precisely because it cannot be managed that a blockchain cannot be controlled. Otherwise, once big companies and conglomerates took over its management, they would control the whole platform and all other users would have to do their bidding.

However, without an administrator, anyone can write data into it, so how can you be sure the data can be trusted? What if a bad actor changes it? Read on. That’s the magic of blockchain.

3. Blocks

A blockchain is made up of blocks. A block is much like a database record: each time data is written, a block is created.

Each block contains two parts.

  • Head: records the characteristic values (metadata) of the current block
  • Body: Actual data

The block header contains several characteristic values for the current block.

  • Generation time
  • Hash of the actual data (i.e. the block body)
  • Hash of the previous block
  • ...
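To make the two-part structure concrete, here is a minimal Python sketch of a block. The class and field names (BlockHeader, Block, timestamp, body_hash, prev_hash) are my own illustrative choices, not names taken from any real blockchain software, and a real header carries more fields than the three shown. The hash fields hold placeholder strings here, since hashes are only explained next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockHeader:
    """Illustrative header: only the three fields listed in the article."""
    timestamp: int   # generation time, as seconds since the Unix epoch
    body_hash: str   # hash of the block body, as a hex string
    prev_hash: str   # hash of the previous block, as a hex string


@dataclass(frozen=True)
class Block:
    header: BlockHeader  # the "Head": characteristic values of this block
    body: bytes          # the "Body": the actual data


# Placeholder values only; real hashes would go in the two hash fields.
example = Block(
    header=BlockHeader(
        timestamp=1_700_000_000,
        body_hash="<hash of the body>",
        prev_hash="<hash of the previous block>",
    ),
    body=b"some data",
)
print(example.header.prev_hash)
```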

Here, you need to understand what a hash is, because it is essential to understanding blockchain.

A hash is a characteristic value of fixed length that a computer can calculate for any content. A blockchain hash is 256 bits long, which means that whatever the original content is, the result is always a 256-bit binary number. And it is practically guaranteed that if the original content differs, the resulting hash differs too.

For example, the SHA-256 hash of the string 123 is a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3 (hexadecimal), which is 256 bits in binary, and in practice only 123 produces this hash. (In theory, other strings could produce the same hash, but the probability is so low that it can be considered almost impossible.)

So there are two important corollaries.

  • Corollary 1: Every block has a different hash, so a block can be identified by its hash.
  • Corollary 2: If the contents of a block change, its hash is bound to change.
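You can try this yourself. The short sketch below uses Python’s standard hashlib module to compute SHA-256 hashes; the helper name sha256_hex is just for illustration. It shows that inputs of very different lengths give hashes of the same length, and that changing a single character gives a different hash.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as 64 hexadecimal digits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = sha256_hex("123")
long_hash = sha256_hex("123" * 1000)   # a much longer input
changed_hash = sha256_hex("124")       # one character changed

# Each hex digit encodes 4 bits, so 64 digits are 256 bits.
print(len(short_hash) * 4, len(long_hash) * 4)  # 256 256
print(short_hash == changed_hash)               # False
```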

4. The immutability of the hash

There is a one-to-one correspondence between blocks and hashes, and the hash of each block is calculated over its header (the “Head”). That is, the characteristic values in the block header are concatenated in order into one long string, and that string is then hashed.

Hash = SHA256(block header)

Above is the formula for calculating the block hash; SHA256 is the blockchain’s hash algorithm. Note that the formula contains only the block header, not the block body. In other words, the hash is determined solely by the block header.
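As a rough sketch of this formula, the Python snippet below joins three header values into one string and hashes it. The function name block_hash, the "|" separator and the choice of fields are my own assumptions for illustration. Bitcoin itself serializes the header in a binary format and applies SHA-256 twice, so this is a simplification that follows the formula above rather than the real protocol.

```python
import hashlib


def block_hash(timestamp: int, body_hash: str, prev_hash: str) -> str:
    """Hash = SHA256(block header), with the header as a joined string."""
    header = f"{timestamp}|{body_hash}|{prev_hash}"
    return hashlib.sha256(header.encode("utf-8")).hexdigest()


# The body only enters the block hash through body_hash in the header.
body_hash = hashlib.sha256(b"some data").hexdigest()
other_body_hash = hashlib.sha256(b"some other data").hexdigest()
genesis_prev = "0" * 64  # stand-in "previous hash" for a first block

h1 = block_hash(1_700_000_000, body_hash, genesis_prev)
h2 = block_hash(1_700_000_000, other_body_hash, genesis_prev)
print(h1)
print(h1 == h2)  # False: a different body changes the block hash
```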

As mentioned earlier, the block header contains many things, including the hash of the current block body and the hash of the previous block. This means that if the contents of the current block change, or the hash of the previous block changes, the hash of the current block must change.

This has big implications for blockchain. If someone modifies a block, that block’s hash changes. For the subsequent blocks to remain connected to it (because each block contains the hash of the previous block), that person must modify all subsequent blocks in turn, or the modified block will drop out of the blockchain. For reasons explained later, computing a valid hash is very time consuming, so modifying multiple blocks in a short period of time is almost impossible unless someone controls more than half of the network’s computing power.

It is through this linkage mechanism that the blockchain ensures its own reliability. Once data is written, it cannot be tampered with. It’s like history. What happened is what happened, and it’s never going to change.
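The linkage can be demonstrated with a toy chain. In the sketch below, make_block and is_valid are hypothetical helpers of my own, blocks are plain dictionaries, and mining is left out entirely, so it only illustrates how the hash links expose tampering.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_block(body: bytes, prev_hash: str) -> dict:
    """Build a toy block whose hash covers the body hash and previous hash."""
    body_hash = sha256_hex(body)
    return {
        "body": body,
        "body_hash": body_hash,
        "prev_hash": prev_hash,
        "hash": sha256_hex((body_hash + prev_hash).encode("utf-8")),
    }


def is_valid(chain: list) -> bool:
    """Check every stored hash and every link to the previous block."""
    prev_hash = "0" * 64
    for block in chain:
        expected = make_block(block["body"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected["hash"]:
            return False
        prev_hash = block["hash"]
    return True


chain, prev = [], "0" * 64
for body in (b"first", b"second", b"third"):
    chain.append(make_block(body, prev))
    prev = chain[-1]["hash"]

ok_before = is_valid(chain)

# Tamper with the middle block's data only.
chain[1]["body"] = b"tampered"
ok_after = is_valid(chain)

# Recompute the middle block's hashes too: the next block still points
# at the old hash, so the chain is still broken.
chain[1] = make_block(b"tampered", chain[0]["hash"])
ok_after_rehash = is_valid(chain)

print(ok_before, ok_after, ok_after_rehash)  # True False False
```

To make the chain valid again, every block after the modified one would have to be rebuilt as well, which is exactly the work the next sections show to be expensive.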

Each block is linked to the block before it, which is where the name “blockchain” comes from.

5. Mining

Because synchronization between nodes must be ensured, new blocks cannot be added too quickly. Imagine that you have just synchronized a block and are about to generate the next block on top of it, when another node generates a new block first; you then have to abandon your half-finished calculation and synchronize again. Because each block can only be followed by one block, you can only ever generate the block that comes after the latest one. So you have no choice but to synchronize as soon as you hear that a new block has appeared.

So Satoshi Nakamoto, the inventor of blockchain (a pseudonym; the real identity remains unknown), deliberately made it difficult to add new blocks. By design, only one new block is created every 10 minutes on average, or six per hour.

This output rate is not imposed by decree but achieved by deliberately requiring a huge amount of computation. That is, only after an extremely large amount of computation can a valid hash for the current block be found, allowing the new block to be added to the blockchain. Blocks cannot be produced quickly because there is simply too much computation to do.

This process is called mining, because finding a valid hash is about as difficult as finding the one grain of sand that fits the bill among all the sand in the world. The machine that calculates hashes is called a mining rig, and the person who operates it is called a miner.

6. The difficulty factor

Reading this, you may have a question. People say that mining is hard, but isn’t mining just a computer calculating a hash? That is exactly what computers are good at, so how can it be difficult and slow?

The reason is that not every hash is acceptable; only a hash that meets a certain condition is accepted by the blockchain. This condition is so strict that the vast majority of hashes fail it and must be recalculated.

Specifically, the block header contains a difficulty factor, which determines how hard it is to compute a valid hash. For example, the difficulty factor for the 100,000th block is 14484.16236122.

The blockchain protocol specifies that the target value is obtained by dividing a constant by the difficulty factor. Obviously, the higher the difficulty factor, the smaller the target value.

The validity of a hash depends on the target value: only a hash smaller than the target value is valid; otherwise the hash is invalid and must be recalculated. Because the target value is so small, the chance that a hash falls below it is extremely slim, perhaps one in a billion calculations. This is the fundamental reason why mining is so slow.
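Here is a small sketch of the target calculation and the validity check. The function names target_for and is_valid_hash are my own. The constant, 0xFFFF shifted left by 208 bits, is the value Bitcoin uses as its difficulty-1 target; treat it as an assumption if you have a different blockchain in mind.

```python
# Assumed constant: Bitcoin's target at difficulty 1.
MAX_TARGET = 0xFFFF << 208


def target_for(difficulty: float) -> int:
    """Target value = constant / difficulty factor."""
    return int(MAX_TARGET / difficulty)


def is_valid_hash(hash_hex: str, target: int) -> bool:
    """A hash is valid only if, read as a number, it is below the target."""
    return int(hash_hex, 16) < target


# Difficulty factor of the 100,000th block, as quoted in the article.
target = target_for(14484.16236122)
print(f"{target:064x}")  # the target as a 256-bit hexadecimal number

print(is_valid_hash("0" * 64, target))  # True: the smallest possible hash
print(is_valid_hash("f" * 64, target))  # False: the largest possible hash
```

The many leading zeros in the printed target show why so few hashes qualify: a valid hash has to start with at least as many zeros.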

As mentioned earlier, the hash of the current block is uniquely determined by its block header. To hash the same block over and over and get different results, the block header must keep changing; otherwise the same hash would come out every time. But all the characteristic values in the block header described so far are fixed. To let the block header change, Satoshi Nakamoto deliberately added a random item called the Nonce.

The Nonce is a random value, and the miner’s job is to guess a Nonce that makes the hash of the block header smaller than the target value, so that the block can be written to the blockchain. The Nonce is very hard to guess; currently it can only be found by exhaustive trial and error. According to the protocol, the Nonce is a 32-bit binary value, so it can be as large as about 4.29 billion. The Nonce of the 100,000th block is 274148111, which can be understood as the miner starting from 0 and performing about 274 million calculations before finding a valid Nonce whose hash meets the condition.

With any luck, you might find the Nonce fairly quickly. If you are unlucky, you may try every possible Nonce value without finding one, which means that no hash meeting the condition can be computed for the current block body. In that case, the protocol allows the miner to change the block body and start a new round of calculations.
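The trial-and-error search can be sketched in a few lines of Python. Everything here is a toy: the header is a made-up string, the names header_hash and mine are mine, and toy_target is far easier than any real target so that the loop finishes in a moment. The only real-world detail is that a 32-bit Nonce has 2**32 possible values.

```python
import hashlib

MAX_NONCE = 2**32 - 1  # largest value a 32-bit unsigned Nonce can hold


def header_hash(header_fields: str, nonce: int) -> int:
    """Hash the header together with a nonce and return it as a number."""
    data = f"{header_fields}|{nonce}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def mine(header_fields: str, target: int):
    """Try nonces from 0 upward; return the first valid one, else None."""
    for nonce in range(MAX_NONCE + 1):
        if header_hash(header_fields, nonce) < target:
            return nonce
    return None  # no valid nonce: the block itself would have to change


# Toy target: about one hash in 4096 qualifies (12 leading zero bits).
toy_target = 1 << 244
toy_header = "timestamp|body_hash|prev_hash"

nonce = mine(toy_header, toy_target)
print(nonce, f"{header_hash(toy_header, nonce):064x}")
```

Each extra leading zero bit demanded by the target doubles the expected number of attempts, which is how a real target makes the same loop take an enormous amount of work.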

7. Dynamic adjustment of the difficulty factor

As mentioned in the previous section, mining is random, and there is no guarantee that a block will be produced in exactly ten minutes. Sometimes one is produced within a minute, and sometimes none is produced for hours. Overall, as hardware improves and the number of mining rigs grows, hashes are computed faster and faster.

To keep the output rate constant at one block every ten minutes, Satoshi Nakamoto also designed a dynamic adjustment mechanism for the difficulty factor. He stipulated that the difficulty factor is adjusted every two weeks (2,016 blocks). If blocks were generated every 9 minutes on average over those two weeks, that is about 10% faster than the intended rate, so the difficulty factor is raised by roughly 10%. If a block took 11 minutes on average, that is about 10% slower than the intended rate, so the difficulty factor is lowered by roughly 10%.
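The adjustment can be sketched as a simple proportional rule: scale the difficulty by the ratio of the intended time to the actual time. The function name adjust_difficulty and the starting difficulty of 1000 are made up for illustration, and the sketch leaves out the limits a real protocol places on how far a single adjustment can move.

```python
TARGET_MINUTES_PER_BLOCK = 10
BLOCKS_PER_PERIOD = 2016  # 2016 blocks x 10 minutes = two weeks


def adjust_difficulty(old_difficulty: float, actual_minutes: float) -> float:
    """Scale the difficulty so the next period should take two weeks again."""
    expected_minutes = TARGET_MINUTES_PER_BLOCK * BLOCKS_PER_PERIOD
    return old_difficulty * expected_minutes / actual_minutes


# Blocks arrived every 9 minutes on average: difficulty goes up.
faster = adjust_difficulty(1000.0, 9 * BLOCKS_PER_PERIOD)
# Blocks arrived every 11 minutes on average: difficulty goes down.
slower = adjust_difficulty(1000.0, 11 * BLOCKS_PER_PERIOD)

print(round(faster, 2), round(slower, 2))
```

With this rule the changes are 10/9 and 10/11 of the old value, that is, about 11% up and 9% down, which is where the "roughly 10%" in the text comes from.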

As computing power keeps growing, the difficulty factor gets higher and higher (and the target value smaller and smaller), which makes mining harder and harder.

8. Forks in the blockchain

Even if the blockchain is reliable, one problem remains unsolved: what if two people write data to the blockchain at the same time? That is, two blocks are added at the same time, and because both are attached to the same previous block, a fork is formed. Which block should be adopted?

The current rule is that new nodes always adopt the longest blockchain. When the blockchain forks, nodes watch which branch is the first to reach six new blocks after the fork point (known as “six confirmations”). At ten minutes per block, that means confirmation takes about an hour.

Since the rate at which new blocks are created is determined by computing power, this rule means that the branch with the most computing power behind it is the authentic blockchain.
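The rule itself is very simple, as the sketch below shows. Branches are represented as plain lists of made-up block labels counted from the fork point, and the function names longest_branch and is_confirmed are my own; a real node compares full chains of validated blocks.

```python
def longest_branch(branches: list) -> list:
    """Return the branch with the most blocks (the first one wins a tie)."""
    return max(branches, key=len)


def is_confirmed(branch: list, confirmations: int = 6) -> bool:
    """True once a branch has reached six new blocks after the fork point."""
    return len(branch) >= confirmations


# Two competing branches, listed from the fork point onward.
branch_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
branch_b = ["b1", "b2", "b3", "b4"]

winner = longest_branch([branch_a, branch_b])
print(winner is branch_a, is_confirmed(branch_a), is_confirmed(branch_b))
# True True False
```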

9. Summary

As an unmanaged distributed database, blockchain has been running for eight years since 2009 without major problems. This proves that it works.

But to keep the data reliable, blockchain comes with its own costs. The first is efficiency: it takes at least ten minutes for data to be written to the blockchain, and even longer for all nodes to synchronize the data. The second is energy consumption: generating blocks requires miners to perform countless meaningless calculations, which consumes a great deal of energy.

Therefore, the application scenarios of blockchain are actually very limited. It is suitable only when all of the following conditions hold.

  1. There is no governing authority that all members trust
  2. The written data does not require real-time use
  3. The benefits of mining cover the costs

If the above conditions are not met, a traditional database is a better solution.

At present, the biggest (and probably only) application scenario of blockchain is cryptocurrency, as represented by Bitcoin. In the next article, I will introduce you to the basics of Bitcoin.
