Last lecture we talked about hashing algorithms and public key algorithms. When we talked about hashing algorithms, we mentioned a noun “Merkle tree”. What is this tree? How does it work, and what value can they provide, now and in the future? Let’s talk about the Merkle Tree.

1

Merkle tree

A Mekker tree, also known as a hash tree, is, as the name implies, a tree that stores hash values. Is a binary tree consisting of a root node, a set of intermediate nodes, and a set of leaves. The lowermost leaf node contains the stored data or its hash value, each intermediate node is the hash value of the contents of its two children, and the root node is also composed of the hash value of the contents of its two children.

An important feature of Bitcoin is that the blocks exist in a multilevel data structure. A block’s “hash” is really just a hash of the block’s header information, a roughly 200 bytes of data that contains timestamps, random numbers, the hash of the previous block and a root hash of a data structure called a Merkle tree that stores all transactions in the block.

2

The structure and characteristics of Meckel trees

Here, we look at the data structure of a Meckel tree. All the data blocks are grouped in pairs (at the bottom of the figure), and hash Pointers to these data are stored in the parent node at the next level up.

The parent nodes are grouped in pairs again, and Pointers to the parent node are stored in the parent node at the next level up, and the process continues until we have a single block, the root node.

According to the data structure of the Merkel tree, we can clearly understand that as long as we remember the hash pointer of the uppermost root node, we can go back to any position in the table according to the hash pointer, which ensures that the data in the table will not be tampered with.

If someone tampered with some data block at the bottom of the Merkel tree, causing the hash pointer at the upper level to not match, they would have to tamper with the hash pointer at the upper level all the way to the top of the number, and at this point, the tampering would end because we stored the hash pointer at the root node.

So, as long as we remember the hash pointer at the top of the tree root, any attempt to tamper with the data will be detected, making tampering impossible.

Features of a Meckel tree:

  • MT is a kind of tree, most of which is binary tree, but also can be multi-fork tree, no matter how many fork trees, it has all the characteristics of tree structure;

  • The value of the leaf node of Merkle Tree is the unit data or unit data HASH of the data set.

  • The value of a non-leaf node is calculated using the Hash algorithm based on the value of all the leaf nodes below it.

Generally, encrypted hash methods like SHA-2 and MD5 are used to hash. But if you only want to prevent unintentional corruption or tampering of data, you can switch to less secure but more efficient checksum algorithms, such as CRC.

Second Preimage Attack: The roots of a Merkle tree do not indicate the depth of the tree, which can result in a second-preimage Attack, where an attacker creates a fake document with the same Merkle roots. A simple workaround is defined in Certificate Transparency: When computing the hash of the leaf, prefix the hash data with 0x00. When calculating the internal node is, prefix it with 0x01. Other implementations limit the root of the hash tree by prefixing the hash value with a depth prefix. Therefore, the prefix decreases with each step, and the extracted hash chain is defined as valid only if the prefix remains positive when the leaf is reached.

3

In the form of a Meckel tree

1. Binary Meckel tree

The most common and simplest form of a Mekle tree is a binary Mekle tree, in which a bucket always contains two adjacent blocks or hashes. It is described as follows:

So, what good is this weird hash algorithm doing? Why not just concatenate these chunks into a single chunk, using a regular hash algorithm? The answer is that it allows for a neat system, which we call the Merkle proofs:

A Merkel proof contains a block of data, the root hash of the Merkel tree, and all “branches” along the block to the root hash. It has been suggested that this kind of proof validates the hashing process, at least for branches. The application is simple: suppose you have a large database, and the entire contents of the database are stored in a Merkel tree, and the root of the Merkel tree is open and trusted (for example, it is digitally signed by enough trusted parties, or it has a lot of proof of work). So, if a user wants to do a key-value lookup in the database (for example, “Please tell me the object at 85273”), he can ask for a Meckel proof and receive a correct verification that the value he received is actually the specific root of the database at 85273. It allows a mechanism to validate both small amounts of data, such as a hash, and large (potentially infinite) databases.

The Bitcoin system of Meckel proof

The original application of Merkel’s evidence was the Bitcoin system, which was described and created by Satoshi Nakamoto in 2009. The Bitcoin blockchain uses A Merkel proof in order to store transactions in each block:

The benefit of this is what Satoshi describes as the “simplified Payment verification” (SPV) concept: Instead of downloading every transaction and every block, a “light client” can download only the block headers of the chain, each block containing only five items and 80 bytes of data:

  • The hash of the bulk in the upper block

  • The time stamp

  • Mining difficulty value

  • Proof of Work Random Number (Nonce)

  • The root hash of the Meckel tree that contains the block transactions

If a light client wishes to determine the status of a transaction, it can simply ask for a Meckel proof that shows a specific transaction in a Meckel tree whose root is the block head on the main chain (non-forked chain).

It will take us a long way, but bitcoin’s light customers do have their limitations. One particular limitation is that while they can prove the transactions involved, they cannot prove any current status (e.g., holding of digital assets, name registration, status of financial contracts, etc.). How many bitcoins do you own now? A Bitcoin light client can use a protocol that involves querying multiple nodes and trusting that at least one of them will notify you about any particular transaction spending in your address, and that allows you to implement more applications. But for other, more complex applications, these are not enough. The precise nature of a transaction’s impact can depend on previous transactions, which themselves depend on earlier transactions, so you can verify each transaction along the chain. To solve this problem, Ethereum’s Meckel tree concept goes one step further.

The Merkel proof of Ethereum

Instead of just one Meckel tree, each ethereum block header contains three Meckel trees, each corresponding to three types of objects:

  • Transactions

  • Receipts (Receipts, basically, a piece of data that shows the impact of each transaction)

  • State

This makes possible a very advanced light client protocol that allows light clients to easily do and verify the answers to the following types of queries:

  • Is the transaction included in a specific block?

  • Tell me that this address has sent out all instances of type X events in the past 30 days (e.g., a crowdfunding contract fulfilled its goal)

  • What is my current account balance?

  • Does this account exist?

  • Pretend to run this transaction in this contract, what is its output?

The first is handled by a transaction tree. The third and fourth are handled by the State tree, and the second by the Receipt tree. Calculating the first four query tasks is fairly straightforward. The server simply finds the object, retrieves the Meckel branch, and replies to the light client through the branch.

The fifth query task is also handled by the state tree, but its calculation method is more complex. Here, we need to construct what we call Merkle State transition proof. In essence, such proof is in said, “if you are in the root S running transaction T state of the tree, the result will be state tree’s root for S ‘, the log is L, the output of O” (” output “as a concept of exist in the etheric lane, because each transaction is a function call, it is not necessary in theory).

To infer this proof, the server creates a fake block locally, sets the state to S, and pretends to be a light client while requesting the transaction. That is, if the process of requesting the transaction requires the client to determine the balance of an account, the light client will issue a balance query. If the light client needs to check for a particular item stored in a particular contract, the light client issues a specific query for that. The server correctly “responds” to all of its queries, but the server also keeps track of all the data it sends back. The server then sends the aggregate data to the client. The client goes through the same steps, but uses the proofs provided by its database. If its result is the same as what the server asked for, the client accepts the proof.

2. Patricia Trees

As we mentioned earlier, the simplest kind of Merkel tree is the binary Merkel tree. However, ethereum uses a more complex Mekel tree, which we call “Mekel.” Merkle Patricia Tree.

A binary Merkel tree is a great data structure for validating “list” format information, essentially a series of contiguous data blocks. They’re also great for trading trees, because once the tree is established, it doesn’t matter how much time it takes to edit the tree, once the tree is established, it lasts forever.

For state trees, things get a little more complicated. The state tree in Ethereum basically consists of a key-value map, where the keys are addresses and various values, including account declarations, balances, random numbers, codes, and the store for each account (the store itself is a tree). For example, the Morden Testnet’s founding status looks like this:

{"0000000000000000000000000000000000000001": {"balance": "1"},"0000000000000000000000000000000000000002": {"balance": "1"},"0000000000000000000000000000000000000003": {"balance": "1"},"0000000000000000000000000000000000000004": {"balance": "1"},"102e61f5d8f9bc71d0ad4a084df4e65e05ce0e1c": {"balance": "1606938044258990275541962092341162602522202993782792835301376"}}
Copy the code

However, unlike transaction history, state trees need to be updated frequently: account balances and nonce of random numbers of accounts are frequently changed, and more importantly, new accounts are frequently inserted and stored keys are frequently inserted and deleted. With such data structure design, we can quickly calculate the new tree root after an insert, update, edit, or delete operation, without recalculating the whole tree. In addition, it has two secondary features that are great:

  • There is a limit to how deep a tree can be, even considering that an attacker would deliberately make some transactions to make the tree as deep as possible. Otherwise, an attacker could manipulate the depth of the tree to execute a denial of service attack, making updates extremely slow.

  • The root of the tree depends only on the data, regardless of the order in which it was updated. Changing the order of the update, or even recomputing the tree from scratch, does not change the root.

And Patricia Tree, in a nutshell, perhaps the closest explanation is that we can implement all of these features at once. The simplest explanation of how this works is that a value is stored encoded in a “path” to a record tree. Each node has 16 children, so the path is determined by a hexadecimal code: for example, the key for dog is 6, 4, 6, 15, 6, 7, so you start at the root, go down to the sixth child, then to the fourth, and so on, until you reach the end. In practice, there are some additional optimizations when trees are sparse, and we make the process more efficient, but that’s the basic principle.

4

Merkle Tree operation

1. Create Merckle Tree

There are nine data blocks added to the lowest level.

  • Step1 :(red line) hash the data block. Node0i = hash(Data0i), I =1,2… 9,

  • Step2: (orange line) connect two adjacent hash blocks in series, and then hash Node1((I +1)/2) = hash(Node0i+Node0(I +1)), I =1,3,5,7; For I =9, Node1((I +1)/2) = hash(Node0i)

  • Step3: (yellow line) repeat step2

  • Step4 :(green line) repeat step2

  • Step5 :(blue line) repeat step2 to generate Merkle Tree Root

Easy to get. Creating a Merkle Tree is O(n) hash, where n is the size of the block. So the height of the Merkle Tree is log(n)+1.

2. Retrieve data blocks

For better understanding, let’s suppose we have two machines A and B. A needs to have eight files in the same directory as B. The files are F1, F2, F3… F8. At this point we can do a quick comparison with Merkle Tree. Suppose we build a Merkle Tree for each machine at file creation time. The details are as follows:

Value of leaf node node7 = hash(f1), which is the hash of f1 file. The value of its parent node node3 = hash(v7, V8), which is the hash value of its child node node7 node8. That’s how you represent a hierarchical operation. The value of the root node is the only characteristic of the value of all leaf nodes.

Suppose file 5 on A is different from file 5 on B. How do we find different files from merkle Treee information on two machines? The comparison retrieval process is as follows:

  • Step1. First compare whether v0 is the same, if not, retrieve its children node1 and node2.

  • Step2. V1 is the same, but v2 is different. Retrieve node2’s child node5 node6;

  • Step3. v5 is different, v6 is the same, search and compare child node 11 and node 12 with Node5

  • Step4. v11 is different, but V12 is the same. Node 11 is a leaf node. Obtain its directory information.

  • Step5. The retrieval comparison is completed.

The theoretical complexity of this process is Log(N). The process description diagram is as follows:

It can be seen from the above figure that the process can quickly find the corresponding different files.

Update, insert and delete

Although there are a lot of information about Merkle Tree on the Internet, most of them do not involve Merkle Tree update, insert and delete operations, and there are many discussions about Merkle Tree search and traversal. I am also very confused, a tree structure operation must include not only search, but also update, insert and delete ah. Then I looked up a problem on StackExchange and got a little clearer.

Update the Merkle Tree data block is actually very simple, after updating the data block, then update the Hash value to the root path, so that does not change the structure of the Merkle Tree. However, insertion and deletion certainly change the structure of the Merkle Tree, as shown below. One insertion operation looks like this:

After inserting block 0 (considering block location), the structure of the Merkle Tree looks like this:

Consider an insertion algorithm that satisfies the following conditions:

  • Re-hashing is limited to log(n)

  • Data blocks are checked within log(n)+1

  • Unless n of the original tree is even, the inserted tree has no orphans, and if there are orphans, then the orphan is the last data block

  • The sequence of data blocks remains the same

  • The Merkle Tree is balanced after insertion

The result of the insert above would then look like this:

Merkle Tree insert and delete operation is actually an engineering problem, different problems will have different insert methods. If you want to ensure that the tree is balanced or log(n) tall, you can use any of the standard balanced binary tree patterns, such as AVL, red-black, stretched, 2-3, etc. These update modes of balanced binary trees can complete the insertion operation in O(LGN) time and ensure that the tree height is O(LGN). Then it is easy to see that updating all Merkle Hash can be done in O((LGN)2) time (for each node O(LGN) nodes from it to the root need to be updated in order to meet the tree height requirement O(LGN) nodes). If analyzed carefully, updating all hashes can actually be done in O(LGN) time, because all the nodes to be changed are related, that is, they are not all on a path from a leaf node to the root, or something similar.

In fact, the structure of the Merkle Tree (whether it is balanced, how high the Tree is limited) is not important in most applications, nor is it necessary to maintain the order of data blocks in most applications. Therefore, you can design your own insert and delete operations based on your specific application. A generic Merkle Tree insert and delete operation does not make sense.

Merkle Tree’s current applications include digital signatures, P2P networking, Trusted Computing, And Merkle proof for Bitcoin Ethereum.

Content Source:

Vitalik Buterin blog, CSDN- Sky Column, Chen Jianhui program life, public account -Astar blockchain Lab, currency Grandpa’s Diary

Lecture 100: What is blockchain from the village ledger

Blockchain lecture 100: Why is blockchain called “block” “chain”?

Blockchain lecture 100: What is the Byzantine problem?

Blockchain lecture 100: Bitcoin gives central banks an example, says the World Bank

Blockchain lecture 100: What is the waste of computing power used to bash blockchain?

Blockchain 100 talk: Zhihu thousands of praise answer about clear mining process

Block chain 100 talk: single fight or group fight, two kinds of mining way you want to choose?

Blockchain 100 Lecture: Ethereum ETH Mining Tutorial

Blockchain lecture 100: It is said that 80% of people can’t figure out hashing algorithms

Blockchain lecture 100:4 Concepts that public Key algorithms cannot But know

Block chain the encyclopedia dark | chatted 50 WeChat group, learning block chain required this 20 books

More than 400 white papers: Back to Basics on blockchain Technology

Activity recommended

Theme: Blockathon, challenge blockchain development, dare to come! (Click for details)

May 25-27, Blockathon2018 Beijing station, recruit 100 developers to challenge blockchain development.

Developers free, registration is subject to review. To register, identify the QR code below or click “Read the original text”.

HiBlock Blockchain community offline activities recommended:

May 19 – Free Salon:

| of the technique salon of the torture of intelligent technology security contract, land finance and Internet of things how to practice? (Shanghai)

May 27 – Free Salon:

Technique salon | learn to write intelligent contract, what to learn? (xian)