In today’s age of exploding data volumes, “blockchain + big data” has become a hot research topic thanks to blockchain’s decentralized, peer-to-peer and tamper-proof characteristics. Combining blockchain with big data arguably lays a foundation for the large-scale deployment of blockchain applications in the future.
So how is the data stored in the blockchain? What are the similarities and differences between different blockchain data storage mechanisms? Take Ethereum as an example. In this article, Vasa, co-founder of MIT-incubated startup TowardsBlockChain, elaborates on the data storage mechanism of Ethereum, how Ethereum stores blockchain state and transactions, and the similarities and differences between Ethereum and Bitcoin in the storage mechanism.
In addition, this article walks through the theory behind the “Patricia dictionary tree” (Patricia trie) data structure and demonstrates Ethereum’s dictionary tree implementation using Google’s levelDB database.
The prose carries the key points and the code carries the substance, so read on!
In terms of architecture design, blockchain can be simply divided into three layers: protocol layer, extension layer and application layer. The protocol layer can be divided into storage layer and network layer, which are independent but inseparable.
1 What is stored in the data storage layer?
First of all, let’s understand the data storage layer of blockchain. What is the data storage layer of blockchain? What does it store? What data does it need to store for a blockchain system to work?
Let’s say Alice transfers $10 to Bob. As you can see from the diagram above, the current state of the blockchain can be changed by adding a transaction to it.
As well as tracking the account balances and other relevant details of different users, it is also important to track the details of blockchain state transitions caused by different users through blockchain transactions.
Different blockchains, such as Bitcoin and Ethereum, use different methods to do this.
1.1 The “State” of Bitcoin
Bitcoin’s “state” is represented by the network’s unspent transaction outputs (UTXOs). Bitcoin transfers value through transactions. More specifically, a Bitcoin user spends one or more UTXOs by creating a transaction and adding those UTXOs as inputs to it.
Bitcoin’s UTXO model is the main feature that distinguishes it from Ethereum. To better understand the difference, let’s take a look at some examples.
First, UTXOs in Bitcoin cannot be partially spent; they must be spent in full.
If a Bitcoin user spends 0.5 bitcoins and only has a UTXO worth 1 bitcoin, he must add his own Bitcoin address to the output of the transaction, sending himself 0.5 bitcoins as change.
If he does not send himself change, he will lose the 0.5 bitcoin, which will be paid as a transaction fee to the miner who mined the block.
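The change rule above can be sketched in a few lines. This is a minimal illustration, not Bitcoin's actual transaction format: the helper names (`buildOutputs`, `impliedFee`), the addresses and the zero fee are all assumptions made for the example, and amounts are expressed in satoshis to avoid floating-point rounding.

```javascript
// Amounts are in satoshis (1 BTC = 100,000,000 satoshis).
const SATOSHIS_PER_BTC = 100000000;

// Hypothetical helper: builds the outputs of a transaction that spends
// every input UTXO in full and returns the remainder as change.
function buildOutputs(inputs, payment, recipient, changeAddress, fee) {
  const totalIn = inputs.reduce((sum, utxo) => sum + utxo.value, 0);
  const change = totalIn - payment - fee;
  if (change < 0) {
    throw new Error('inputs do not cover payment plus fee');
  }
  const outputs = [{ address: recipient, value: payment }];
  if (change > 0) {
    outputs.push({ address: changeAddress, value: change });
  }
  return outputs;
}

// Whatever the outputs do not claim is implicitly left to the miner.
function impliedFee(inputs, outputs) {
  const totalIn = inputs.reduce((sum, utxo) => sum + utxo.value, 0);
  const totalOut = outputs.reduce((sum, out) => sum + out.value, 0);
  return totalIn - totalOut;
}

// One UTXO worth 1 BTC, used to pay 0.5 BTC (illustrative identifiers).
const inputs = [{ txid: 'example-txid', index: 0, value: 1 * SATOSHIS_PER_BTC }];
const withChange = buildOutputs(inputs, 0.5 * SATOSHIS_PER_BTC, 'bob', 'alice-change', 0);
const withoutChange = [{ address: 'bob', value: 0.5 * SATOSHIS_PER_BTC }];

console.log(impliedFee(inputs, withChange));    // 0
console.log(impliedFee(inputs, withoutChange)); // 50000000 (0.5 BTC goes to the miner)
```

The second call shows why forgetting the change output is costly: the whole remainder becomes the fee.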
UTXO trading
Second, Bitcoin’s blockchain does not, per se, store and update a user’s account balance. In the Bitcoin network, users only need to hold the private keys that control one or more UTXOs.
The use of digital wallets makes it look like bitcoin’s blockchain is automatically storing and updating users’ account balances, but it’s not.
Illustration of how a Bitcoin wallet works
Bitcoin’s UTXO model works well, in part because digital wallets can perform most transaction-related tasks, including but not limited to:
- Handling UTXOs
- Storing keys
- Setting transaction fees
- Providing change addresses
- Summarizing UTXOs (showing available, pending, and total balances)
How can transaction behavior in the UTXO model be described? Cash is a good analogy.
Users count their money by adding up the bills (like UTXOs) in a wallet (like a Bitcoin address or digital wallet), and when they want to spend money, they hand over one or more bills.
Each bill can only be used once, because once spent, it no longer belongs to you.
Therefore, it can be concluded that:
- The Bitcoin blockchain does not store and update account balances
- A Bitcoin wallet holds the private keys corresponding to the user’s UTXOs
- If a UTXO is included in a transaction, it is spent in full (a new “change” UTXO is received if the UTXO is worth more than the amount spent)
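The three conclusions above can be illustrated with a toy UTXO set. This is only a sketch under stated assumptions: the record shape (`id`, `owner`, `value`), the addresses and the function names are hypothetical, and real wallets match UTXOs to keys through scripts rather than plain owner strings.

```javascript
// The chain only tracks unspent outputs; nothing here is a stored "balance".
const utxoSet = [
  { id: 'a:0', owner: 'alice-addr-1', value: 30 },
  { id: 'b:1', owner: 'alice-addr-2', value: 70 },
  { id: 'c:0', owner: 'bob-addr-1', value: 50 },
];

// The wallet knows which addresses it holds keys for and sums the matching UTXOs.
function walletBalance(utxos, ownedAddresses) {
  return utxos
    .filter((utxo) => ownedAddresses.includes(utxo.owner))
    .reduce((sum, utxo) => sum + utxo.value, 0);
}

// Spending removes the consumed UTXOs entirely and adds the newly created outputs.
function applyTransaction(utxos, spentIds, newOutputs) {
  const remaining = utxos.filter((utxo) => !spentIds.includes(utxo.id));
  return remaining.concat(newOutputs);
}

const aliceAddresses = ['alice-addr-1', 'alice-addr-2'];
console.log(walletBalance(utxoSet, aliceAddresses)); // 100

// Alice pays Bob 20 with her 30-unit UTXO and receives 10 back as change.
const nextSet = applyTransaction(utxoSet, ['a:0'], [
  { id: 'd:0', owner: 'bob-addr-1', value: 20 },
  { id: 'd:1', owner: 'alice-addr-1', value: 10 },
]);
console.log(walletBalance(nextSet, aliceAddresses)); // 80
```

The "balance" is recomputed by the wallet every time; the UTXO set itself never holds a per-user total.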
1.2 Ethereum “State”
Unlike the aforementioned Bitcoin blockchain, states in the Ethereum blockchain store and update information such as a user’s account balance.
Ethereum state is not an abstract concept; it is part of the underlying Ethereum protocol.
As the Ethereum Yellow Paper puts it, Ethereum is a transaction-based “state machine”, a technology on which all transaction-based state machine concepts may be built.
Like all other blockchains, Ethereum’s blockchain starts from a genesis block.
Starting from the genesis block, actions such as transactions, smart contract deployments and mining continually change the state of the Ethereum blockchain. In Ethereum, an account balance (stored in the state dictionary tree) changes every time there is a transaction associated with that account.
Data such as account balances are not stored directly in the blocks of the Ethereum blockchain. Only the root node hashes of the transaction dictionary tree, state dictionary tree, and receipts dictionary tree are stored directly in the blockchain, as shown in the diagram below:
The root node hash of the storage dictionary tree (where all smart contract data is stored) actually points to the state dictionary tree, which in turn points to the blockchain.
There are two distinct types of data stored in Ethereum: permanent and temporary.
Transaction information is permanent data. Once a transaction is fully confirmed, it is recorded in the transaction dictionary tree and never changes. An account balance is temporary data: the balance for an address is stored in the state dictionary tree and changes whenever there is a transaction associated with that account.
Permanent and temporary data should therefore be stored separately, and Ethereum uses the dictionary tree data structure to manage both.
An analogy for Ethereum’s record-keeping mechanism is the use of ATM/debit cards.
The bank tracks the balance of each debit card, and when the user needs to spend money, the bank checks the transaction record to determine whether the user has enough balance to make the transaction.
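The account/balance idea can be sketched as a tiny state-transition function. This is a simplified illustration, not Ethereum's actual state transition: the function name `applyTransfer`, the plain-object state and the account names are assumptions, and gas, nonces and signatures are left out.

```javascript
// World state: address -> account. Each transaction maps one state to the next.
function applyTransfer(state, tx) {
  const sender = state[tx.from] || { balance: 0 };
  if (sender.balance < tx.value) {
    throw new Error('insufficient balance');
  }
  // Copy the state instead of mutating it, so the previous state stays intact.
  const next = Object.assign({}, state);
  next[tx.from] = { balance: sender.balance - tx.value };
  const recipient = next[tx.to] || { balance: 0 };
  next[tx.to] = { balance: recipient.balance + tx.value };
  return next;
}

// Hypothetical starting state and transaction.
const genesisState = { alice: { balance: 100 }, bob: { balance: 0 } };
const stateAfterTx = applyTransfer(genesisState, { from: 'alice', to: 'bob', value: 10 });

console.log(stateAfterTx.alice.balance); // 90
console.log(stateAfterTx.bob.balance);   // 10
```

Unlike the UTXO model, the balance is read and updated directly, much like the bank checking a debit card's balance.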
1.3 Comparison between the Bitcoin UTXO model and the Ethereum account/balance model
Advantages of the Bitcoin UTXO model:
- Scalability: The ability to process multiple UTXOs simultaneously enables parallel transactions and facilitates innovation in scalability.
- Privacy protection: Even if Bitcoin is not a completely anonymous system, the UTXO model provides a higher level of privacy protection as long as users use a new address for each transaction. If you need more privacy protection, consider using a more complex scheme, such as ring signatures.
Advantages of the Ethereum account/balance model:
- Simplicity: Ethereum has chosen a simpler, more intuitive model that makes it easier for developers to implement complex smart contracts, especially those that require information about the state of the Ethereum network or involve multiple parties. For example, if a smart contract performs different tasks depending on the state of the Ethereum network, UTXO’s stateless model would force every transaction to carry state information, which would complicate the design of smart contracts.
- Efficiency: In addition to simplicity, the Ethereum account/balance model is more efficient because each transaction only needs to verify that the sender account has enough balance to pay for the transaction.
The Ethereum account/balance model is protected against double-spend attacks by an incrementing nonce.
In Ethereum, each account has a publicly visible nonce that increases by one each time the account sends a transaction, a mechanism that prevents the same transaction from being submitted more than once.
This nonce is different from the Ethereum proof-of-work nonce, which is a random value found during the mining process.
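How an incrementing nonce blocks a replayed transaction can be sketched as follows. This is a minimal model under stated assumptions: `applySignedTransfer` and the account shape are hypothetical, and signature checking and gas are omitted.

```javascript
// Each account carries a nonce; a transaction is valid only if its nonce
// equals the sender's current nonce, which is then incremented.
function applySignedTransfer(state, tx) {
  const sender = state[tx.from];
  if (!sender) throw new Error('unknown sender');
  if (tx.nonce !== sender.nonce) throw new Error('invalid nonce');
  if (sender.balance < tx.value) throw new Error('insufficient balance');

  const next = Object.assign({}, state);
  next[tx.from] = { nonce: sender.nonce + 1, balance: sender.balance - tx.value };
  const recipient = next[tx.to] || { nonce: 0, balance: 0 };
  next[tx.to] = { nonce: recipient.nonce, balance: recipient.balance + tx.value };
  return next;
}

// Hypothetical accounts and transaction.
const before = { alice: { nonce: 0, balance: 100 }, bob: { nonce: 0, balance: 0 } };
const payment = { from: 'alice', to: 'bob', value: 10, nonce: 0 };

const after = applySignedTransfer(before, payment);
console.log(after.alice.nonce); // 1

// Submitting the very same transaction again is rejected.
let replayError = null;
try {
  applySignedTransfer(after, payment);
} catch (err) {
  replayError = err.message;
}
console.log(replayError); // invalid nonce
```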
As with most things in computer architecture, both models involve trade-offs. Some blockchain technologies, such as Hyperledger, have adopted the UTXO mechanism because they can benefit from the innovations derived from the Bitcoin blockchain.
Let’s briefly examine more techniques based on these two recordkeeping models.
2 Ethereum dictionary tree data structure
Ethereum’s dictionary tree data structures mainly include the state dictionary tree, the storage dictionary tree and the transaction dictionary tree.
2.1 State dictionary tree: the one and only
There is one, and only one, network-wide state dictionary tree in the Ethereum network.
The network-wide state dictionary tree is constantly updated.
This network-wide dictionary tree contains key and value pairs for each account in the Ethereum network.
The “key” in the network-wide dictionary tree is a 160-bit identifier (the address of an Ethereum account).
The “values” in the network-wide state dictionary tree are generated by encoding the following details of an Ethereum account using Recursive Length Prefix (RLP) encoding:
- Nonce: a publicly visible counter (not a random number). For an external account, it is the number of transactions sent from the account’s address; for a contract account, it is the number of contracts created by the account.
- Balance: The number of Wei (Ethereum’s smallest currency unit) held by this address; there are 1e18 Wei per ether.
- StorageRoot: The hash of the root node of a Merkle Patricia dictionary tree that encodes the storage contents of the account; it is empty by default.
- CodeHash: The hash of the account’s EVM (Ethereum Virtual Machine) code. For contract accounts, this is the code that is hashed and stored as the codeHash; for external accounts, the codeHash field is the hash of an empty string.
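To make the RLP step concrete, here is a minimal sketch of an RLP encoder applied to the four account fields. It is written from the published RLP rules rather than taken from any Ethereum client, and the helper names (`encodeLength`, `rlpEncode`, `intToBytes`) are assumptions. The two constants are the well-known Keccak-256 hashes for an empty storage tree and for empty code; they are hard-coded here because Node's built-in `crypto` module does not provide Keccak-256.

```javascript
// Length prefix shared by byte strings (offset 0x80) and lists (offset 0xc0).
function encodeLength(len, offset) {
  if (len <= 55) return Buffer.from([offset + len]);
  let hex = len.toString(16);
  if (hex.length % 2) hex = '0' + hex;
  const lenBytes = Buffer.from(hex, 'hex');
  return Buffer.concat([Buffer.from([offset + 55 + lenBytes.length]), lenBytes]);
}

// Encodes a Buffer, a string (as UTF-8 bytes), or a nested array of those.
function rlpEncode(item) {
  if (Array.isArray(item)) {
    const payload = Buffer.concat(item.map((child) => rlpEncode(child)));
    return Buffer.concat([encodeLength(payload.length, 0xc0), payload]);
  }
  const bytes = Buffer.isBuffer(item) ? item : Buffer.from(item);
  if (bytes.length === 1 && bytes[0] < 0x80) return bytes; // single small byte encodes as itself
  return Buffer.concat([encodeLength(bytes.length, 0x80), bytes]);
}

// Integers are encoded as big-endian bytes without leading zeros; 0 is the empty string.
function intToBytes(n) {
  if (n === BigInt(0)) return Buffer.alloc(0);
  let hex = n.toString(16);
  if (hex.length % 2) hex = '0' + hex;
  return Buffer.from(hex, 'hex');
}

// Keccak-256 of the RLP of an empty string (empty storage tree) and of empty code.
const EMPTY_STORAGE_ROOT = Buffer.from('56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421', 'hex');
const EMPTY_CODE_HASH = Buffer.from('c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470', 'hex');

// A hypothetical external account holding 1 ether that has sent no transactions.
const encodedAccount = rlpEncode([
  intToBytes(BigInt(0)),                     // nonce
  intToBytes(BigInt('1000000000000000000')), // balance in Wei
  EMPTY_STORAGE_ROOT,                        // storageRoot
  EMPTY_CODE_HASH,                           // codeHash
]);

console.log(rlpEncode('dog').toString('hex')); // 83646f67
console.log(encodedAccount.length);            // 78
```

The resulting byte string is what would be stored as the "value" for the account's key in the state dictionary tree.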
The root node of the state dictionary tree (the hash of the entire state dictionary tree at a given point in time) is used as the safe and unique identifier of the state dictionary tree. The root node of a state dictionary tree is cryptographically dependent on all the data inside the tree.
The relationship between the state dictionary tree (levelDB implementation of Merkle Patricia dictionary tree) and ethereum blocks
State dictionary tree: the Keccak 256-bit hash of the state dictionary tree’s root node is stored as the “stateRoot” value in a given block:
stateRoot: '0x8c77785e3e9171715dd34117b047dffe44575c32ede59bde39fbf5dc074f2976'
2.2 Storage dictionary tree: where smart contract data is stored
The storage dictionary tree stores all smart contract data, and each Ethereum account has its own storage dictionary tree. The 256-bit hash of the storage dictionary tree’s root node is stored as the “storageRoot” value in the global state dictionary tree.
2.3 Transaction dictionary tree – one per block
Each Ethereum block has its own separate transaction dictionary tree.
A block contains many transactions, and the order of transactions in the block is decided by the miner who assembles the block.
The path to a particular transaction in the transaction dictionary tree is the RLP encoding of the transaction’s index within the block.
Because the blockchain is tamper-proof, blocks that have already been mined cannot be changed, so the position of a transaction within its block never changes.
Once a transaction has been located in a block’s transaction dictionary tree, you can return to the same path over and over again and retrieve the same result.
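The key for a transaction is therefore derived from nothing but its position in the block. The sketch below encodes a transaction index with the RLP rule for non-negative integers; `rlpEncodeIndex` is a hypothetical helper written for illustration, not a function from an Ethereum library.

```javascript
// RLP of a non-negative integer: 0 is the empty byte string (0x80), a value
// below 0x80 is a single byte, anything larger gets a length prefix.
function rlpEncodeIndex(index) {
  if (index === 0) return Buffer.from([0x80]);
  let hex = index.toString(16);
  if (hex.length % 2) hex = '0' + hex;
  const bytes = Buffer.from(hex, 'hex');
  if (bytes.length === 1 && bytes[0] < 0x80) return bytes;
  return Buffer.concat([Buffer.from([0x80 + bytes.length]), bytes]);
}

// Keys under which the 1st, 2nd, 129th and 257th transactions of a block would sit.
for (const index of [0, 1, 128, 256]) {
  console.log(index, rlpEncodeIndex(index).toString('hex'));
}
// 0 80
// 1 01
// 128 8180
// 256 820100
```

Because the index of a mined transaction never changes, neither does its key.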
3 Ethereum dictionary tree example analysis
Mainstream Ethereum clients use two different database software solutions to store dictionary trees. Parity, Ethereum’s Rust client, uses the rocksDB database, while Ethereum’s Go, C++, and Python clients all use the levelDB database.
In this article, you will be introduced to the levelDB database.
Ethereum and levelDB databases
LevelDB is an open source key-value storage library from Google that provides, among other features, forward and backward iteration over data, an ordered mapping from string keys to string values, custom comparison functions, and automatic compression.
The automatic compression function uses the open source Google compression/decompression library “Snappy”. The Snappy library was not designed for maximum compression, but for very high compression speeds.
The levelDB database is an important storage and retrieval mechanism for managing the state of the Ethereum network. For this reason, levelDB is the underlying database for mainstream Ethereum clients (nodes) such as go-ethereum, cpp-ethereum, and pyethereum.
While it is possible to implement the dictionary tree data structure on disk (using database software such as levelDB), it is important to note the difference between traversing a dictionary tree and simply looking up a key in a key/value database.
To illustrate these differences in more detail, you can use the Patricia dictionary tree library to access the data in the levelDB database.
On the Ethereum client, perform network operations such as trading, deploying smart contracts, and mining, and observe how they affect ethereum’s “state.”
4 Analyze the Ethereum database
Each block in the Ethereum blockchain contains a number of Merkle Patricia dictionary trees:
- State dictionary tree
- Storage dictionary tree
- Transaction dictionary tree
- Receipts dictionary tree
To reference a particular Merkle Patricia dictionary tree in a particular block, you need to get its root node hash as an index.
Get the root hashes of the state, transaction, and receipts dictionary trees in the genesis block using the following commands:
web3.eth.getBlock(0).stateRoot
web3.eth.getBlock(0).transactionsRoot
web3.eth.getBlock(0).receiptsRoot
If you want the root hash of the most recently mined block (instead of the genesis block), use the following command:
web3.eth.getBlock(web3.eth.blockNumber).stateRoot
After obtaining the root node hash, you need to set up the experiment environment.
4.1 Installing NPM, Node, Level, and EthereumJS
Use Node.js, Level and EthereumJS (an implementation of the Ethereum Virtual Machine written in JavaScript) to experiment with the levelDB database.
Run the following command to configure the experiment environment:
cd ~
sudo apt-get update
sudo apt-get upgrade
curl -sL https://deb.nodesource.com/setup_9.x | sudo -E bash -
sudo apt-get install -y nodejs
npm -v
nodejs -v
npm install levelup leveldown rlp merkle-patricia-tree --save
git clone https://github.com/ethereumjs/ethereumjs-vm.git
cd ethereumjs-vm
npm install ethereumjs-account ethereumjs-util --save
Once the experiment environment is set up, the following code prints the keys of the Ethereum accounts stored in the state dictionary tree of a private Ethereum network. It connects to Ethereum’s levelDB database, enters the private network’s state (using the stateRoot value of a block in the blockchain), and then reads the keys of all the accounts on the private network.
//Just importing the requirements
var Trie = require('merkle-patricia-tree/secure');
var levelup = require('levelup');
var leveldown = require('leveldown');
var RLP = require('rlp');
var assert = require('assert');
//Connecting to the leveldb database
var db = levelup(leveldown('/home/timothymccallum/gethDataDir/geth/chaindata'));
//Adding the "stateRoot" value from the block so that we can inspect the state root at that block height.
var root = '0x8c77785e3e9171715dd34117b047dffe44575c32ede59bde39fbf5dc074f2976';
//Creating a trie object of the merkle-patricia-tree library
var trie = new Trie(db, root);
//Creating a nodejs stream object so that we can access the data
var stream = trie.createReadStream()
//Turning on the stream (because the node js stream is set to pause by default)
stream.on('data', function (data) {
  //printing out the keys of the "state trie"
  console.log(data.key);
});
The output of the above code
Accounts in the Ethereum network are added to the state dictionary tree only once a transaction related to that particular account has taken place.
For example, a new account created using the command “geth account new” will not be added to the state dictionary tree. Only once a successful transaction (one that costs gas and is included in a mined block) is associated with the account will the account appear in the state dictionary tree.
This prevents malicious attackers from constantly creating new accounts, thus maintaining the normal amount of data in the state dictionary tree.
4.2 Decoding Data
Ethereum extends the dictionary tree data structure by using a “Modified Merkle Patricia Trie” when interacting with the levelDB database.
For example, the modified Merkle Patricia Trie includes a way to shortcut traversal by using “extension” nodes.
In Ethereum, a modified Merkle Patricia Trie node can be one of the following:
- An empty string (NULL)
- An array of 17 items (branch)
- An array of 2 items (leaf)
- An array of 2 items (extension)
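Leaf and extension nodes are both 2-item arrays, so they are told apart by a flag in the first nibble of the encoded path (the “hex-prefix” encoding: 0 or 1 marks an extension, 2 or 3 marks a leaf, and an odd flag means the path has an odd number of nibbles). The sketch below classifies an already RLP-decoded node; `classifyNode` and `decodePath` are hypothetical helper names, and the sample nodes are made up for illustration.

```javascript
// Classifies a decoded node: empty value, 17-item branch, or 2-item leaf/extension.
function classifyNode(node) {
  if (node === null || node === undefined || node.length === 0) return 'null';
  if (node.length === 17) return 'branch';
  if (node.length === 2) {
    const flag = node[0][0] >> 4; // first nibble of the encoded path
    return (flag & 2) ? 'leaf' : 'extension';
  }
  throw new Error('not a valid trie node');
}

// Strips the hex-prefix flag (and the padding nibble for even-length paths)
// and returns the remaining path as an array of nibbles.
function decodePath(encodedPath) {
  const nibbles = [];
  for (const byte of encodedPath) {
    nibbles.push(byte >> 4, byte & 0x0f);
  }
  const isOdd = (nibbles[0] & 1) === 1;
  return nibbles.slice(isOdd ? 1 : 2);
}

// Made-up sample nodes; the second item would be a value or a child reference.
const extensionNode = [Buffer.from('112345', 'hex'), Buffer.from('aa', 'hex')];
const leafNode = [Buffer.from('200f1cb8', 'hex'), Buffer.from('bb', 'hex')];
const branchNode = new Array(17).fill(Buffer.alloc(0));

console.log(classifyNode(extensionNode), decodePath(extensionNode[0])); // extension [ 1, 2, 3, 4, 5 ]
console.log(classifyNode(leafNode), decodePath(leafNode[0]));           // leaf [ 0, 15, 1, 12, 11, 8 ]
console.log(classifyNode(branchNode));                                  // branch
```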
Because Ethereum’s dictionary trees are designed and built according to strict rules, the best way to check them is to test them using computer code.
Given the stateRoot of a specific block and an Ethereum account address, the following example uses EthereumJS to return the balance of that account.
The output of the following code (the balance of the Ethereum account at address 0xccc6b46fa5606826ce8c18fece6f519064e6130b)
//Mozilla Public License 2.0
//As per https://github.com/ethereumjs/ethereumjs-vm/blob/master/LICENSE
//Requires the following packages to run as nodejs file https://gist.github.com/tpmccallum/0e58fc4ba9061a2e634b7a877e60143a
//Getting the requirements
var Trie = require('merkle-patricia-tree/secure');
var levelup = require('levelup');
var leveldown = require('leveldown');
var utils = require('ethereumjs-util');
var BN = utils.BN;
var Account = require('ethereumjs-account');
//Connecting to the leveldb database
var db = levelup(leveldown('/home/timothymccallum/gethDataDir/geth/chaindata'));
//Adding the "stateRoot" value from the block so that we can inspect the state root at that block height.
var root = '0x9369577baeb7c4e971ebe76f5d5daddba44c2aa42193248245cf686d20a73028';
//Creating a trie object of the merkle-patricia-tree library
var trie = new Trie(db, root);
var address = '0xccc6b46fa5606826ce8c18fece6f519064e6130b';
trie.get(address, function (err, raw) {
  //Report the error and stop if the lookup fails
  if (err) return console.error(err);
  //Using ethereumjs-account to create an instance of an account
  var account = new Account(raw);
  console.log('Account Address: ' + address);
  //Using ethereumjs-util to decode and present the account balance
  console.log('Balance: ' + (new BN(account.balance)).toString());
});
5 What are the advantages of this unique design?
5.1 Mobility
Mobile devices and Internet of Things (IoT) devices are everywhere today, and the future of e-commerce is built on secure, powerful and fast mobile applications.
It can be said that blockchain has made great progress in mobility, but we must also admit that the increasing size of blockchain is inevitable. Storing the entire blockchain on everyday mobile devices is therefore impractical.
5.2 Speed without compromising security
The ethereum network state design and its use of the improved Merkle Patricia dictionary tree opens up more possibilities for its application.
Every operation (add, update, or delete) performed on the dictionary tree in Ethereum uses deterministic cryptographic hashes.
In addition, the cryptographic hash of the dictionary tree’s root node can be used as evidence that the dictionary tree has not been tampered with. For example, any change to the dictionary tree’s data (such as increasing an account balance directly in the levelDB database) will completely change the root node hash.
This cryptographic feature opens up the possibility of fast, reliable queries for light clients (devices that don’t store the entire blockchain), such as: “Does account 0x…4857 have sufficient funds to complete this transaction at block height 5044866?”
“The space complexity of a Merkle proof is logarithmic in the quantity of data stored. This means that even if the entire state dictionary tree is several gigabytes in size, a node that receives a state root from a trusted source only needs to download a few kilobytes of proof data to know with full certainty the validity of any information in the tree.”
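Both properties (the root changes when any data changes, and proofs grow logarithmically) can be demonstrated with a much simpler structure than Ethereum's. The sketch below uses a plain binary Merkle tree with SHA-256 as a stand-in; Ethereum actually uses a modified Merkle Patricia Trie with Keccak-256, and the function names and account strings here are assumptions made for the example.

```javascript
const crypto = require('crypto');
const sha256 = (buf) => crypto.createHash('sha256').update(buf).digest();

// Builds every level of a binary Merkle tree, from leaf hashes up to the root.
function buildLevels(leaves) {
  let level = leaves.map((leaf) => sha256(Buffer.from(leaf)));
  const levels = [level];
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] || level[i]; // duplicate the last hash on odd levels
      next.push(sha256(Buffer.concat([level[i], right])));
    }
    levels.push(next);
    level = next;
  }
  return levels;
}

// Collects the sibling hash at each level: one hash per level, so log2(n) in total.
function getProof(levels, leafIndex) {
  const proof = [];
  let index = leafIndex;
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const level = levels[depth];
    proof.push(level[index ^ 1] || level[index]);
    index = index >> 1;
  }
  return proof;
}

// Recomputes the root from one leaf plus its proof and compares it to a trusted root.
function verifyProof(leaf, leafIndex, proof, trustedRoot) {
  let hash = sha256(Buffer.from(leaf));
  let index = leafIndex;
  for (const sibling of proof) {
    hash = (index & 1)
      ? sha256(Buffer.concat([sibling, hash]))
      : sha256(Buffer.concat([hash, sibling]));
    index = index >> 1;
  }
  return hash.equals(trustedRoot);
}

// Eight made-up "account:balance" records.
const accounts = ['a0:100', 'a1:5', 'a2:42', 'a3:0', 'a4:7', 'a5:250', 'a6:1', 'a7:99'];
const levels = buildLevels(accounts);
const root = levels[levels.length - 1][0];
const proof = getProof(levels, 5);

console.log(proof.length);                                // 3 hashes for 8 leaves
console.log(verifyProof('a5:250', 5, proof, root));       // true
console.log(verifyProof('a5:999999', 5, proof, root));    // false

// Tampering with any record changes the root.
const tampered = accounts.slice();
tampered[5] = 'a5:999999';
const tamperedLevels = buildLevels(tampered);
console.log(root.equals(tamperedLevels[tamperedLevels.length - 1][0])); // false
```

A light client in this model needs only the trusted root and three hashes, rather than all eight records, to check one account.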
5.3 Limits
The Ethereum white paper describes the concept of a savings wallet. In this scenario, two users (perhaps a husband and wife, or business partners) can each withdraw a maximum of 1% of the total account balance per day.
Although the idea is only mentioned in the “Further Applications” section of the white paper, it is of great interest because it could theoretically be implemented as part of the underlying Ethereum protocol (rather than as a layer 2 protocol or as part of a third-party wallet).
UTXOs are blind to blockchain data and, as discussed above, the Bitcoin blockchain does not actually store a user’s account balance. As a result, Bitcoin’s underlying protocol is unlikely to implement any kind of daily limit.
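The daily-limit rule depends on being able to read the account's current balance, which is exactly what the account/balance model provides. The following is a minimal sketch of that rule, not code from the white paper: the `SavingsWallet` class, the integer `day` counter, and the choice to compute each owner's allowance from the balance at their first withdrawal of the day are all assumptions.

```javascript
class SavingsWallet {
  constructor(balance, owners) {
    this.balance = balance;
    // Per-owner allowance for the current day.
    this.allowance = new Map(owners.map((owner) => [owner, { day: null, remaining: 0 }]));
  }

  // `day` is a plain integer day counter supplied by the caller.
  withdraw(owner, amount, day) {
    const entry = this.allowance.get(owner);
    if (!entry) throw new Error('not an owner');
    if (entry.day !== day) {
      // First withdrawal of the day: allowance is 1% of the current balance.
      entry.day = day;
      entry.remaining = Math.floor(this.balance / 100);
    }
    if (amount > entry.remaining) throw new Error('daily limit exceeded');
    entry.remaining -= amount;
    this.balance -= amount;
    return this.balance;
  }
}

const wallet = new SavingsWallet(10000, ['alice', 'bob']);
wallet.withdraw('alice', 100, 0); // 1% of 10000
wallet.withdraw('bob', 99, 0);    // 1% of 9900, rounded down
console.log(wallet.balance);      // 9801
```

A second withdrawal by the same owner on the same day throws `daily limit exceeded`, because the rule is enforced against stored state rather than against individual coins.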
5.4 Consumer confidence
We believe that with the continuous efforts of blockchain developers, we will witness the rapid development of lightweight clients and the mass deployment of secure, powerful and fast mobile applications that can interact with blockchain technology.
In the field of e-commerce, the implementation of blockchain technology must improve speed, security and availability. Smart design provides excellent usability, safety and performance that will improve consumer confidence and increase adoption by the public.
The data storage mechanism is also a major problem facing the implementation of blockchain applications, which determines the operation efficiency of blockchain.
Only by solving the pain points related to the implementation of blockchain applications can blockchain truly enter people’s lives and bring convenience to people!
Source: Blockchain Base camp
Wechat ID: Blockchain_camp
Author | Vasa, entrepreneur and co-founder of TowardsBlockChain
Compile | ms kou, Guoxi