This series is aimed at blockchain developers and CS students with other development experience
In the first part of this paper, we started from the transaction model, aiming at the working principle of the lock script and unlock script, discussed the design logic of the bitcoin script engine, and analyzed the two important mechanisms of P2PKH and P2SH. In contrast to Bitcoin, Ethereum aims to build a Turing-complete, programmable blockchain platform. What has it done differently to that end?
Ethereum virtual machine
Blockchain paradigm
Gavin Wood abstracts the blockchain system as a transaction-based state machine in his Yellow Book:
In Formula (1), S is the set of states inside the system, F is the transfer function of transaction state, T is the transaction information, and the initial state is Gensis state. In Formula (2), F is the block-level state transfer function, and B is the block information. Formula (3) defines B as a block of a series of transactions, each containing multiple transactions; Formula (4)G is a block finalization function, including uncle block check, bonus miner check, POW check, etc.
This mathematical model underlies not only Ethereum, but also most consensus-based decentralized trading systems today. Ethereum’s ascent over Bitcoin is essentially the F and S of this paradigm. At its heart is the idea of a blockchain with Turing-complete and unlimited storage space for internal transactions. Corresponding to:
- The powerful function F, which can perform any calculation, bitcoin does not support loop;
- State S records any type of data (including code), while Bitcoin’s UTXO model only calculates how much an address can be spent.
The data structure
In terms of data storage, Bitcoin calculates address balances through the UTXO model, and users are not encouraged to store other data. Through the P2SH scripting mechanism, it is theoretically possible to design various smart contracts, but due to the limited expression ability of scripting language, it is difficult to support complex contract development. This design makes sense for cryptocurrencies.
Ethereum needed to redesign its data structure to allow recording arbitrary information and performing arbitrary functions.
Merkle Patricia Trie
Ethereum uses Merkle Patricia Trie heavily to organize and store data, and as we’ll see below, this new data structure is achieved through an innovative combination of hash and prefix trees. Convention: MPT is used below in place of Merkle Patricia Trie.
Merkle Tree
Also called a hash tree: Each leaf of the tree is the hash of a block of data, and each non-leaf is the hash of a child. As shown, this tree does not store Data Blocks themselves. In the P2P network environment, if the malicious network node modifies the data on the tree, it will not pass Merkle Proof, thus ensuring the integrity and validity of the data. This depends on the nature of one-way hash encryption. This property makes it widely used in distributed system data verification, such as IPFS, Git, etc.
Patricia Trie
Also called Radix Trie, it is a spatially optimized variant of prefix tree: if a node in the tree is the only child of its parent, the two nodes can be merged. Its application here is to map long integer data from a 20bytes Ethereum Address to its Account, such as
, which is encoded into a hexadecimal number — in Patricia Trie’s case, a path of non-leaf nodes.,account>
For example, if you store <“dog”,”Snoopy”> on Patricia Trie,” dog” will be encoded as “64 6f 67”. If you find the root node, you will find root->6->4->6->15->6->7->value. Value is just a hash that points to “Snoopy”. The advantage of this approach over hash tables is that there are no conflicts; But without optimization, the query steps are too long.
Improved point
To improve efficiency, Ethereum has a special design for node data types in the tree. Including the following four types of nodes
- The null node represents an empty string
- Branch a 17-element non-leaf node of the form
,i1…> - Leaf node A leaf node of two elements, such as
,value>
. EncodedPath is part of a long integer number string encoded by address encryption
- Extension node is a non-leaf node of two elements, such as <encodedPath, K >. The function of extension is to combine nodes on paths without forks to save space resources
The figure, which is a simplified state tree (explained in more detail shortly, but not in any way a schematic), shows a map of < address, balance > in the upper right corner. The prefix item is used to assist coding and can be ignored. Addresses of 4 accounts, organized according to MPT. All extension nodes are just optimizations and can be replaced with multiple Branch nodes.
MPT requires a back-end database (levelDB in Ethereum) to maintain the connection between each node. This database is called the state database. The benefits of using MPT include: (1) The root node of this structure is encrypted and depends on all internal data. Its hash can be used for security verification, which is the property of Merkle tree. However, unlike Merkle tree, which does not store data blocks itself, MPT tree nodes store address data. It is the property of the Patricia tree (2) that allows any previous state (where the root hash is known) to be recalled by simply changing the root hash value.
state
The concept of a state tree was introduced above in explaining MPT. The concept of World states in Ethereum stores arbitrary states recorded by a decentralized transaction system through MPT mappings. This corresponds to the S in the blockchain paradigm, a core concept of Ethereum’s design.
The article
The entry point for querying the latest account status should be the state tree of the most recently confirmed block.
Ethereum’s account model needs a special introduction.
Account
Bitcoin uses the UTXO model to calculate balances, which cannot meet the need to record arbitrary states. Ethereum has designed the Account model, which stores the following: [Nonce, balance, storageRoot, codeHash] where Nonce is the transaction counter, balance is the balance information, storageRoot corresponds to another MPT, through which the variable information of the contract can be retrieved in the database. A codeHash is a codeHash value that cannot be changed after creation.
- External accounts are controlled by private keys. In the corresponding Account model, the storageRoot and codeHash do not exist, that is, codes are not stored or executed. If there are only external accounts, then Ethereum can only support money transfers.
- Contract accounts can be created by initiating transactions from an external account or by another contract account. When a contract account receives a message invocation, it loads code that performs the logic through EVM to modify the state of the internal store.
trading
Under the UTXO model, transactions are essentially unlocking input (via signed data) and locking output. Under the Account model, there are two types of transactions:
- Create a contract, creating a new contract through code
- Message call, which can transfer money or trigger a function of the contract
There are two types of transactions are includes the following fields: [nonce, gasPrice gasLimit, to value, [v, r, s]]
- Nonce: indicates the number of transactions sent by an account
- GasPrice,gasLimit: Used to limit the execution time of a transaction and prevent the program from running in an infinite loop
- To: The recipient of the transaction
- Value: Amount of transfer, in the case of creating a contract, the amount donated to the contract
- V, R, S: transaction signature data that can be used to determine the sender of a transaction
Contract creation also requires:
- Init: AN EVM code represented by an unlimited byte array that is run only once at contract creation time; Init returns the body snippet after execution, and subsequent contract calls run the body content.
The address of the contract account is determined jointly by the Sender and the Nonce, so any two successful contract deployments will result in different addresses. As you can see from the figure above, code and state are stored separately. The compiled bytecode is actually stored in a virtual ROM and cannot be modified.
Message invocation also requires:
- Data: An unlimited array of bytes representing the input to a message call
Message invocation modifies the status of an account, which could be an EOA account or a contract account.
Transactions can be initiated from an external account or by contract. For example, block 5228886 contains 170 transactions and seven internal contract transactions.
block
Ethereum’s blocks add more data items, making them much more complex than Bitcoin, but not much different in nature. For example, tertiary hashing is added to optimize incentives to support mining agreements; There is also a lot of validation and serialization in the block itself. These are beyond the scope of this article and will not be discussed in depth.
EVM design and implementation
The Ethereum Virtual Machine (Ethereum Virtual Machine) is the runtime environment that performs Ethereum’s state transition function. Simple question: Can Ethereum reuse Java, Lisp, Lua, etc instead of developing a single underlying VM? In theory, yes, the Corda project is based entirely on the JVM platform. But more blockchain projects will choose to specialize in developing underlying infrastructure, including scripting engines for Bitcoin. Ethereum official explanation:
- Ethereum’s VM specifications are simpler, whereas other general-purpose VMS have a lot of unnecessary complexity
- Allow custom development, such as 32bytes of words
- Avoid external dependency issues with other VMS, which can make installation difficult
- With another VM, a complete security review is required, and the tradeoff may not be much
The memory model
EVM is also based on the stack computer model, but involves memory and storage in addition to the stack:
- Stack the size of elements on the stack is 32bytes, which is different from the normal size of 4bytes and 8bytes, mainly for ethereum arithmetic objects with 20bytes of address and 32bytes of cryptography variables. The stack size does not exceed 1024; The call depth of the stack does not exceed 1024 to prevent memory overflow.
- While all operations are performed on the stack, temporary variables can be stored in memory, regardless of memory size
- Storage state variables are stored in storage. Unlike stack and memory variables that disappear when EVM instances are destroyed, data in storage will be persisted after modification
This is a diagram of the EVM architecture, which has a profound impact on ethereum application development, including design patterns and security considerations.A classic problem is contract escalation:
After the contract is deployed, it is compiled into bytecode and stored in virtual ROM. The code is not modifiable, which is a serious constraint for many DApps. One idea is to distribute the code among different contracts, and the inter-contract calls are made through addresses stored in storage, thus implementing the actual contract upgrade operation.
Execution model
EVM is technically a quasi-Turing machine, grammatically capable of performing arbitrary operations, but in order to prevent network abuse and avoid security issues due to Turing integrity, all operations in Ethereum are subject to economic restrictions, known as gas mechanisms, in three cases:
- General operation cost, such as SLOAD,SSTORE, etc
- Fuel consumed by submessage calls or contract creation is part of the cost of performing CREATE, CALL, and CALLCODE
- The cost of memory usage is proportional to the number of 32bytes words required
The following figure shows the internal flow of EVM execution, fetching instructions from EVM code, all operations on the Stack, Memory as a temporary variable storage, storage is the account state. Execution is limited by Gas Avail.
Now with EVM, let’s look at the execution details of the transaction introduced earlier. As defined by the blockchain paradigm, T is the ethereum state transition function, which is also the most complex part of Ethereum. All transactions are subject to internal validation before execution:
- Transactions are RLP format data with no extra suffix bytes;
- The signature of the transaction is valid;
- Random numbers of transactions are valid;
- The fuel ceiling shall not be less than the fuel used in the actual transaction process;
- The balance of the sender’s account is at least greater than the fee V0, which needs to be paid in advance;
The following figure shows the process of message invocation. Each transaction may form a deep call stack, with calls between different contracts within the transaction. The CALL is made through the CALL directive, and the parameters and return values are passed through memory.
Error handling
Several errors occur during EVM contract execution:
- The fuel shortage
- Invalid instructions
- Missing stack data
- The target address of the JUMP JUMPI command is invalid
- The new stack size is greater than 1024
- Stack call depth exceeds 1024
EVM error handling has a simple principle called revert-state-and-consume all-gas, where the state is restored to the checkpoint before the transaction was executed, but consumed gas is not returned. The virtual machine treats errors as code errors and does no specific error handling.
EVM analysis tools
Tools for EVM analysis can be found in the Ethereum Virtual Machine (EVM) Awesome List
Turing-complete Virtual Machine (WIP) like EVM
The full EVM specification is complex, but with a certain assembly base and the ability to simplify the model, implementing an EVM-like virtual machine is a challenge that can be attempted. I’ll put up my own implementation when I have time. If you’re interested, you can try it yourself.
reference
1.A Next-Generation Smart Contract and Decentralized Application Platform
2.ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER
3.Design Rationale
4.Stack Exchange: Ethereum block architecture
5.Go Ethereum
6.evm-illustrated
7.Diving Into The Ethereum VM