Ethereum source code analysis: the Ethash consensus algorithm
Branch code: github.com/ethereum/go…
Introduction
Ethereum currently ships two consensus-engine implementations: clique and ethash. Ethash is the proof-of-work (PoW) consensus algorithm used on the Ethereum mainnet (since the Homestead release).
Directory structure
The ethash module lives in the consensus/ethash directory of the go-ethereum project.
- algorithm.go: implements the Dagger-Hashimoto algorithm proper, e.g. generating the cache and the dataset, and computing the mining hash from a Header and a Nonce.
- api.go: implements the API methods exposed over RPC.
- consensus.go: implements part of the Ethereum consensus interface, including the Verify family of methods (VerifyHeader, VerifySeal, etc.), Prepare, Finalize, CalcDifficulty, Author, and SealHash.
- ethash.go: implements the cache and dataset structures with their respective methods, the MakeCache/MakeDataset functions, the New constructor for the Ethash object, and Ethash's internal methods.
- sealer.go: implements the consensus interface's Seal method and Ethash's internal mine method; together these implement ethash's mining functionality.
Ethash design principles
Ethash design goals
When designing the consensus algorithm, Ethereum aimed for three goals:
- ASIC resistance: the advantage of building dedicated hardware for the algorithm should be as small as possible, so that an ordinary user can mine with a CPU. This is achieved by making the miner memory-bound: when large amounts of random memory data must be read, computation speed is limited not only by the processor but by memory read bandwidth, and the extra memory makes an ASIC expensive.
- Light-client verifiability: a block should be verifiable quickly and efficiently by a light client.
- Full-state mining: miners should be required to store the full blockchain state.
The hash dataset
To compute a hash, ethash needs a large block of data: the dataset. Its initial size is about 1 GB, it is regenerated every 30,000 blocks, and each regeneration grows it by about 8 MB. The data being hashed is drawn from this dataset, and the block's Header and Nonce fields decide which parts of the dataset are used. This part is implemented by the Dagger algorithm.
Dagger
The Dagger algorithm generates the dataset. Its core is the dataset's generation scheme and organizational structure.
A dataset can be thought of as an array of items (dataItems), each a 64-byte array (one hash). The dataset's initial size is about 1 GB; it is regenerated every 30,000 blocks (one epoch) and each regeneration is about 8 MB larger than the last.
Each dataset item is generated from a cache, which can likewise be seen as an array of items. The cache occupies far less memory than the dataset, initially about 16 MB; like the dataset, it is regenerated every 30,000 blocks, growing by about 128 KB each time.
To generate a dataItem, a cacheItem is selected "randomly" ("random" here does not mean a true random number, but a value that cannot be predicted beforehand yet comes out the same on every computation) and mixed in; the result feeds the next round, and the process loops 256 times.
The cache is generated from a seed, whose value depends on the block height. The generation pipeline is therefore: block height → seed → cache → dataset.
The other key property of Dagger is determinism: within the same epoch, the seed, cache, and dataset come out identical on every computation. Otherwise, the miner and the verifier would use different datasets for the same block and verification would be impossible.
Hashimoto algorithm
Hashimoto was created by Thaddeus Dryja. It is designed to resist ASIC miners through IO bounding: mining speed is limited by memory access rather than raw computation. Memory devices are cheaper and more ubiquitous than specialized compute devices, and large manufacturers worldwide already invest heavily in optimizing RAM for all kinds of user scenarios, so existing commodity memory is likely close to optimal for evaluating the algorithm. Hashimoto uses the blockchain as its source data, satisfying requirements 1 and 3 above.
It uses the hash of the block Header and the Nonce field to generate a final hash from the dataset.
Source code analysis
Generating the hash dataset
The generate function, located in ethash.go, produces the dataset. It involves the following steps.
Generating the cache size
epochLength is 30,000. If the epoch is smaller than maxEpoch (2048), the cache size is looked up in a table of known values; otherwise it is calculated: start from 2^24 + 2^17·epoch − 64, divide by 64, and check whether the result is prime. If not, subtract 128 and recalculate, until the largest such prime is found.
csize := cacheSize(d.epoch*epochLength + 1)
func cacheSize(block uint64) uint64 {
epoch := int(block / epochLength)
if epoch < maxEpoch {
return cacheSizes[epoch]
}
return calcCacheSize(epoch)
}
func calcCacheSize(epoch int) uint64 {
size := cacheInitBytes + cacheGrowthBytes*uint64(epoch) - hashBytes
for !new(big.Int).SetUint64(size / hashBytes).ProbablyPrime(1) { // Always accurate for n < 2^64
size -= 2 * hashBytes
}
return size
}
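As a sanity check, the prime search above can be run standalone. This is a minimal sketch, with the constants spelled out to mirror go-ethereum's values (cacheInitBytes = 2^24, cacheGrowthBytes = 2^17, hashBytes = 64):

```go
package main

import (
	"fmt"
	"math/big"
)

const (
	hashBytes        = 64      // one keccak512 hash, i.e. one "item"
	cacheInitBytes   = 1 << 24 // initial cache size (16 MB)
	cacheGrowthBytes = 1 << 17 // cache growth per epoch (128 KB)
)

// calcCacheSize repeats the prime search shown above: start at
// 2^24 + 2^17*epoch - 64 and step down by 128 bytes until size/64 is prime.
func calcCacheSize(epoch int) uint64 {
	size := cacheInitBytes + cacheGrowthBytes*uint64(epoch) - hashBytes
	for !new(big.Int).SetUint64(size / hashBytes).ProbablyPrime(1) {
		size -= 2 * hashBytes
	}
	return size
}

func main() {
	fmt.Println(calcCacheSize(0)) // 16776896
}
```

For epoch 0 the search starts at 16777152 (262143 × 64) and steps down twice before 262139 turns out prime, matching the first entry of geth's precomputed cacheSizes table.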
Generating the dataset size
datasetSize returns the dataset size used by ethash for a specific block number; the calculation is analogous to the cache size above.
dsize := datasetSize(d.epoch*epochLength + 1)
func datasetSize(block uint64) uint64 {
epoch := int(block / epochLength)
if epoch < maxEpoch {
return datasetSizes[epoch]
}
return calcDatasetSize(epoch)
}
Seed generation
seedHash computes the seed used to generate both the verification cache and the mining dataset; it is 32 bytes long.
seed := seedHash(d.epoch*epochLength + 1)
func seedHash(block uint64) []byte {
seed := make([]byte, 32)
if block < epochLength {
return seed
}
keccak256 := makeHasher(sha3.NewLegacyKeccak256())
for i := 0; i < int(block/epochLength); i++ {
keccak256(seed, seed)
}
return seed
}
Generating the cache
generateCache(cache, d.epoch, seed)
The key code for generateCache is analyzed:
First note hashBytes, used throughout the calculations below: its value is 64, the length of a keccak512 hash, and each such 64-byte unit is referred to as an item.
① : Initialize the cache
This loop is used to initialize the cache: the first item in the cache is filled with the hash of the seed, and then the later item is filled with the hash of the previous item.
for offset := uint64(hashBytes); offset < size; offset += hashBytes {
keccak512(cache[offset:], cache[offset-hashBytes:offset])
atomic.AddUint32(&progress, 1)
}
② : XOR data in the cache according to a rule
For each item (at srcOff), an item (at xorOff) is selected "randomly" and XORed with it; the hash of the result is written to dstOff. This logic runs cacheRounds times.
Two caveats:
- First, srcOff walks from the tail toward the head, while dstOff walks from the head toward the tail, and the two correspond: when srcOff points at the x-th item from the end, dstOff points at the x-th item from the start.
- Second, the selection of xorOff. Notice that "random" was put in quotes: the value of xorOff only looks random. Before the seed is given you cannot know what xorOff will be, but once the seed is determined, every xorOff is determined as well. And the seed's value is determined by the block height. This is why the same epoch always yields the same cache data.
for i := 0; i < cacheRounds; i++ {
for j := 0; j < rows; j++ {
var (
srcOff = ((j - 1 + rows) % rows) * hashBytes
dstOff = j * hashBytes
xorOff = (binary.LittleEndian.Uint32(cache[dstOff:]) % uint32(rows)) * hashBytes
)
bitutil.XORBytes(temp, cache[srcOff:srcOff+hashBytes], cache[xorOff:xorOff+hashBytes])
keccak512(cache[dstOff:], temp)
atomic.AddUint32(&progress, 1)
}
}
Generate the dataset
The dataset size is calculated like the cache size, just at a larger scale: start from 2^30 + 2^23·epoch − 128 and subtract 256 at each step until size/128 is the largest prime.
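By analogy with calcCacheSize, the dataset-size search can be sketched as follows (constants spelled out to mirror go-ethereum: datasetInitBytes = 2^30, datasetGrowthBytes = 2^23, mixBytes = 128):

```go
package main

import (
	"fmt"
	"math/big"
)

const (
	mixBytes           = 128     // mix width in bytes
	datasetInitBytes   = 1 << 30 // initial dataset size (1 GB)
	datasetGrowthBytes = 1 << 23 // dataset growth per epoch (8 MB)
)

// calcDatasetSize mirrors calcCacheSize at a larger scale: start at
// 2^30 + 2^23*epoch - 128 and step down by 256 bytes until size/128 is prime.
func calcDatasetSize(epoch int) uint64 {
	size := datasetInitBytes + datasetGrowthBytes*uint64(epoch) - mixBytes
	for !new(big.Int).SetUint64(size / mixBytes).ProbablyPrime(1) {
		size -= 2 * mixBytes
	}
	return size
}

func main() {
	fmt.Println(calcDatasetSize(0)) // 1073739904
}
```

The epoch-0 result matches the first entry of geth's precomputed datasetSizes table.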
Generating the data is a loop that produces 64 bytes at a time. The main function is generateDatasetItem:
The data source of generateDatasetItem is the cache, and the resulting dataset value is accumulated in the mix variable. The whole process again consists of several loops.
① : Initialize the mix variable
Initialize the mix variable from the cache. hashWords is the number of 32-bit words in one hash: a hash is hashBytes (64 bytes) long and a word (uint32) is 4 bytes, so hashWords is 16. Which cache item is selected is determined by the index and i variables.
mix := make([]byte, hashBytes)
binary.LittleEndian.PutUint32(mix, cache[(index%rows)*hashWords]^index)
for i := 1; i < hashWords; i++ {
binary.LittleEndian.PutUint32(mix[i*4:], cache[(index%rows)*hashWords+uint32(i)])
}
keccak512(mix, mix)
② : Converts mix to []uint32
intMix := make([]uint32, hashWords)
for i := 0; i < len(intMix); i++ {
intMix[i] = binary.LittleEndian.Uint32(mix[i*4:])
}
③ : Aggregate cache data into intMix
for i := uint32(0); i < datasetParents; i++ {
parent := fnv(index^i, intMix[i%16]) % rows
fnvHash(intMix, cache[parent*hashWords:])
}
FNV is a simple non-cryptographic hash algorithm that requires no key.
The core step is straightforward: multiply a by the FNV prime 0x01000193, then XOR with b.
Here it is first used to compute an index, which selects a value (data) from the cache; fnvHash then folds that data into each word of mix to produce the final value.
func fnv(a, b uint32) uint32 {
return a*0x01000193 ^ b
}
func fnvHash(mix []uint32, data []uint32) {
for i := 0; i < len(mix); i++ {
mix[i] = mix[i]*0x01000193 ^ data[i]
}
}
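The two functions above are pure arithmetic, so their behavior is easy to check in isolation; a minimal worked example:

```go
package main

import "fmt"

// fnv multiplies by the 32-bit FNV prime and XORs in the second word.
func fnv(a, b uint32) uint32 {
	return a*0x01000193 ^ b
}

// fnvHash folds data into mix element-wise with the same step.
func fnvHash(mix []uint32, data []uint32) {
	for i := 0; i < len(mix); i++ {
		mix[i] = mix[i]*0x01000193 ^ data[i]
	}
}

func main() {
	fmt.Printf("0x%08x\n", fnv(1, 2)) // 1*0x01000193 ^ 2 = 0x01000191
	// fnvHash applies the same step to every element.
	mix := []uint32{1, 2}
	fnvHash(mix, []uint32{2, 2})
	fmt.Println(mix[0] == fnv(1, 2), mix[1] == fnv(2, 2)) // true true
}
```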
④ : Restore intMix into mix and compute the hash of mix
for i, val := range intMix {
binary.LittleEndian.PutUint32(mix[i*4:], val)
}
keccak512(mix, mix)
return mix
generateCache and generateDataset are the core functions implementing the Dagger algorithm. This completes the whole process of generating the hash dataset.
Consensus engine core functions
The code is located in consensus.go.
(1) : the Author
// Returns the coinbase: the address of the miner who sealed the block
func (ethash *Ethash) Author(header *types.Header) (common.Address, error) {
return header.Coinbase, nil
}
(2) : VerifyHeader
The first step checks whether the header is already known and whether its ancestor is known; the second step runs the ethash-specific checks:
2.1 header.Extra must not exceed 32 bytes
if uint64(len(header.Extra)) > params.MaximumExtraDataSize { // No more than 32 bytes
return fmt.Errorf("extra-data too long: %d > %d", len(header.Extra), params.MaximumExtraDataSize)
}
2.2 The timestamp must not be more than 15 seconds in the future; any block further ahead is treated as a future block
if !uncle {
	if header.Time > uint64(time.Now().Add(allowedFutureBlockTime).Unix()) {
		return consensus.ErrFutureBlock
	}
}
2.3 The header's timestamp must be greater than the parent block's
if header.Time <= parent.Time { // The current header time is less than or equal to that of the parent block
return errZeroBlockTime
}
2.4 Verify the difficulty of the block based on the time stamp and the difficulty of the parent block
expected := ethash.CalcDifficulty(chain, header.Time, parent)
if expected.Cmp(header.Difficulty) != 0 {
return fmt.Errorf("invalid difficulty: have %v, want %v", header.Difficulty, expected)
}
2.5 Verify that the gas limit is less than 2^63 − 1
cap := uint64(0x7fffffffffffffff)
if header.GasLimit > cap {
	return fmt.Errorf("invalid gasLimit: have %v, max %v", header.GasLimit, cap)
}
2.6 Confirm that gasUsed is <= gasLimit
if header.GasUsed > header.GasLimit {
return fmt.Errorf("invalid gasUsed: have %d, gasLimit %d", header.GasUsed, header.GasLimit)
}
2.7 Verify that the block number is the parent block plus 1
if diff := new(big.Int).Sub(header.Number, parent.Number); diff.Cmp(big.NewInt(1)) != 0 {
return consensus.ErrInvalidNumber
}
2.8 Check whether the given block meets the POW difficulty requirement
if seal {
if err := ethash.VerifySeal(chain, header); err != nil {
return err
}
}
(3) : VerifyUncles
3.1 At most two uncle blocks
if len(block.Uncles()) > maxUncles {
return errTooManyUncles
}
3.2 Collect the uncle blocks and ancestor blocks
number, parent := block.NumberU64()-1, block.ParentHash()
for i := 0; i < 7; i++ {
ancestor := chain.GetBlock(parent, number)
if ancestor == nil {
break
}
ancestors[ancestor.Hash()] = ancestor.Header()
for _, uncle := range ancestor.Uncles() {
uncles.Add(uncle.Hash())
}
parent, number = ancestor.ParentHash(), number-1
}
ancestors[block.Hash()] = block.Header()
uncles.Add(block.Hash())
3.3 Make sure each uncle block is rewarded only once and has a valid ancestry
for _, uncle := range block.Uncles() {
// Make sure every uncle is rewarded only once
hash := uncle.Hash()
if uncles.Contains(hash) {
return errDuplicateUncle
}
uncles.Add(hash)
// Make sure the uncle has a valid ancestry
if ancestors[hash] != nil {
return errUncleIsAncestor
}
if ancestors[uncle.ParentHash] == nil || uncle.ParentHash == block.ParentHash() {
return errDanglingUncle
}
if err := ethash.verifyHeader(chain, uncle, ancestors[uncle.ParentHash], true, true); err != nil {
return err
}
}
(4) : Prepare
Initialize the Header’s Difficulty field
parent := chain.GetHeader(header.ParentHash, header.Number.Uint64()-1)
if parent == nil {
return consensus.ErrUnknownAncestor
}
header.Difficulty = ethash.CalcDifficulty(chain, header.Time, parent)
return nil
⑤ : Finalize performs all post-transaction state modifications (e.g., block rewards) but does not assemble the block.
5.1 Accumulate any block and uncle rewards
accumulateRewards(chain.Config(), state, header, uncles)
5.2 Calculate the root hash of the state tree and submit it to the header
header.Root = state.IntermediateRoot(chain.Config().IsEIP158(header.Number))
⑥ : FinalizeAndAssemble Runs any post-transaction state changes (for example, block rewards) and assembles the final block.
func (ethash *Ethash) FinalizeAndAssemble(chain consensus.ChainReader, header *types.Header, state *state.StateDB, txs []*types.Transaction, uncles []*types.Header, receipts []*types.Receipt) (*types.Block, error) {
accumulateRewards(chain.Config(), state, header, uncles)
header.Root = state.IntermediateRoot(chain.Config().IsEIP158(header.Number))
return types.NewBlock(header, txs, uncles, receipts), nil
}
Compared with Finalize, FinalizeAndAssemble additionally calls types.NewBlock to assemble the block.
⑦ : SealHash returns the hash of the block before seal (different from the hash of the block after seal)
func (ethash *Ethash) SealHash(header *types.Header) (hash common.Hash) {
hasher := sha3.NewLegacyKeccak256()
rlp.Encode(hasher, []interface{}{
header.ParentHash,
header.UncleHash,
header.Coinbase,
header.Root,
header.TxHash,
header.ReceiptHash,
header.Bloom,
header.Difficulty,
header.Number,
header.GasLimit,
header.GasUsed,
header.Time,
header.Extra,
})
hasher.Sum(hash[:0])
return hash
}
⑧ : Seal generates a new sealing request (mining) for the given input block and pushes the result into the given channel.
Note that the method returns immediately and delivers results asynchronously; depending on the consensus algorithm, it may even deliver multiple results. This part is analyzed in detail in the mining section below, so it is skipped here.
Mining details
Core interface definition of mining:
Seal(chain ChainReader, block *types.Block, results chan<- *types.Block, stop <-chan struct{}) error
Enter the seal function:
① : When running a fake PoW (ModeFake/ModeFullFake), seal the block immediately with an empty Nonce and MixDigest.
if ethash.config.PowMode == ModeFake || ethash.config.PowMode == ModeFullFake {
header := block.Header()
header.Nonce, header.MixDigest = types.BlockNonce{}, common.Hash{}
select {
case results <- block.WithSeal(header):
default:
ethash.config.Log.Warn("Sealing result is not read by miner", "mode", "fake", "sealhash", ethash.SealHash(block.Header()))
}
return nil
}
② : If the POW is shared, it goes to its shared object to perform the Seal operation
if ethash.shared != nil {
return ethash.shared.Seal(chain, block, results, stop)
}
③ : Obtain the seed source and generate the seed required by ethash according to it
if ethash.rand == nil {
// Get the seed
seed, err := crand.Int(crand.Reader, big.NewInt(math.MaxInt64))
if err != nil {
ethash.lock.Unlock()
return err
}
ethash.rand = rand.New(rand.NewSource(seed.Int64())) // Assign a value to rand
}
④ : The core work of mining is handed over to mine
for i := 0; i < threads; i++ {
pend.Add(1)
go func(id int, nonce uint64) {
defer pend.Done()
ethash.mine(block, id, nonce, abort, locals) // Perform the actual mining action
}(i, uint64(ethash.rand.Int63()))
}
⑤ : Processing the results of mining
- An external abort stops all mining threads
- One thread finds a valid block and aborts all the others
- The ethash object changed: stop all current work and rerun the method
go func() {
var result *types.Block
select {
case <-stop:
close(abort)
case result = <-locals:
select {
case results <- result: // If one thread digs the correct block, terminate all other threads
default:
ethash.config.Log.Warn("Sealing result is not read by miner", "mode", "local", "sealhash", ethash.SealHash(block.Header()))
}
close(abort)
case <-ethash.update:
close(abort)
if err := ethash.Seal(chain, block, results, stop); err != nil {
ethash.config.Log.Error("Failed to restart sealing after update"."err", err)
}
}
}()
As seen above, the core work of Seal is done by the mine function.
mine is the actual PoW miner: starting from a seed value, it searches for a nonce whose resulting hash meets the block difficulty, eventually producing a correct, verifiable block.
① : Extract the relevant data from the block header into local variables
var (
header = block.Header()
hash = ethash.SealHash(header).Bytes()
target = new(big.Int).Div(two256, header.Difficulty) // This is the target used for validation
number = header.Number.Uint64()
dataset = ethash.dataset(number, false)
)
② : Start trying nonces, beginning from the seed, until aborted or a good nonce is found
var (
attempts = int64(0)
nonce = seed
)
③ : Gather the complete dataset data and generate the final hash value for the specific header and nonce
func hashimotoFull(dataset []uint32, hash []byte, nonce uint64) ([]byte, []byte) {
// Define a lookup function to find data in the data set
lookup := func(index uint32) []uint32 {
offset := index * hashWords //hashWords is the constant value defined above = 16
return dataset[offset : offset+hashWords]
}
return hashimoto(hash, nonce, uint64(len(dataset))*4, lookup)
}
As you can see, hashimotoFull merely wraps the dataset in a lookup function that reads slices out of it and passes that to hashimoto. Next, focus on the hashimoto function itself:
3.1 Build the seed and take its head
rows := uint32(size / mixBytes)                 // ①
seed := make([]byte, 40)                        // ②
copy(seed, hash)                                // ③
binary.LittleEndian.PutUint64(seed[32:], nonce) // ④
seed = crypto.Keccak512(seed)                   // ⑤
seedHead := binary.LittleEndian.Uint32(seed)    // ⑥
- ① Calculate the number of rows in the dataset
- ② Create a 40-byte seed to hold the header hash plus the nonce
- ③ Copy the block header hash into the seed
- ④ Fill the nonce into the seed's last 8 bytes (40 − 32 = 8; the nonce is a uint64, i.e. 64 bits = 8 bytes), so hash and nonce exactly fill the 40-byte seed
- ⑤ Keccak512-encrypt the seed
- ⑥ Take the head (the first 4 bytes) of the seed
3.2 Start mixing from the replicated seed
The constant mixBytes is 128. mix has 32 elements of type uint32 (32 bits, i.e. 4 bytes each), so mix totals 32 × 4 = 128 bytes.
mix := make([]uint32, mixBytes/4)
for i := 0; i < len(mix); i++ {
mix[i] = binary.LittleEndian.Uint32(seed[i%16*4:])
}
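Note the i%16 wraparound: the keccak512 seed is only 64 bytes (16 words) while mix holds 32 words, so the seed is replicated twice. A small self-contained check (replicate is a hypothetical helper name):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// replicate fills the 128-byte mix (32 uint32 words) from a 64-byte
// seed; the i%16 wraparound copies the seed twice.
func replicate(seed []byte) []uint32 {
	mix := make([]uint32, 32)
	for i := 0; i < len(mix); i++ {
		mix[i] = binary.LittleEndian.Uint32(seed[i%16*4:])
	}
	return mix
}

func main() {
	// A 64-byte stand-in for the keccak512 seed.
	seed := make([]byte, 64)
	for i := range seed {
		seed[i] = byte(i)
	}
	mix := replicate(seed)
	fmt.Println(mix[0] == mix[16], mix[15] == mix[31]) // true true
}
```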
3.3 Mix in randomly selected dataset nodes
temp := make([]uint32, len(mix)) // same structure and length as mix
for i := 0; i < loopAccesses; i++ {
parent := fnv(uint32(i)^seedHead, mix[i%len(mix)]) % rows
for j := uint32(0); j < mixBytes/hashBytes; j++ {
copy(temp[j*hashWords:], lookup(2*parent+j))
}
fnvHash(mix, temp)
}
3.4 Compress the mix
for i := 0; i < len(mix); i += 4 {
mix[i/4] = fnv(fnv(fnv(mix[i], mix[i+1]), mix[i+2]), mix[i+3])
}
mix = mix[:len(mix)/4]
digest := make([]byte, common.HashLength)
for i, val := range mix {
binary.LittleEndian.PutUint32(digest[i*4:], val)
}
return digest, crypto.Keccak256(append(seed, digest...))
What comes back is the digest plus the hash of seed and digest together; the digest is simply the []byte form of mix. As seen in the ethash.mine code earlier, the second return value is compared against the target variable to decide whether the hash is valid.
Verifying the PoW
Verifying mining information has two parts:
- verifying that Header.Difficulty is correct
- verifying that Header.MixDigest and Header.Nonce are correct
(1) : The code verifying header.Difficulty is mainly in ethash.verifyHeader:
func (ethash *Ethash) verifyHeader(chain consensus.ChainReader, header, parent *types.Header, uncle bool, seal bool) error {
	// ...
	expected := ethash.CalcDifficulty(chain, header.Time, parent)
	if expected.Cmp(header.Difficulty) != 0 {
		return fmt.Errorf("invalid difficulty: have %v, want %v", header.Difficulty, expected)
	}
	// ...
}
The difficulty is recalculated from the block height, the time difference, and the parent's difficulty, then compared with the header.Difficulty field of the block being verified; if they are equal, the difficulty is considered correct.
(2) : MixDigest and Nonce are verified mainly in ethash.verifySeal:
Verification works by recomputing MixDigest and the result hash with hashimoto from header.Nonce and the header hash; a verifying node does not need the full dataset.
Summary & References
- mindcarver.cn
- github.com/blockchainG…
- eth.wiki/concepts/et…
- eth.wiki/concepts/et…
- www.vijaypradeep.com/blog/2017-0…