Ethereum source code analysis: the Ethash consensus algorithm

Code branch: github.com/ethereum/go…

Introduction

There are currently two consensus engine implementations in Ethereum: clique and ethash. Ethash is the PoW consensus algorithm used by the Ethereum mainnet (Homestead version).

Directory structure

The ethash module is located in the consensus/ethash directory under the Ethereum project directory.

  • algorithm.go: implements all the functions of the Dagger-Hashimoto algorithm, such as generating the cache and the dataset, and computing the mining hash from the Header and Nonce.
  • api.go: implements the API methods exposed over RPC.
  • consensus.go: implements part of Ethereum's consensus engine interface, including the Verify* methods (VerifyHeader, VerifySeal, etc.), Prepare and Finalize, CalcDifficulty, Author, and SealHash.
  • ethash.go: implements the cache and dataset structures and their methods, the MakeCache/MakeDataset functions, the Ethash object's New function, and Ethash's internal methods.
  • sealer.go: implements the consensus interface's Seal method and Ethash's internal mine method; together these implement ethash's mining functionality.

Ethash design principles

Ethash design goals

When designing Ethereum's consensus algorithm, three goals were set:

  1. ASIC resistance: the advantage of building dedicated hardware for the algorithm should be as small as possible, so that an ordinary computer user can mine with a CPU.
    • ASICs are resisted through memory hardness: the large memory requirement makes ASIC miners expensive.
    • When large amounts of data are read from memory at random, computation speed is limited not only by the compute units but also by the memory read bandwidth.
  2. Light-client verifiability: a block should be verified quickly and efficiently by a light client.
  3. Full-chain storage: mining should require the full blockchain state to be stored.

The hash dataset

To compute its hash, Ethash needs a large piece of data: the dataset. Its initial size is about 1 GB, and it is regenerated every 30,000 blocks, growing by about 8 MB each time. The data fed into the hash comes from this dataset; the header data and the Nonce field decide which parts of the dataset are hashed. This part is implemented by the Dagger algorithm.

Dagger

The Dagger algorithm is used to generate the dataset. Its core is the dataset's generation scheme and organizational structure.

A dataset can be thought of as an array composed of multiple items (dataItems), each item being a 64-byte array (one hash). The initial size of the dataset is about 1 GB; it is regenerated every 30,000 blocks (one epoch), and each regeneration is about 8 MB larger than the previous one.

Each item of the dataset is generated from a cache, which can likewise be regarded as an array of items. The cache occupies much less memory than the dataset: its initial size is about 16 MB. Like the dataset, it is regenerated every 30,000 blocks, growing by about 128 KB each time.

To generate a dataItem, a cacheItem is selected "randomly" from the cache (here "random" does not mean a true random number: the value cannot be determined in advance, but it is the same on every computation) and mixed in; the result participates in the next round, and this process loops 256 times.

The cache is generated from a seed, and the seed's value depends on the height of the block. The process of generating the dataset is therefore: seed → cache → dataset.

The other key property of Dagger is determinism: within the same epoch, the seed, cache, and dataset computed each time are the same. Otherwise, for the same block, the miner and the verifier would use different datasets, and verification would be impossible.
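The flow can be summarized in a short sketch. The wrapper function below is hypothetical, but seedHash, cacheSize, generateCache, datasetSize, and generateDataset are the functions analyzed in the source code section; error handling and memory-mapping are omitted.

// A minimal sketch of the deterministic seed -> cache -> dataset pipeline.
func makeDatasetForEpoch(epoch uint64) []uint32 {
	seed := seedHash(epoch*epochLength + 1) // depends only on the epoch

	cache := make([]uint32, cacheSize(epoch*epochLength+1)/4) // ~16 MB for epoch 0
	generateCache(cache, epoch, seed)

	dataset := make([]uint32, datasetSize(epoch*epochLength+1)/4) // ~1 GB for epoch 0
	generateDataset(dataset, epoch, cache)
	return dataset
}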


Hashimoto algorithm

The Hashimoto algorithm was created by Thaddeus Dryja. It is designed to resist ASIC miners through I/O restrictions: mining is made memory-bound rather than compute-bound. Memory devices are cheaper and more ubiquitous than computing devices, and large companies worldwide already invest heavily in optimizing memory (RAM) for all kinds of user scenarios, so existing commodity memory is likely close to optimal for the algorithm. Hashimoto uses the blockchain as its data source, satisfying design goals 1 and 3 above.

It uses the hash of the block Header and the Nonce field to generate a final hash from the dataset.


Source code analysis

Generating the hash dataset

The generate function, located in ethash.go, generates the dataset. It involves the following steps.

Computing the cache size

epochLength is 30000. If the epoch is smaller than maxEpoch (2048), the corresponding cache size is looked up from a table of precomputed sizes. Otherwise it is recalculated from the formula 2^24 + 2^17 * epoch − 64: divide this value by 64 and check whether the quotient is a prime number; if not, subtract 128 and check again, stopping at the largest value whose quotient is prime.

csize := cacheSize(d.epoch*epochLength + 1)
func cacheSize(block uint64) uint64 {
	epoch := int(block / epochLength)
	if epoch < maxEpoch {
		return cacheSizes[epoch]
	}
	return calcCacheSize(epoch)
}
func calcCacheSize(epoch int) uint64 {
	size := cacheInitBytes + cacheGrowthBytes*uint64(epoch) - hashBytes
	for !new(big.Int).SetUint64(size / hashBytes).ProbablyPrime(1) { // Always accurate for n < 2^64
		size -= 2 * hashBytes
	}
	return size
}
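As a sanity check, a small hypothetical test pinning the well-known epoch-0 cache size (16,776,896 bytes, just under 16 MB; 16776896/64 = 262139 is prime):

func TestCalcCacheSizeEpoch0(t *testing.T) {
	if got := calcCacheSize(0); got != 16776896 {
		t.Fatalf("epoch 0 cache size: got %d, want 16776896", got)
	}
}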

Computing the dataset size

datasetSize computes the ethash dataset size for a specific block number; the calculation is similar to the cache size above.

dsize := datasetSize(d.epoch*epochLength + 1)
func datasetSize(block uint64) uint64 {
	epoch := int(block / epochLength)
	if epoch < maxEpoch {
		return datasetSizes[epoch]
	}
	return calcDatasetSize(epoch)
}
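calcDatasetSize is not shown in the snippet; it mirrors calcCacheSize, only with dataset-scale constants (datasetInitBytes = 2^30, datasetGrowthBytes = 2^23, mixBytes = 128), matching the formula described in the dataset section below. A sketch:

func calcDatasetSize(epoch int) uint64 {
	size := datasetInitBytes + datasetGrowthBytes*uint64(epoch) - mixBytes
	for !new(big.Int).SetUint64(size / mixBytes).ProbablyPrime(1) { // always accurate for n < 2^64
		size -= 2 * mixBytes
	}
	return size
}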

Seed generation

seedHash computes the seed used to generate the verification cache and the mining dataset. The seed is 32 bytes long.

seed := seedHash(d.epoch*epochLength + 1)
func seedHash(block uint64) []byte {
	seed := make([]byte, 32)
	if block < epochLength {
		return seed
	}
	keccak256 := makeHasher(sha3.NewLegacyKeccak256())
	for i := 0; i < int(block/epochLength); i++ {
		keccak256(seed, seed)
	}
	return seed
}
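A quick illustration of the determinism (hypothetical snippet; the values follow directly from the code above):

fmt.Printf("%x\n", seedHash(0))     // epoch 0: 32 zero bytes
fmt.Printf("%x\n", seedHash(29999)) // still epoch 0: the same 32 zero bytes
fmt.Printf("%x\n", seedHash(30000)) // epoch 1: keccak256 of 32 zero bytes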

Generating the cache

generateCache(cache, d.epoch, seed)

The key code of generateCache, analyzed piece by piece:

First, note hashBytes, which is used in the calculations below: its value is 64, the length of a keccak512 hash. Each [hashBytes]byte chunk is called an item.

① : Initialize the cache

This loop initializes the cache: the first item is filled with the hash of the seed, and every later item is filled with the hash of the previous item.

for offset := uint64(hashBytes); offset < size; offset += hashBytes {
	keccak512(cache[offset:], cache[offset-hashBytes:offset])
	atomic.AddUint32(&progress, 1)
}

② : XOR the data in the cache according to fixed rules

For each item (at srcOff), an item (at xorOff) is selected "randomly" and XORed with it; the hash of the result is written to dstOff. This logic runs cacheRounds times.

Two caveats:

  • First, srcOff moves from the tail toward the head, while dstOff moves from the head toward the tail, and the two correspond: when srcOff is the x-th item from the end, dstOff is the x-th item from the start.
  • Second, the selection of xorOff. Note that "random" was put in quotes: the value of xorOff looks random because it cannot be known before the seed is given, but once the seed is determined, every xorOff value is determined. And the seed's value is determined by the block height. This is why the same cache data is always obtained within the same epoch.
for i := 0; i < cacheRounds; i++ {
	for j := 0; j < rows; j++ {
		var (
			srcOff = ((j - 1 + rows) % rows) * hashBytes
			dstOff = j * hashBytes
			xorOff = (binary.LittleEndian.Uint32(cache[dstOff:]) % uint32(rows)) * hashBytes
		)
		bitutil.XORBytes(temp, cache[srcOff:srcOff+hashBytes], cache[xorOff:xorOff+hashBytes])
		keccak512(cache[dstOff:], temp)

		atomic.AddUint32(&progress, 1)
	}
}

Generating the dataset

The dataset size is calculated much like the cache size, only at a larger magnitude: start from 2^30 + 2^23 * epoch − 128, then subtract 256 at a time until the largest value whose quotient by 128 is prime is found.

The data is generated in a loop, 64 bytes at a time. The main function is generateDatasetItem:

The data source of generateDatasetItem is the cache, and the resulting dataset value is accumulated in the mix variable. The whole process again consists of several loops.

① : Initialize the mix variable

Initialize the mix variable from the cache. hashWords is the number of words in one hash: a hash is hashBytes (64 bytes) long and a word (uint32) is 4 bytes, so hashWords is 16. Which cache item is selected is determined by the index and i variables.

	mix := make([]byte, hashBytes)
	binary.LittleEndian.PutUint32(mix, cache[(index%rows)*hashWords]^index)
	for i := 1; i < hashWords; i++ {
		binary.LittleEndian.PutUint32(mix[i*4:], cache[(index%rows)*hashWords+uint32(i)])
	}
	keccak512(mix, mix)

② : Convert mix to []uint32

intMix := make([]uint32, hashWords)
for i := 0; i < len(intMix); i++ {
	intMix[i] = binary.LittleEndian.Uint32(mix[i*4:])
}

③ : Aggregate cache data into intMix

for i := uint32(0); i < datasetParents; i++ {
		parent := fnv(index^i, intMix[i%16]) % rows
		fnvHash(intMix, cache[parent*hashWords:])
	}

FNV is a simple hash algorithm that does not require a key.

The algorithm is simple: multiply a by the FNV prime 0x01000193, then XOR the result with b.

Here it is first used to compute an index (parent), which selects a value from the cache; fnvHash then folds that cache data into every word of mix, eventually producing the final hash value.

func fnv(a, b uint32) uint32 {
    return a*0x01000193 ^ b
}
func fnvHash(mix []uint32, data []uint32) {
    for i := 0; i < len(mix); i++ {
        mix[i] = mix[i]*0x01000193 ^ data[i]
    }
}
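A quick worked example of the primitive:

fmt.Printf("%#x\n", fnv(1, 2)) // (1 * 0x01000193) ^ 2 = 0x1000191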

④ : Restore intMix into mix and compute the hash of mix

for i, val := range intMix {
		binary.LittleEndian.PutUint32(mix[i*4:], val)
	}
	keccak512(mix, mix)
	return mix

generateCache and generateDataset are the core functions implementing the Dagger algorithm. This concludes the process of generating the hash dataset.


Consensus engine core functions

The code is located in consensus.go.

① : Author

// Returns the Coinbase field, i.e. the address of the miner who sealed the block
func (ethash *Ethash) Author(header *types.Header) (common.Address, error) {
	return header.Coinbase, nil
}

② : VerifyHeader

The first step checks whether the header is already known and whether its ancestors are present. The second step performs the ethash-specific checks:

2.1 Header.Extra cannot exceed 32 bytes

if uint64(len(header.Extra)) > params.MaximumExtraDataSize { // no more than 32 bytes
	return fmt.Errorf("extra-data too long: %d > %d", len(header.Extra), params.MaximumExtraDataSize)
}

2.2 The timestamp cannot be more than 15 seconds in the future; any block more than 15 seconds ahead is considered a future block

if !uncle {
	if header.Time > uint64(time.Now().Add(allowedFutureBlockTime).Unix()) {
		return consensus.ErrFutureBlock
	}
}

2.3 The timestamp of the current header must be strictly later than the parent block's

if header.Time <= parent.Time { // The current header time is less than or equal to that of the parent block
		return errZeroBlockTime
	}

2.4 Verify the difficulty of the block based on the time stamp and the difficulty of the parent block

expected := ethash.CalcDifficulty(chain, header.Time, parent)
if expected.Cmp(header.Difficulty) != 0 {
	return fmt.Errorf("invalid difficulty: have %v, want %v", header.Difficulty, expected)
}

2.5 Verify that the gas limit is at most 2^63 − 1

cap := uint64(0x7fffffffffffffff)
if header.GasLimit > cap {
	return fmt.Errorf("invalid gasLimit: have %v, max %v", header.GasLimit, cap)
}

2.6 Confirm that gasUsed is <= gasLimit

if header.GasUsed > header.GasLimit {
		return fmt.Errorf("invalid gasUsed: have %d, gasLimit %d", header.GasUsed, header.GasLimit)
	}

2.7 Verify that the block number is the parent block plus 1

if diff := new(big.Int).Sub(header.Number, parent.Number); diff.Cmp(big.NewInt(1)) != 0 {
		return consensus.ErrInvalidNumber
	}

2.8 Check whether the given block meets the POW difficulty requirement

if seal {
	if err := ethash.VerifySeal(chain, header); err != nil {
		return err
	}
}

③ : VerifyUncles

3.1 At most two uncle blocks

if len(block.Uncles()) > maxUncles {
		return errTooManyUncles
	}

3.2 Collect uncle blocks and ancestor blocks

number, parent := block.NumberU64()-1, block.ParentHash()
for i := 0; i < 7; i++ {
	ancestor := chain.GetBlock(parent, number)
	if ancestor == nil {
		break
	}
	ancestors[ancestor.Hash()] = ancestor.Header()
	for _, uncle := range ancestor.Uncles() {
		uncles.Add(uncle.Hash())
	}
	parent, number = ancestor.ParentHash(), number-1
}
ancestors[block.Hash()] = block.Header()
uncles.Add(block.Hash())

3.3 Make sure each uncle block is rewarded only once and has a valid ancestor

for _, uncle := range block.Uncles() {
	// Make sure every uncle is rewarded only once
	hash := uncle.Hash()
	if uncles.Contains(hash) {
		return errDuplicateUncle
	}
	uncles.Add(hash)

	// Make sure the uncle has a valid ancestry
	if ancestors[hash] != nil {
		return errUncleIsAncestor
	}
	if ancestors[uncle.ParentHash] == nil || uncle.ParentHash == block.ParentHash() {
		return errDanglingUncle
	}
	if err := ethash.verifyHeader(chain, uncle, ancestors[uncle.ParentHash], true, true); err != nil {
		return err
	}
}

④ : Prepare

Prepare initializes the header's Difficulty field:

parent := chain.GetHeader(header.ParentHash, header.Number.Uint64()-1)
if parent == nil {
	return consensus.ErrUnknownAncestor
}
header.Difficulty = ethash.CalcDifficulty(chain, header.Time, parent)
return nil

⑤ : Finalize performs all post-transaction state modifications (e.g., block rewards) but does not assemble the block.

5.1 Accumulate block and uncle rewards

accumulateRewards(chain.Config(), state, header, uncles)

5.2 Calculate the root hash of the state tree and submit it to the header

header.Root = state.IntermediateRoot(chain.Config().IsEIP158(header.Number))

⑥ : FinalizeAndAssemble Runs any post-transaction state changes (for example, block rewards) and assembles the final block.

func (ethash *Ethash) FinalizeAndAssemble(chain consensus.ChainReader, header *types.Header, state *state.StateDB, txs []*types.Transaction, uncles []*types.Header, receipts []*types.Receipt) (*types.Block, error) {
	accumulateRewards(chain.Config(), state, header, uncles)
	header.Root = state.IntermediateRoot(chain.Config().IsEIP158(header.Number))
	return types.NewBlock(header, txs, uncles, receipts), nil
}

Compared with Finalize, the only difference is the extra types.NewBlock call that assembles the block.

⑦ : SealHash returns the hash of the block prior to sealing (different from the hash of the sealed block). Note that the RLP list below omits the Nonce and MixDigest fields, which are only filled in by sealing.

func (ethash *Ethash) SealHash(header *types.Header) (hash common.Hash) {
	hasher := sha3.NewLegacyKeccak256()

	rlp.Encode(hasher, []interface{}{
		header.ParentHash,
		header.UncleHash,
		header.Coinbase,
		header.Root,
		header.TxHash,
		header.ReceiptHash,
		header.Bloom,
		header.Difficulty,
		header.Number,
		header.GasLimit,
		header.GasUsed,
		header.Time,
		header.Extra,
	})
	hasher.Sum(hash[:0])
	return hash
}

⑧ : Seal generates a new sealing request (mining) for the given input block and pushes the result onto the given channel.

Note that this method returns immediately and delivers the result asynchronously. Depending on the consensus algorithm, multiple results may even be returned. This part is analyzed in detail in the mining section that follows.


Mining details


Core interface definition of mining:

Seal(chain ChainReader, block *types.Block, results chan<- *types.Block, stop <-chan struct{}) error
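A hypothetical caller sketch (the names engine, chain, and block are assumptions) illustrating the asynchronous contract:

results := make(chan *types.Block)
stop := make(chan struct{})
if err := engine.Seal(chain, block, results, stop); err != nil {
	log.Fatal(err) // Seal returns immediately; an error here is a setup failure
}
sealed := <-results            // delivered asynchronously once a valid nonce is found
fmt.Println(sealed.Hash())     // block now has Nonce and MixDigest filled in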

Inside the Seal function:

① : When running a fake PoW (ModeFake/ModeFullFake), an empty Nonce and MixDigest are returned immediately and the block is sealed as-is.

if ethash.config.PowMode == ModeFake || ethash.config.PowMode == ModeFullFake {
	header := block.Header()
	header.Nonce, header.MixDigest = types.BlockNonce{}, common.Hash{}
	select {
	case results <- block.WithSeal(header):
	default:
		ethash.config.Log.Warn("Sealing result is not read by miner", "mode", "fake", "sealhash", ethash.SealHash(block.Header()))
	}
	return nil
}

② : If the POW is shared, it goes to its shared object to perform the Seal operation

if ethash.shared != nil {
	return ethash.shared.Seal(chain, block, results, stop)
}

③ : Obtain a seed source and use it to generate the seed that ethash needs

if ethash.rand == nil {
	// Get the seed
	seed, err := crand.Int(crand.Reader, big.NewInt(math.MaxInt64))
	if err != nil {
		ethash.lock.Unlock()
		return err
	}
	ethash.rand = rand.New(rand.NewSource(seed.Int64())) // assign a value to rand
}

④ : The core work of mining is handed over to mine

for i := 0; i < threads; i++ {
		pend.Add(1)
		go func(id int, nonce uint64) {
			defer pend.Done()
			ethash.mine(block, id, nonce, abort, locals) // Perform the actual mining action
		}(i, uint64(ethash.rand.Int63()))
	}

⑤ : Processing the results of mining

  • An unexpected external abort stops all mining threads
  • One thread digs the correct block, aborting all other threads
  • The ethash object changes. Stop all current operations and restart the current method
go func() {
	var result *types.Block
	select {
	case <-stop:
		close(abort)
	case result = <-locals:
		select {
		case results <- result: // one thread found a valid block; notify and abort the others
		default:
			ethash.config.Log.Warn("Sealing result is not read by miner", "mode", "local", "sealhash", ethash.SealHash(block.Header()))
		}
		close(abort)
	case <-ethash.update:
		close(abort)
		if err := ethash.Seal(chain, block, results, stop); err != nil {
			ethash.config.Log.Error("Failed to restart sealing after update", "err", err)
		}
	}
}()

As you can see from the above, the core work of Seal is done by the mine function.

The mine function is the actual PoW miner: starting from a seed value, it searches for a nonce that ultimately produces a hash satisfying the block's difficulty, and thus a correct, verifiable block.

① : Extract the relevant data from the block header into local variables

var (
	header  = block.Header()
	hash    = ethash.SealHash(header).Bytes()
	target  = new(big.Int).Div(two256, header.Difficulty) // the target used for validation
	number  = header.Number.Uint64()
	dataset = ethash.dataset(number, false)
)

② : Start generating random Nonce until we abort or find a good nonce

var (
		attempts = int64(0)
		nonce    = seed
	)
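The rest of mine is the search loop itself; below is a condensed sketch (hashrate logging and the abort-notification details are simplified; found stands for the result channel passed in from Seal, named locals there). Each attempt runs hashimotoFull, analyzed next, and the nonce is accepted once the result is at most the target:

for {
	select {
	case <-abort:
		return // mining aborted from outside
	default:
		attempts++ // used for hashrate logging, omitted here
		// Compute the PoW value for this nonce
		digest, result := hashimotoFull(dataset.dataset, hash, nonce)
		if new(big.Int).SetBytes(result).Cmp(target) <= 0 {
			// Correct nonce found: seal a copy of the header with it
			header = types.CopyHeader(header)
			header.Nonce = types.EncodeNonce(nonce)
			header.MixDigest = common.BytesToHash(digest)
			found <- block.WithSeal(header)
			return
		}
		nonce++
	}
}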

③ : hashimotoFull gathers data from the complete dataset and produces the final hash value for a specific header and nonce

func hashimotoFull(dataset []uint32, hash []byte, nonce uint64) ([]byte, []byte) {
	// Define a lookup function to read data from the dataset
	lookup := func(index uint32) []uint32 {
		offset := index * hashWords // hashWords is the constant defined above, = 16
		return dataset[offset : offset+hashWords]
	}
	return hashimoto(hash, nonce, uint64(len(dataset))*4, lookup)
}

As you can see, hashimotoFull merely wraps the raw dataset in a lookup function that reads and slices it, and passes that to hashimoto. Next, focus on the hashimoto function:

3.1 Build the seed and take its head

rows := uint32(size / mixBytes)                 // ①
seed := make([]byte, 40)                        // ②
copy(seed, hash)                                // ③
binary.LittleEndian.PutUint64(seed[32:], nonce) // ④
seed = crypto.Keccak512(seed)                   // ⑤
seedHead := binary.LittleEndian.Uint32(seed)    // ⑥
  1. Compute the number of rows in the dataset.
  2. Allocate a 40-byte seed that will merge header hash + nonce.
  3. Copy the block header hash into seed.
  4. Fill the nonce into the trailing (40 − 32 = 8) bytes of seed (the nonce is a uint64, i.e. 64 bits or 8 bytes), so hash and nonce exactly fill the 40 bytes.
  5. Hash seed with Keccak512.
  6. Take seedHead from the first 4 bytes of seed.

3.2 Start mixing from the copied seed

  • mixBytes is a constant equal to 128. mix has 32 elements of type uint32; each is 32 bits, i.e. 4 bytes, so mix totals 4 × 32 = 128 bytes.
mix := make([]uint32, mixBytes/4)
for i := 0; i < len(mix); i++ {
	mix[i] = binary.LittleEndian.Uint32(seed[i%16*4:])
}

3.3 Mix in "random" dataset items

temp := make([]uint32, len(mix)) // same structure and length as mix
for i := 0; i < loopAccesses; i++ {
	parent := fnv(uint32(i)^seedHead, mix[i%len(mix)]) % rows
	for j := uint32(0); j < mixBytes/hashBytes; j++ {
		copy(temp[j*hashWords:], lookup(2*parent+j))
	}
	fnvHash(mix, temp)
}

3.4 Compress the mix

for i := 0; i < len(mix); i += 4 {
	mix[i/4] = fnv(fnv(fnv(mix[i], mix[i+1]), mix[i+2]), mix[i+3])
}
mix = mix[:len(mix)/4]

digest := make([]byte, common.HashLength)
for i, val := range mix {
	binary.LittleEndian.PutUint32(digest[i*4:], val)
}
return digest, crypto.Keccak256(append(seed, digest...))

What comes back is the digest plus the hash of seed and digest together; the digest is simply the []byte form of mix. As seen in the ethash.mine code earlier, the second return value is compared against the target variable to decide whether the hash is valid.


Verifying the PoW

Verification of the mining result has two parts:

  1. Verify that Header.Difficulty is correct.
  2. Verify that Header.MixDigest and Header.Nonce are correct.

① : Verification of Header.Difficulty happens mainly in ethash.verifyHeader:

func (ethash *Ethash) verifyHeader(chain consensus.ChainReader, header, parent *types.Header, uncle bool, seal bool) error {
	// ...
	expected := ethash.CalcDifficulty(chain, header.Time, parent)
	if expected.Cmp(header.Difficulty) != 0 {
		return fmt.Errorf("invalid difficulty: have %v, want %v", header.Difficulty, expected)
	}
	// ...
}

The block height and the time difference with the parent are used as parameters to recalculate the expected Difficulty, which is then compared with the Header.Difficulty field of the block being verified; if they are equal, it is considered correct.

② : Verification of MixDigest and Nonce happens mainly in ethash.verifySeal:

Verification recalculates the MixDigest and the result hash in hashimoto from Header.Nonce and the header hash; the verifying node does not need the dataset data.
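For reference, a sketch of hashimotoLight consistent with this description: it differs from hashimotoFull only in the lookup function, regenerating each dataset item on demand from the (much smaller) cache instead of reading a precomputed ~1 GB dataset.

func hashimotoLight(size uint64, cache []uint32, hash []byte, nonce uint64) ([]byte, []byte) {
	keccak512 := makeHasher(sha3.NewLegacyKeccak512())

	lookup := func(index uint32) []uint32 {
		// Recreate the dataset item from the cache on the fly
		rawData := generateDatasetItem(cache, index, keccak512)

		data := make([]uint32, len(rawData)/4)
		for i := 0; i < len(data); i++ {
			data[i] = binary.LittleEndian.Uint32(rawData[i*4:])
		}
		return data
	}
	return hashimoto(hash, nonce, size, lookup)
}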


Summary & Reference

mindcarver.cn

github.com/blockchainG…

eth.wiki/concepts/et…

eth.wiki/concepts/et…

www.vijaypradeep.com/blog/2017-0…