- Overview
- The hash function
- Conflict resolution
- Initialization
- Structure
- Literals
- Runtime
- Operations
- Access
- Write
- Expansion
- Delete
- Conclusion
- Related articles
- Reference
In the previous section, we introduced arrays and slices. In this section, we will introduce another collection element in Go: the hash, or map. The hash table is, after the array, the most common data structure; almost every language provides both. Some languages implement the array as a list, and some call the hash table a dictionary or a map, but they are really just two design ideas for collections: an array represents a sequence of elements, while a hash represents a mapping between keys and values. Different languages simply name and implement them slightly differently.
Overview
The hash table is an old data structure: as early as 1953, hash tables were being implemented with the zipper method. A hash table supports direct access to a storage location in memory based on a key; that is, we can find the value corresponding to a key directly from the key itself. The name comes from the fact that it uses a hash function to map a key to a bucket, and that bucket may contain the value for the key.
The hash function
The key to implementing a hash table is the choice of hash function, which largely determines its read and write performance. Ideally, the hash function would map every distinct key to a unique index, but the set of possible keys can be far larger than the range being mapped into; mapping every distinct key to a unique value would require the output range of the hash function to be larger than its input range, so in practice this ideal state is impossible to achieve.
The more practical approach is to make the results of the hash function as evenly distributed as possible. Hash collisions will still occur, but we only need a way to resolve them, which makes this scheme usable in engineering. An unevenly distributed hash function causes more collisions and brings worse performance.
With a reasonably uniform hash function, adding, deleting, updating, and looking up in the hash all take O(1) time, but an extremely uneven hash function degrades every operation to the worst case of O(n), so using a good hash function in the hash table is crucial.
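As a toy illustration (not Go's actual implementation — the bucketIndex helper below is our own), this is how a hash function reduces a key to a bucket index; a uniform hash spreads keys across the buckets, while a degenerate hash piles every key into one bucket and turns lookups into O(n) scans:

package main

import (
	"fmt"
	"hash/fnv"
)

// bucketIndex maps a key to one of numBuckets buckets using the
// FNV-1a hash; a hypothetical helper for illustration only.
func bucketIndex(key string, numBuckets int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % numBuckets
}

func main() {
	for _, key := range []string{"1", "3", "5"} {
		fmt.Println(key, "-> bucket", bucketIndex(key, 8))
	}
}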
Conflict resolution
As mentioned above, the input range of the hash function is usually far larger than its output range, so conflicts are inevitable when using a hash table. If there were a perfect hash function with no conflicts, we could simply store all values in a one-dimensional array.
Since this scenario basically does not exist in reality — our hash functions are imperfect and their output ranges are limited — hash collisions are bound to happen, and we need methods to resolve them. The two common ones are open addressing and the zipper method.
Open addressing
Open addressing is one method of resolving hash collisions. Its core idea is to probe and compare elements in a one-dimensional array to determine whether the target key exists in the current hash table. If we implement a hash table with open addressing, we create an array when initializing the table, and if a conflict occurs while writing new data into this "hash table", the key-value pair is written into the next empty slot.
If Key3 conflicts with Key1 and Key2 in the hash table, Key3 is written into the free memory after Key2. When we later read the value for Key3, we hash the key and index to Key1 first; finding that Key1 does not match the target key, we read the subsequent elements until we hit empty memory or find the target element.
When we look up the data for a key, we perform linear probing starting at the slot the hash points to; finding the target key-value pair or an empty slot ends the lookup.
The biggest influence on the performance of open addressing is the load factor, the ratio of the number of elements in the array to the array's size. As the load factor increases, the average length of a linear probe grows, degrading both lookup and insert performance. Once the load factor exceeds roughly 70%, performance drops sharply, and at 100% the entire table fails completely, so when implementing a hash table this way, we must always watch the load factor.
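To make the probing concrete, here is a minimal open-addressing sketch with linear probing, reusing the bucketIndex helper from the sketch above (our own illustration with hypothetical names; Go's map does not use open addressing):

// The table assumes its load factor stays below 1 so that
// probing always eventually finds an empty slot.
type entry struct {
	key   string
	value int
	used  bool
}

type openTable struct {
	slots []entry
}

func (t *openTable) set(key string, value int) {
	i := bucketIndex(key, len(t.slots))
	for t.slots[i].used && t.slots[i].key != key {
		i = (i + 1) % len(t.slots) // conflict: probe the next slot
	}
	t.slots[i] = entry{key: key, value: value, used: true}
}

func (t *openTable) get(key string) (int, bool) {
	i := bucketIndex(key, len(t.slots))
	for t.slots[i].used { // an empty slot ends the probe sequence
		if t.slots[i].key == key {
			return t.slots[i].value, true
		}
		i = (i + 1) % len(t.slots)
	}
	return 0, false
}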
Zipper method
Compared with open addressing, the zipper method (separate chaining) is the most common way to implement a hash table, and most programming languages use it. It is relatively simple to implement, the average search length is short, and the memory for each node is allocated dynamically, which can save storage space.
Each element of the underlying array is actually a linked list, so we can view the table as a two-dimensional structure that can grow dynamically.
When we add a key to the hash table, the key of the key-value pair first goes through the hash function, whose result "selects" a bucket for us. We then traverse the linked list held in that bucket: if we find a node with the same key, we update its value; otherwise we append a new key-value pair at the end of the list.
If you want to retrieve a value from a hash table using a key, you go through the following process:
Key11 is an example of a key that does not exist in the hash table. When the lookup hashes to bucket 2, it iterates through that bucket's list in order, reaches the end without finding the desired key, and reports that the key is absent.
In a good hash table, each bucket should hold 0 or 1 elements, occasionally 2 or 3, and rarely more than that. The time spent reading and writing the table goes into locating the bucket and traversing its list. A table with 1000 buckets holding 10000 key-value pairs has chains averaging 10 nodes, so it performs about 1/10 as well as one holding 1000 pairs, but it is still roughly 1000 times better than a single linked list; see the sketch below.
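A matching zipper-method sketch, again with hypothetical names and the bucketIndex helper from above — each bucket heads a linked list, updates overwrite an existing node, and new pairs are appended at the end of the list:

type node struct {
	key   string
	value int
	next  *node
}

type chainedTable struct {
	buckets []*node
}

func (t *chainedTable) set(key string, value int) {
	i := bucketIndex(key, len(t.buckets))
	if t.buckets[i] == nil {
		t.buckets[i] = &node{key: key, value: value}
		return
	}
	n := t.buckets[i]
	for {
		if n.key == key { // same key found: update in place
			n.value = value
			return
		}
		if n.next == nil {
			break
		}
		n = n.next
	}
	n.next = &node{key: key, value: value} // append at the end of the list
}

func (t *chainedTable) get(key string) (int, bool) {
	for n := t.buckets[bucketIndex(key, len(t.buckets))]; n != nil; n = n.next {
		if n.key == key {
			return n.value, true
		}
	}
	return 0, false
}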
Initialization
Now that we have covered the basic principles and common implementations of hash tables, we can turn to the main topic of this article: the implementation of the hash table in Go. We first look at how a hash is represented in Go and at its two initialization paths — literals and the runtime.
Structure
Go's hash table is defined in the src/runtime/map.go file; the hmap structure of the hash table looks like this:
type hmap struct {
	count     int
	flags     uint8
	B         uint8
	noverflow uint16
	hash0     uint32

	buckets    unsafe.Pointer
	oldbuckets unsafe.Pointer
	nevacuate  uintptr

	extra *mapextra
}
- count records the current number of elements in the hash table, so len does not have to traverse the whole table;
- B is the logarithm of the number of buckets the table currently holds. Because hash tables always grow by factors of 2, the bucket count is stored as an exponent, which we can simply read as len(buckets) == 2^B (see the short illustration after this list);
- hash0 is the seed of the hash function. It is passed in as an argument on every call to the hash function, and its main purpose is to introduce randomness into the hash results;
- oldbuckets holds the buckets field from before the current expansion; its size is half of the current buckets.
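A quick illustration of the B invariant (our own arithmetic, not runtime code):

B := uint8(5)
numBuckets := 1 << B   // len(buckets) == 2^B == 32
mask := numBuckets - 1 // the low B bits of a hash select a bucket
fmt.Println(numBuckets, mask) // 32 31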
The hmap structure actually exists both at compile time and at run time: an identical structure is constructed during compilation by the hmap function:
func hmap(t *types.Type) *types.Type {
	bmap := bmap(t)

	fields := []*types.Field{
		makefield("count", types.Types[TINT]),
		makefield("flags", types.Types[TUINT8]),
		makefield("B", types.Types[TUINT8]),
		makefield("noverflow", types.Types[TUINT16]),
		makefield("hash0", types.Types[TUINT32]),
		makefield("buckets", types.NewPtr(bmap)),
		makefield("oldbuckets", types.NewPtr(bmap)),
		makefield("nevacuate", types.Types[TUINTPTR]),
		makefield("extra", types.Types[TUNSAFEPTR]),
	}

	hmap := types.New(TSTRUCT)
	hmap.SetNoalg(true)
	hmap.SetFields(fields)
	dowidth(hmap)

	t.MapType().Hmap = hmap
	hmap.StructType().Map = t
	return hmap
}
Because much of the runtime code cannot execute during compilation, the compiler needs to simulate an hmap structure to serve as a "container" for hash tables that are initialized at compile time. So although the hash table is a special kind of data structure, its underlying implementation still relies on an ordinary struct variable to store the data used for management and bookkeeping.
Literals
If we initialize a hash table with a literal in Go, much like in other languages, we generally use the following format:
hash := map[string]int{
	"1": 2,
	"3": 4,
	"5": 6,
}
We need to declare the key and value types when using the hash. A variable initialized with a literal this way ultimately goes through the maplit function; let's examine how the code above is handled by maplit at compile time:
func maplit(n *Node, m *Node, init *Nodes) {
	a := nod(OMAKE, nil, nil)
	a.Esc = n.Esc
	a.List.Set2(typenod(n.Type), nodintconst(int64(n.List.Len())))
	litas(m, a, init)

	var stat, dyn []*Node
	for _, r := range n.List.Slice() {
		stat = append(stat, r)
	}

	if len(stat) > 25 {
		// ...
	} else {
		addMapEntries(m, stat, init)
	}
}
When the hash literal contains 25 or fewer elements, the compiler directly calls addMapEntries to convert the literal-initialized construct into code like the following, which adds the key-value pairs to the hash one by one:
hash := make(map[string]int, 3)
hash["1"] = 2
hash["3"] = 4
hash["5"] = 6
If the literal contains more than 25 elements, the compiler creates two arrays at compile time to hold the keys and values, and adds the key-value pairs to the target hash with a for loop, as shown below:
hash := make(map[string]int, 26)
vstatk := []string{"1", "2", "3", ... , "26"}
vstatv := []int{1, 2, 3, ... , 26}
for i := 0; i < len(vstatk); i++ {
	hash[vstatk[i]] = vstatv[i]
}
Either way, literal initialization creates a new hash with the make keyword and sets the key-value pairs with the [] index syntax. Of course, the array literals generated for the keys and values are also expanded by the compiler; see the previous section on arrays and slices for the details of how that expansion is implemented.
Runtime
Whether we call make directly in Go code or the compiler generates it, the call is converted during type checking into the makemap function, which creates the hash table:
func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand()

	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}
	return h
}
This function creates a random hash seed with fastrand, computes the minimum number of buckets needed for the hint passed in, and then creates the bucket array with makeBucketArray. That method allocates a contiguous block of memory for data storage based on the number of buckets to create; during creation it also allocates extra buckets to store overflow data, 2^(B-4) of them.
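The loop in makemap that picks B relies on overLoadFactor. Below is a self-contained sketch of that calculation, mirroring the 13/2 load factor from the source (simplified and reproduced from memory, so treat names and details as approximate):

// pickB mimics how makemap chooses B for a given size hint:
// keep doubling until hint elements no longer overload 2^B buckets.
func pickB(hint int) uint8 {
	const bucketCnt = 8 // at most 8 key-value pairs per bucket
	overLoadFactor := func(count int, B uint8) bool {
		// the load factor threshold is 13/2 == 6.5 elements per bucket
		return count > bucketCnt && uintptr(count) > 13*((uintptr(1)<<B)/2)
	}
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	return B
}

// pickB(100) == 4: 16 buckets give a load factor of 100/16 == 6.25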
The data in the hash table is stored in buckets. The structure of a bucket, bmap, is defined in the Go source with only a simple tophash field:
type bmap struct {
	tophash [bucketCnt]uint8 // top 8 bits of the hash of each stored key
}
The full structure of the buckets in the hash table is actually created "dynamically" at compile time by the bmap function:
func bmap(t *types.Type) *types.Type {
	bucket := types.New(TSTRUCT)
	keytype := t.Key()
	valtype := t.Elem()
	dowidth(keytype)
	dowidth(valtype)
	field := make([]*types.Field, 0, 5)

	arr := types.NewArray(types.Types[TUINT8], BUCKETSIZE)
	field = append(field, makefield("topbits", arr))

	arr = types.NewArray(keytype, BUCKETSIZE)
	keys := makefield("keys", arr)
	field = append(field, keys)

	arr = types.NewArray(valtype, BUCKETSIZE)
	values := makefield("values", arr)
	field = append(field, values)

	if int(valtype.Align) > Widthptr || int(keytype.Align) > Widthptr {
		field = append(field, makefield("pad", types.Types[TUINTPTR]))
	}

	otyp := types.NewPtr(bucket)
	if !types.Haspointers(valtype) && !types.Haspointers(keytype) {
		otyp = types.Types[TUINTPTR]
	}
	overflow := makefield("overflow", otyp)
	field = append(field, overflow)

	// ...

	t.MapType().Bucket = bucket
	bucket.StructType().Map = t
	return bucket
}
This approach is needed because Go operates directly on memory when executing hash operations, and keys and values of different types occupy different amounts of space. The bucket's full type and memory footprint are therefore not known in advance, so there is no way to declare the structure statically in the Go source.
We can reconstruct the bmap structure from the implementation of the above function:
type bmap struct {
	topbits  [8]uint8 // the tophash values of the 8 slots
	keys     [8]keytype
	values   [8]valuetype
	pad      uintptr
	overflow uintptr // pointer to the next overflow bucket
}
Each bucket in the hash table can store at most eight elements; if a bucket held more than that, the table's execution efficiency would drop sharply. In practice, when a hash table needs to store more data, it either grows or stores the extra data in overflow buckets, rather than ever letting a single bucket hold more than eight entries.
Overflow buckets are only a temporary solution; creating too many of them eventually triggers a hash expansion.
The Go runtime allocates the memory that stores the hash table's key-value pairs. Both the hash structure and the bucket structure are initialized at run time, but the bucket structure does not actually exist in the source code as a complete definition: it is a "virtual" structure generated by compiler code.
Operations
As with any data structure, we need to study the operations on the hash table, that is, how its different read and write operations are implemented. Reading a hash table generally happens in two ways, by index and by traversal:
_ = hash[key]
for k, v := range hash {
	// k, v
}
Although both approaches read data from the hash table, the functions they use and the underlying principles are completely different. The former requires knowing the key and accesses only the value for that one key, while the latter traverses all the key-value pairs in the hash without needing to know any key in advance. We will introduce the implementation of range in a later chapter.
Writing to a data structure generally means adding, deleting, or modifying entries. Adding and modifying are done through indexing and assignment; deleting data from a map requires the delete keyword:
hash[key] = value
hash[key] = newValue
delete(hash, key)
We will discuss these operations one by one and show the specific underlying implementation of each. Most of them span both compile time and run time; the walkthrough may omit some compile-time details, which are covered in the earlier chapter Overview of Compilation Principles.
Access
During the type-checking phase of compilation, all hash[key]-style OINDEX operations are converted into OINDEXMAP operations, which walkexpr then converts into the following code before intermediate code generation:
v := hash[key] // => v := *mapaccess1(maptype, hash, &key)
v, ok := hash[key] // => v, ok := mapaccess2(maptype, hash, &key)
The number of values accepted on the left of the assignment determines which runtime function is ultimately called: mapaccess1 is used when only one value is received, and mapaccess2 when both the value and a boolean indicating whether the key exists are received. mapaccess1 returns only a pointer to the target value:
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	top := tophash(hash)
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if alg.equal(key, k) {
				v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
				if t.indirectvalue() {
					v = *((*unsafe.Pointer)(v))
				}
				return v
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])
}
In this function, we first obtain the hash of the key from the hash function and seed configured in the table, then use bucketMask and add to locate the bucket for the key-value pair, and tophash to extract the top 8 bits of the hash. These 8 bits are compared against the tophash values stored in the bucket — the topbits field we saw at compile time — with each probe comparing all eight uint8 values of the bucket. The 8-bit tophash acts like a first-level cache of the highest bits of the hash, while bucket selection uses the lowest bits via the bucket mask; together they help us quickly rule out non-matching slots and reduce full key comparisons.
Each bucket is a single contiguous block of memory. When a topbits entry matches the tophash of the key passed in, we compute the stored key's address from the bucket pointer plus an offset and compare it with the target key; if they are identical, we compute the address of the corresponding value the same way and return it.
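The helpers used above are small; they can be sketched roughly as follows, simplified from src/runtime/map.go (the real tophash depends on the platform's pointer size, which we hard-code as 64-bit here, and the constants are reproduced from memory):

const minTopHash = 5 // values below this are reserved as bucket-state markers

// bucketMask returns a mask of the low B bits, which select the bucket.
func bucketMask(B uint8) uintptr {
	return (uintptr(1) << B) - 1
}

// tophash extracts the top 8 bits of the hash for the fast pre-check.
func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (64 - 8)) // assumes a 64-bit uintptr
	if top < minTopHash {
		top += minTopHash
	}
	return top
}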
The other access function, mapaccess2, builds on mapaccess1 by additionally returning a boolean indicating whether the data exists:
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
	// ...
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			// ...
			if alg.equal(key, k) {
				v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
				// ...
				return v, true
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0]), false
}
When we access an element with v, ok := hash[k], the extra boolean tells us precisely whether v == nil means the stored value is nil or the key simply does not exist, so the author recommends this form for checking whether an element is present.
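For example:

m := map[string]*int{"a": nil}

v1, ok1 := m["a"] // v1 == nil, ok1 == true: the key exists, its value is nil
v2, ok2 := m["b"] // v2 == nil, ok2 == false: the key does not exist
fmt.Println(v1, ok1, v2, ok2) // <nil> true <nil> false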
The process above describes access under normal circumstances. Just as an array can grow, a hash table may be expanding because its load factor is too high or it has too many overflow buckets; how access stays correct and fast during expansion is an interesting topic. We have omitted the related code here and will cover it in the expansion section below.
Write
When a hash[key] expression appears on the left of an assignment, compile-time type checking and intermediate code generation convert it into a call to mapassign. This function breaks down into several parts; first it computes the hash and locates the bucket:
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))

	h.flags ^= hashWriting

again:
	bucket := hash & bucketMask(h.B)
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
	top := tophash(hash)
It then iterates over the bucket and its overflow chain, comparing tophash values and keys. If an identical key is found, the address of the target value is computed and returned; along the way, the first empty slot is remembered as a possible insertion point. Both keys and values are addressed directly through pointer arithmetic:
	var inserti *uint8
	var insertk unsafe.Pointer
	var val unsafe.Pointer
bucketloop:
	for {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if isEmpty(b.tophash[i]) && inserti == nil {
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
				}
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if !alg.equal(key, k) {
				continue
			}
			val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
			goto done
		}
		ovf := b.overflow(t)
		if ovf == nil {
			break
		}
		b = ovf
	}
If the current hash table is not already expanding and either the load factor exceeds 6.5 or there are too many overflow buckets, hashGrow is called to expand it.
The load factor threshold is determined by loadFactorNum and loadFactorDen; the former is defined as 13 in the Go source and the latter as 2, so the threshold is 6.5.
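Concretely (our own arithmetic):

// With B == 4 the table holds 1<<4 == 16 buckets. overLoadFactor
// checks h.count+1 against 13*(16/2) == 104 == 6.5*16, so the write
// that would push the element count past 104 triggers a doubling
// (unless too many overflow buckets triggered a grow earlier).
B := uint8(4)
threshold := 13 * ((1 << B) / 2)
fmt.Println(threshold) // 104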
If the current bucket is already full, newoverflow is called to create a new bucket, or one of the overflow buckets pre-created by makemap and tracked through hmap's extra field is used to hold the data. The new bucket is appended to the end of the existing bucket chain, and at the same time the creation of an overflow bucket increments the table's noverflow counter.
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		hashGrow(t, h)
		goto again
	}

	if inserti == nil {
		newb := h.newoverflow(t, b)
		inserti = &newb.tophash[0]
		insertk = add(unsafe.Pointer(newb), dataOffset)
		val = add(insertk, bucketCnt*uintptr(t.keysize))
	}

	if t.indirectkey() {
		kmem := newobject(t.key)
		*(*unsafe.Pointer)(insertk) = kmem
		insertk = kmem
	}
	if t.indirectvalue() {
		vmem := newobject(t.elem)
		*(*unsafe.Pointer)(val) = vmem
	}
	typedmemmove(t.key, insertk, key)
	*inserti = top
	h.count++

done:
	return val
}
If the key or value stored in the hash table is an indirect (pointer) type, new memory is allocated for it and the pointer is stored in the slot. The key is then copied into the insertion position with typedmemmove, and finally the address of the value corresponding to the key is returned.
Expansion
As mapassign showed, expansion is triggered by hashGrow, and it is not always a doubling: when the load factor has not been exceeded (that is, the trigger was too many overflow buckets), the sameSizeGrow flag is set and the table is rebuilt at the same size:
func hashGrow(t *maptype, h *hmap) {
	bigger := uint8(1)
	if !overLoadFactor(h.count+1, h.B) {
		bigger = 0
		h.flags |= sameSizeGrow
	}
	oldbuckets := h.buckets
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	h.B += bigger
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	h.nevacuate = 0
	h.noverflow = 0

	if h.extra != nil && h.extra.overflow != nil {
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}
}
To expand the hash table, we create a new bucket array and some pre-allocated overflow buckets with makeBucketArray, save the old bucket array into oldbuckets, and install the new empty buckets as buckets.
We cannot see any behavioral difference caused by sameSizeGrow in the function above, because hashGrow only creates the new buckets and does not copy or move any data. The actual "data migration" of the hash table is performed in the evacuate function, which "redistributes" the elements of the bucket passed in:
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
	b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
	newbit := h.noldbuckets()
	if !evacuated(b) {
		var xy [2]evacDst
		x := &xy[0]
		x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
		x.k = add(unsafe.Pointer(x.b), dataOffset)
		x.v = add(x.k, bucketCnt*uintptr(t.keysize))

		if !h.sameSizeGrow() {
			y := &xy[1]
			y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
			y.k = add(unsafe.Pointer(y.b), dataOffset)
			y.v = add(y.k, bucketCnt*uintptr(t.keysize))
		}
evacuate begins by creating an array of evacDst structures that describe the redistribution destinations; each holds a pointer to a target bucket, the number of elements already placed in it, and the current storage positions for keys and values.
For an equal-size expansion, only one of the two evacDst structures is initialized. When the hash table doubles in capacity, each old bucket's elements are split between two newly created buckets, both referenced by the evacDst array at the same time; the splitting logic is as follows:
		for ; b != nil; b = b.overflow(t) {
			k := add(unsafe.Pointer(b), dataOffset)
			v := add(k, bucketCnt*uintptr(t.keysize))
			for i := 0; i < bucketCnt; i, k, v = i+1, add(k, uintptr(t.keysize)), add(v, uintptr(t.valuesize)) {
				top := b.tophash[i]
				k2 := k
				var useY uint8
				if !h.sameSizeGrow() {
					hash := t.key.alg.hash(k2, uintptr(h.hash0))
					if hash&newbit != 0 {
						useY = 1
					}
				}
				b.tophash[i] = evacuatedX + useY
				dst := &xy[useY]

				if dst.i == bucketCnt {
					dst.b = h.newoverflow(t, dst.b)
					dst.i = 0
					dst.k = add(unsafe.Pointer(dst.b), dataOffset)
					dst.v = add(dst.k, bucketCnt*uintptr(t.keysize))
				}
				dst.b.tophash[dst.i&(bucketCnt-1)] = top
				typedmemmove(t.key, dst.k, k)
				typedmemmove(t.elem, dst.v, v)
				dst.i++
				dst.k = add(dst.k, uintptr(t.keysize))
				dst.v = add(dst.v, uintptr(t.valuesize))
			}
		}
	}
	if oldbucket == h.nevacuate {
		advanceEvacuationMark(h, t, newbit)
	}
}
If the new hash table has eight buckets, then because the bucket mask has gained one bit, data that used to live in bucket 1 is split between new buckets 1 and 5, and every element is copied with typedmemmove into the target bucket's key and value space.
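A concrete view of the split (our own numbers):

// Growing from B == 2 to B == 3: the mask widens from 0b011 to 0b111
// and newbit == 4. A key whose hash ends in ...101 moves from old
// bucket 1 to new bucket 5 because hash&newbit != 0 (the Y half),
// while a hash ending in ...001 stays in bucket 1 (the X half).
h1, h2 := uintptr(0b101), uintptr(0b001)
fmt.Println(h1&0b011, "->", h1&0b111) // 1 -> 5
fmt.Println(h2&0b011, "->", h2&0b111) // 1 -> 1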
evacuate ends with a call to advanceEvacuationMark, which increments the hash's nevacuate counter and, once all the old buckets have been redistributed, frees the data that is no longer needed. Because data migration in Go is not a single bulk operation — evacuate is only triggered incrementally by writes and deletes — expansion has no immediate impact on performance.
The earlier walkthrough of hash table access omitted the code that obtains buckets from oldbuckets. That code is the bucket-location logic during expansion: if oldbuckets exists in the hash table, the lookup computes the address of the corresponding old bucket and uses it whenever that bucket has not yet been evacuated:
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	// ...
	alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) {
			b = oldb
		}
	}
bucketloop:
	// ...
}
In effect, while an old bucket has not yet been moved, it stands in for the new bucket and serves read requests: the historical bucket still holds the data we need, while the corresponding new bucket is still empty.
Each time a value is set in the hash table during expansion, growWork is triggered to incrementally copy the old buckets' contents:
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))

	h.flags ^= hashWriting

again:
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	// ...
}
Of course, besides writes, deletions also trigger growWork during expansion, and they do so in almost exactly the same way and with almost the same code: compute the bucket for the current key and evacuate the elements in that bucket.
Delete
To delete an element from a hash, we use Go's delete keyword. Its only function is to remove the element for a given key from the hash table; the built-in returns nothing whether or not the key exists.
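For example:

m := map[string]int{"1": 2, "3": 4}
delete(m, "1") // removes the pair "1": 2
delete(m, "9") // the key does not exist: a silent no-op, nothing returned
fmt.Println(len(m)) // 1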
At compile time, the delete keyword is converted into a node whose operation is ODELETE, and walkexpr ends up generating a call to one of the mapdelete function family: mapdelete, mapdelete_faststr, mapdelete_fast32, or mapdelete_fast64:
func walkexpr(n *Node, init *Nodes) *Node {
	switch n.Op {
	case ODELETE:
		init.AppendNodes(&n.Ninit)
		map_ := n.List.First()
		key := n.List.Second()
		map_ = walkexpr(map_, init)
		key = walkexpr(key, init)

		t := map_.Type
		fast := mapfast(t)
		if fast == mapslow {
			key = nod(OADDR, key, nil)
		}
		n = mkcall1(mapfndel(mapdelete[fast], t), nil, init, typename(t), map_, key)
	}
}
These functions are all implemented similarly, so we pick mapdelete for analysis. If the hash table is expanding when the deletion happens, the relevant bucket is evacuated first. The function then locates the target element in the bucket and, depending on the key and value types, calls memclrHasPointers or memclrNoHeapPointers to clear the key-value pair:
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
	alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	bOrig := b
	top := tophash(hash)
search:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break search
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			k2 := k
			if !alg.equal(key, k2) {
				continue
			}
			if t.indirectkey() {
				*(*unsafe.Pointer)(k) = nil
			} else if t.key.kind&kindNoPointers == 0 {
				memclrHasPointers(k, t.key.size)
			}
			v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
			if t.indirectvalue() {
				*(*unsafe.Pointer)(v) = nil
			} else if t.elem.kind&kindNoPointers == 0 {
				memclrHasPointers(v, t.elem.size)
			} else {
				memclrNoHeapPointers(v, t.elem.size)
			}
			b.tophash[i] = emptyOne
		}
		// ...
	}
}
The hash table's delete logic is very similar to its write logic; only the invocation is special — we "call" mapdelete through a keyword rather than an ordinary function call.
For the deletion logic, it is enough to know that the delete keyword is first converted into an ODELETE operation during type checking and then into the mapdelete function family during SSA intermediate code generation.
Conclusion
The Go language uses the zipper method to resolve hash collisions in its hash table implementation, and operations such as access, write, and delete are converted into the corresponding runtime functions or methods at compile time.
Each bucket stores the top 8 bits of the hash of every key it holds. When operating on the hash, these tophash values act as a first-level cache that helps the runtime scan a bucket's elements quickly. Each bucket can store at most eight key-value pairs; once a bucket is full, new key-value pairs spill into overflow buckets.
As the number of key-value pairs grows, the number of overflow buckets and the load factor of the hash rise gradually. Once they exceed their thresholds, an expansion is triggered; during expansion, the old buckets' data is redistributed incrementally into the newly allocated buckets.
Related articles
- Compilation principle
- Overview of Compilation Principles
- Lexical and syntax analysis
- Type checking
- Intermediate code generation
- Machine code generation
- Data structures
- Arrays and slices
- Hash table
- Strings
Reference
- Hash table
- Open addressing
- Separate Chaining: Concept, Advantages & Disadvantages