This article walks through the low-level execution of map assignment, deletion, lookup, and expansion, again from the perspective of the runtime. Read it alongside the source code and you will come away with a thorough understanding of how maps work under the hood.
I should note that the basic use of maps is barely covered here; I believe that can be learned from any introductory book. This article goes deeper, but I'm sure it's easy to follow thanks to the various diagrams I've drawn.
What is a map
Wikipedia defines a map as follows:
In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.
A quick note: in computer science, an associative array (also called a map, symbol table, or dictionary) is an abstract data type made up of a collection of (key, value) pairs in which each key appears at most once.
There are two key points: a map consists of key-value pairs, and each key appears only once.
The operations related to map are:
- Add a k-v pair — Add or insert;
- Delete a k-v pair — Remove or delete;
- Modify v — Reassign for a k;
- Query v — Lookup;
Simply put: the most basic CRUD operations.
Designing a map is also known as "the dictionary problem": design a data structure that maintains a collection of data and supports adding, deleting, looking up, and updating entries in it. There are two main families of data structures for this: hash tables and search trees.
A hash lookup table uses a hash function to assign keys to different buckets (that is, different indexes of an underlying array). The overhead is thus mainly the cost of computing the hash function plus constant-time array access. Hash lookup tables perform well in many scenarios.
Hash lookup tables have to deal with "collisions": different keys hashing into the same bucket. There are two common remedies: chaining and open addressing. Chaining implements a bucket as a linked list into which all keys that fall into that bucket are inserted. Open addressing probes forward in the array after a collision to find an "empty slot" for the new key.
The search-tree approach generally uses a self-balancing search tree, such as an AVL tree or a red-black tree. It's not unusual for interview candidates to be asked about red-black trees, or even to write one by hand.
The worst-case lookup time of a self-balancing search tree is O(log N), while the worst case for a hash lookup table is O(N). Of course, the average lookup cost of a hash table is O(1), and with a well-designed hash function the worst case rarely happens. Another difference: an in-order walk of a self-balancing search tree returns keys in sorted order, whereas a hash table returns them in no particular order.
Why map
Here’s an excerpt from the official Go blog:
One of the most useful data structures in computer science is the hash table. Many hash table implementations exist with varying properties, but in general they offer fast lookups, adds, and deletes. Go provides a built-in map type that implements a hash table.
Hash tables are one of the most important structures in computer science. Most hash table implementations provide fast lookup, add, and delete operations. Go's built-in map does all of this.
It's hard to imagine writing a program that doesn't use a map, which makes the "why" almost self-answering: maps are powerful, and all the basic add, delete, lookup, and update operations on them are very efficient.
How is map implemented under the hood
First declare the Go version I use:
go version go1.9.2 darwin/amd64
The Go language uses hash lookup tables and linked lists to resolve hash conflicts.
Let’s explore the core principles of Map and get a look inside.
Map memory model
In the source code, the structure that represents a map is hmap, which is “short” for HashMap:
// A header for a Go map.
type hmap struct {
	// number of elements; len(map) returns this value
	count int
	flags uint8
	// log_2 of the number of buckets (the array holds 2^B buckets)
	B uint8
	// approximate number of overflow buckets
	noverflow uint16
	// hash seed, mixed in when keys are hashed
	hash0 uint32
	// points to an array of 2^B buckets; nil if count == 0
	buckets unsafe.Pointer
	// previous bucket array, half the current size; non-nil only while growing
	oldbuckets unsafe.Pointer
	// evacuation progress: buckets numbered below this have been migrated
	nevacuate uintptr
	extra *mapextra // optional fields
}
B is the log of the length of the buckets array; that is, the array holds 2^B buckets. The buckets store keys and values; more on that later.
Buckets is a pointer to a structure:
type bmap struct {
tophash [bucketCnt]uint8
}
But this is only the surface structure (src/runtime/hashmap.go); at compile time the compiler dynamically creates a new structure:
type bmap struct {
topbits [8]uint8
keys [8]keytype
values [8]valuetype
pad uintptr
overflow uintptr
}
A bmap is a bucket. A bucket holds at most eight keys, which all land in the same bucket because the low-order bits of their hash values are the same. Within the bucket, the high 8 bits of each key's hash value are used to locate the key among its slots (a bucket has at most eight slots).
Here’s an overall picture:
If neither the map's key nor its value contains pointers and both are smaller than 128 bytes, the bmap is marked as containing no pointers, which avoids scanning the whole hmap during GC. However, a bmap ends with an overflow pointer; to keep the bmap genuinely pointer-free, those overflow pointers are moved into the extra field:
type mapextra struct {
// overflow[0] contains overflow buckets for hmap.buckets.
// overflow[1] contains overflow buckets for hmap.oldbuckets.
overflow [2]*[]*bmap
// nextOverflow contains empty Overflow buckets, which are pre-allocated buckets
nextOverflow *bmap
}
The bmap is where the key-value pairs are stored, so let's zoom in and take a closer look inside a bmap.
The figure above shows the bucket's memory layout; "HOB Hash" refers to the top hash. Notice that keys and values are stored separately: the layout is not key/value/key/value/… but key/key/…/value/value/…. The advantage is that in some cases the padding between fields can be omitted, saving memory.
For example, there is a map of this type:
map[int64]int8
With the key/value/key/value/… layout, each pair would need 7 extra bytes of padding after its int8 value to keep the next int64 key aligned. Binding all the keys together and all the values together, as key/key/…/value/value/…, means padding is only needed once, at the very end.
Each bucket is designed to hold at most eight key-value pairs. If a ninth key hashes into the bucket, a new bucket is created and linked to the original via the overflow pointer.
Create a map
Syntactically, creating a map is simple:
ageMp := make(map[string]int)
// Specify the map length
ageMp := make(map[string]int, 8)
// ageMp is nil, can't add elements to it, will panic directly
var ageMp map[string]int
As the assembly output shows, map creation ultimately calls the makemap function, whose main job is to initialize the various fields of the hmap structure, such as computing the size of B and setting the hash seed hash0.
func makemap(t *maptype, hint int64, h *hmap, bucket unsafe.Pointer) *hmap {
	// omit various condition checks...

	// find a B that keeps the map's load factor within the normal range
	B := uint8(0)
	for ; overLoadFactor(hint, B); B++ {
	}

	// initialize the hash table
	// if B == 0, buckets is allocated lazily, on first assignment
	// if hint is large, allocating the memory here takes longer
	buckets := bucket
	var extra *mapextra
	if B != 0 {
		var nextOverflow *bmap
		buckets, nextOverflow = makeBucketArray(t, B)
		if nextOverflow != nil {
			extra = new(mapextra)
			extra.nextOverflow = nextOverflow
		}
	}

	// initialize the hmap
	if h == nil {
		h = (*hmap)(newobject(t.hmap))
	}
	h.count = 0
	h.B = B
	h.extra = extra
	h.flags = 0
	h.hash0 = fastrand()
	h.buckets = buckets
	h.oldbuckets = nil
	h.nevacuate = 0
	h.noverflow = 0

	return h
}
Note that makemap returns a pointer (*hmap), while the slice-creation function makeslice returns a slice structure by value:
func makeslice(et *_type, len, cap int) slice
To review the slice structure definition:
// runtime/slice.go
type slice struct {
	array unsafe.Pointer // element pointer
	len   int            // length
	cap   int            // capacity
}
The structure contains a pointer to the underlying data.
This difference in return types leads to a difference in behavior: when a map is passed as a function parameter, operations on it inside the function affect the map itself; that is not true for a slice (as discussed in the earlier slice article).
The main reason: one is a pointer (*hmap), the other a structure (slice). Function arguments in Go are passed by value, so parameters are copied inside the function. After the *hmap pointer is copied, it still points at the same map, so operations on the map inside the function affect the caller's map. A slice, once copied, is a new slice header, and operations on it do not affect the caller's header.
The hash function
One of the key design points of a map is the choice of hash function. When the program starts, it checks whether the CPU supports AES; if so, it uses the AES hash, otherwise memhash. This is decided in the alginit() function in src/runtime/alg.go.
Hash functions come in cryptographic and non-cryptographic varieties. Cryptographic ones are used for encryption and digests; typical examples are MD5, SHA1, SHA256, and AES256. Non-cryptographic ones are typically used for lookup, and lookup is what the map scenario needs. The choice of hash function weighs two main considerations: performance and collision probability.
We talked earlier about structures that represent types:
type _type struct {
size uintptr
ptrdata uintptr // size of memory prefix holding all pointers
hash uint32
tflag tflag
align uint8
fieldalign uint8
kind uint8
alg *typeAlg
gcdata *byte
str nameOff
ptrToThis typeOff
}
The alg field is hash-related; it is a pointer to the following structure:
// src/runtime/alg.go
type typeAlg struct {
	// (ptr to object, seed) -> hash
	hash func(unsafe.Pointer, uintptr) uintptr
	// (ptr to object A, ptr to object B) -> ==?
	equal func(unsafe.Pointer, unsafe.Pointer) bool
}
typeAlg contains two functions: hash, which computes the hash of a value of the type, and equal, which reports whether two values of the type are equal.
For string, the hash and equal functions are as follows:
func strhash(a unsafe.Pointer, h uintptr) uintptr {
	x := (*stringStruct)(a)
	return memhash(x.str, h, uintptr(x.len))
}

func strequal(p, q unsafe.Pointer) bool {
	return *(*string)(p) == *(*string)(q)
}
Depending on the key's type, the alg field of its _type structure is set to the hash and equal functions for that type.
Key Location process
Hashing the key produces a 64-bit hash value (on 64-bit platforms), and only the low B bits are used to decide which bucket it falls into. Remember the B mentioned earlier? If B is 5, the number of buckets is 2^5 = 32.
For example, suppose hashing a key produces the following hash value:
10010111 | 000011110110110010001111001010100010010110010101010 | 01010
The low five bits are 01010, value 10, so the key goes to bucket number 10. This is really a mod operation, but mod is relatively expensive, so the implementation uses a bit operation instead.
Then the high 8 bits of the hash value are used to find the key's slot within the bucket; this is how an existing key is located. Initially the bucket has no keys, so a new key claims the first empty slot it finds.
When two different keys fall into the same bucket, a hash collision has occurred. The collision is resolved with the chaining approach: scan the bucket from front to back and place the new key in the first empty slot. Later, when looking a key up, the bucket is found first and then the keys inside it are scanned in order.
The picture here is adapted from one on Cao Da's GitHub blog; the original is ASCII art, full of geek flavor. You can find his blog in the resources section; I recommend having a look.
In the figure above, assume B = 5, so there are 2^5 = 32 buckets in total. First compute the hash of the key being looked up; its low 5 bits, 00110, select bucket 6, and its high 8 bits, 10010111 (decimal 151), are compared against the top hash values in bucket 6. A key with top hash 151 is found in slot 2, completing the lookup.
If the key is not found in the bucket and overflow is not empty, the search continues through the overflow buckets until the key is found or every slot, including those in all overflow buckets, has been examined.
Let's look at the source code, ha ha! As the assembly shows, key lookup bottoms out in the mapaccess family of functions; their behavior is similar, and the differences are covered in the next section. Here we look at mapaccess1 directly:
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	// ...
	// if h holds nothing, return the zero value
	if h == nil || h.count == 0 {
		return unsafe.Pointer(&zeroVal[0])
	}
	// detect a write racing with this read
	if h.flags&hashWriting != 0 {
		throw("concurrent map read and map write")
	}
	// the hash algorithm for this key type was determined at compile time
	alg := t.key.alg
	// compute the hash value, mixing in hash0 for randomness
	hash := alg.hash(key, uintptr(h.hash0))
	// if B == 5, m is 31, all ones in binary
	// ANDing the hash with m yields the bucket number,
	// so the bucket is determined by the low B bits of the hash
	m := uintptr(1)<<h.B - 1
	// b is the bucket's address
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	// oldbuckets is not nil, so the map is growing
	if c := h.oldbuckets; c != nil {
		// if this is not a same-size grow (see the expansion section)
		if !h.sameSizeGrow() {
			// there are twice as many new buckets as old ones
			m >>= 1
		}
		// find the key's bucket in the old map
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		// if oldb has not been evacuated to the new buckets yet,
		// search the old bucket instead
		if !evacuated(oldb) {
			b = oldb
		}
	}
	// compute the high 8 bits of the hash:
	// shift right by 56 bits, keeping only the top 8
	top := uint8(hash >> (sys.PtrSize*8 - 8))
	// add minTopHash to stay clear of the reserved state values
	if top < minTopHash {
		top += minTopHash
	}
	for {
		// iterate over the bucket's 8 slots
		for i := uintptr(0); i < bucketCnt; i++ {
			// tophash doesn't match; keep going
			if b.tophash[i] != top {
				continue
			}
			// tophash matches; locate the key
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			// the key is stored as a pointer
			if t.indirectkey {
				// dereference it
				k = *((*unsafe.Pointer)(k))
			}
			// if the keys are equal
			if alg.equal(key, k) {
				// locate the value
				v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
				// dereference the value if it is stored as a pointer
				if t.indirectvalue {
					v = *((*unsafe.Pointer)(v))
				}
				return v
			}
		}
		// not in this bucket; continue with its overflow bucket
		b = b.overflow(t)
		// every overflow bucket exhausted: the key is not in the map,
		// return the zero value
		if b == nil {
			return unsafe.Pointer(&zeroVal[0])
		}
	}
}
The function returns a pointer to h[key]. If h does not contain the key, it returns a pointer to the zero value of the value type, never nil.
The code as a whole is fairly straightforward; there is nothing difficult about it. Follow the comments above step by step.
Here, let’s talk about how to locate keys and values and how to write the whole loop.
// The key location formula
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
// value specifies the location formula
v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
b is the address of the bmap, which here is still the structure defined in the source file, containing only the tophash array; the compiler-expanded structure adds keys, values, overflow, and so on. dataOffset is the offset of the keys relative to the start of the bmap:
dataOffset = unsafe.Offsetof(struct {
b bmap
v int64
}{}.v)
So the address of the first key in a bucket is unsafe.Pointer(b)+dataOffset, and the i-th key sits i key-sizes beyond that. We also know the values begin only after all the keys, so locating the i-th value means skipping over all eight keys as well. With this in mind, the location formulas above are easy to understand.
As for the shape of the whole loop: the outermost loop is an infinite loop that advances via
b = b.overflow(t)
walking through bucket after bucket; the overflow chain behaves like a linked list of buckets.
Once a specific bucket is located, the inner loop iterates over all the cells in that bucket, also called slots: bucketCnt = 8 of them per bucket.
About minTopHash: when a cell's tophash value is smaller than minTopHash, it is not a hash at all but the cell's migration state. Because these state values live in the same tophash array as real hashes, the top hash computed from a key is bumped by minTopHash, keeping the two ranges apart: values below minTopHash are states, values at or above it are normal top hashes.
The following values describe a cell's state:
// the cell is empty; also a bucket's initial state
empty = 0
// the cell was empty and has been marked as evacuated to the new buckets
evacuatedEmpty = 1
// the key/value pair has been evacuated to the first half (X part)
// of the new bucket array; more on this in the expansion section
evacuatedX = 2
// the key/value pair has been evacuated to the second half (Y part)
// of the new bucket array
evacuatedY = 3
// the minimum value of a normal tophash
minTopHash = 4
The source checks whether a bucket has been evacuated like this:
func evacuated(b *bmap) bool {
h := b.tophash[0]
return h > empty && h < minTopHash
}
It just takes the first value of the tophash array and checks whether it lies strictly between 0 and 4. When the top hash is one of the three state values evacuatedEmpty, evacuatedX, or evacuatedY, it means the keys in the bucket have been relocated to the new buckets.
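A standalone sketch of the same check, with the state constants copied from above (operating on a bare tophash byte rather than a *bmap, for simplicity):

```go
package main

import "fmt"

// Constants mirroring the tophash state values quoted above.
const (
	empty          = 0
	evacuatedEmpty = 1
	evacuatedX     = 2
	evacuatedY     = 3
	minTopHash     = 4
)

// evacuated reports whether a tophash slot holds a migration
// state (1..3), mirroring the runtime's check.
func evacuated(top uint8) bool {
	return top > empty && top < minTopHash
}

func main() {
	fmt.Println(evacuated(empty))          // false: still the initial state
	fmt.Println(evacuated(evacuatedX))     // true: moved to the X part
	fmt.Println(evacuated(minTopHash + 7)) // false: a normal top hash
}
```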
Two GET operations on a map
In Go, a map read can be written with or without the "comma ok" form. With the comma, you additionally get a bool reporting whether the key is in the map; without it, a missing key simply yields the zero value of the value type: 0 if the value is an int, the empty string if it is a string, and so on.
package main

import "fmt"

func main() {
	ageMap := make(map[string]int)
	ageMap["qcrao"] = 18

	// without comma
	age1 := ageMap["stefno"]
	fmt.Println(age1)

	// with comma
	age2, ok := ageMap["stefno"]
	fmt.Println(age2, ok)
}
Running results:
0
0 false
I used to think this was amazing. How does it work? It is the compiler's doing: it analyzes the code and maps the two syntaxes to two different underlying functions.
// src/runtime/hashmap.go
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)
In the source, the functions are named casually, with plain suffixes 1 and 2, completely ignoring the naming advice in Code Complete. The code of mapaccess1 and mapaccess2 is essentially the same, except that mapaccess2 returns an extra bool, false or true, alongside the value.
In addition, depending on the type of key, the compiler will also replace the search, insert, and delete functions with more specific functions to optimize efficiency:
Key type | Lookup function |
---|---|
uint32 | mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer |
uint32 | mapaccess2_fast32(t *maptype, h *hmap, key uint32) (unsafe.Pointer, bool) |
uint64 | mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer |
uint64 | mapaccess2_fast64(t *maptype, h *hmap, key uint64) (unsafe.Pointer, bool) |
string | mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer |
string | mapaccess2_faststr(t *maptype, h *hmap, ky string) (unsafe.Pointer, bool) |
These functions take concrete parameter types: uint32, uint64, string. Since the key type is known ahead of time, the memory layout is clear, so many operations can be skipped and efficiency improves.
All of the above functions live in the file src/runtime/hashmap_fast.go.
How to Expand capacity
The purpose of a hash table is to find the target key quickly. However, as more keys are added to the map, the probability of collisions rises, and as the eight cells of each bucket fill up, finding, inserting, and deleting keys all get slower. Ideally, a bucket would hold only one key, achieving O(1) lookups, but that would cost far too much space.
Go instead packs eight keys into one bucket; after locating the bucket it still has to find the specific key inside, which trades a little time for space.
Of course, there is a limit: if all keys fall into the same bucket, the map degrades into a linked list and every operation becomes O(n), which is unacceptable.
Therefore, a metric is needed to measure the situation described above: the load factor. The Go source defines the load factor as:
loadFactor := count / (2^B)
Count is the number of elements in the map, and 2^B is the number of buckets.
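The formula can be sketched directly; the 6.5 threshold is the one defined in the Go source:

```go
package main

import "fmt"

// loadFactor mirrors the formula quoted above:
// count / 2^B, where count is the number of elements
// and 2^B the number of buckets.
func loadFactor(count int, B uint8) float64 {
	return float64(count) / float64(uint64(1)<<B)
}

func main() {
	// 100 elements spread over 2^4 = 16 buckets.
	fmt.Println(loadFactor(100, 4)) // 6.25: just under the 6.5 threshold
	// A few more insertions push it past the threshold.
	fmt.Println(loadFactor(105, 4) > 6.5) // true: 6.5625 would trigger growth
}
```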
Each time a new key is inserted into the map, the runtime checks for expansion; it is triggered when either of these two conditions holds:
- The load factor exceeds the threshold, which is defined in the source code as 6.5.
- There are too many overflow buckets: if B < 15, i.e. the total bucket count 2^B is less than 2^15, expansion triggers when the number of overflow buckets exceeds 2^B; if B >= 15, i.e. the total bucket count is at least 2^15, it triggers when the number of overflow buckets exceeds 2^15.
The assembly shows that assignment corresponds to the mapassign function in the source; the code for the expansion conditions looks like this:
// src/runtime/hashmap.go/mapassign

// trigger expansion when appropriate
if !h.growing() && (overLoadFactor(int64(h.count), h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
	hashGrow(t, h)
}

// the load factor exceeds 6.5
func overLoadFactor(count int64, B uint8) bool {
	return count >= bucketCnt && float32(count) >= loadFactor*float32((uint64(1)<<B))
}

// there are too many overflow buckets
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B < 16 {
		return noverflow >= uint16(1)<<B
	}
	return noverflow >= 1<<15
}
Explanation:

Point 1: each bucket has eight slots, so if there were no overflow buckets and every bucket were full, the load factor would work out to 8. When the load factor exceeds 6.5, many buckets are close to full and both lookups and inserts are getting slower; it is time to expand.
Point 2 complements point 1: it catches cases where the load factor is small but lookups and inserts are still slow, which point 1 cannot detect. The symptom is that the numerator of the load factor is small, i.e. the map holds few elements, yet the number of allocated buckets, including a large pile of overflow buckets, is large.
The cause is easy to imagine: repeatedly inserting and deleting elements. Many elements are inserted first, creating many buckets, but not enough to push the load factor past the point-1 threshold, so no expansion relieves the pressure. Then elements are deleted, shrinking the count, and more are inserted, creating still more overflow buckets without ever violating point 1. The overflow buckets pile up, keys end up sparsely scattered, and lookups and inserts become horribly inefficient; hence rule 2. It's like a ghost town: many houses, few residents, all spread out, and finding anyone is hard.
Capacity expansion happens for both conditions 1 and 2, but the strategies differ; after all, the two conditions describe different situations.
For condition 1, there are too many elements and too few buckets. The fix is simple: increment B by 1, so the number of buckets (2^B) doubles. There are then old and new bucket arrays; note that elements stay in the old buckets at first and are only migrated later. The new array is exactly twice the size of the old one.
For condition 2, there aren't that many elements, but there are far too many overflow buckets, which implies many buckets are only partly full. The fix is to allocate a new bucket array of the same size and move elements from the old buckets into it, packing the keys in each bucket densely. Keys sitting in overflow buckets move up into the main buckets, saving space, raising bucket utilization, and speeding up map lookups and inserts.
Cao Da's blog also points out an extreme case for condition 2: if every key inserted into the map hashes to the same value, they all land in the same bucket; past eight keys, overflow buckets pile up and eventually trigger rule 2. Moving elements around doesn't actually help then, because the hash table has genuinely degraded into a linked list and operations cost O(n).
Let's see how expansion works. Map expansion requires relocating existing keys and values to new memory, and moving a huge number of them at once would badly hurt performance. So Go maps grow "incrementally": the original keys are not moved all at once; at most two buckets are moved per operation.
The hashGrow() function above doesn't actually "move" anything; it only allocates the new buckets and hangs the old ones on the oldbuckets field. The real moving happens in growWork(), which is called from mapassign and mapdelete: when inserting, modifying, or deleting a key, an attempt is made to move buckets, after first checking whether the old buckets have all been moved, specifically by checking whether oldbuckets is nil.
Let's first look at what hashGrow() does, then at how the actual bucket relocation works.
func hashGrow(t *maptype, h *hmap) {
	// incrementing B by 1 doubles the space
	bigger := uint8(1)
	// condition 2: grow without increasing the size
	if !overLoadFactor(int64(h.count), h.B) {
		// same-size grow, so B stays the same
		bigger = 0
		h.flags |= sameSizeGrow
	}
	// hang the current buckets on oldbuckets
	oldbuckets := h.buckets
	// allocate the new bucket array
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger)

	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	// commit the grow
	h.B += bigger
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	// relocation progress starts at 0
	h.nevacuate = 0
	// the overflow bucket count resets to 0
	h.noverflow = 0
	// ...
}
The main work is allocating the new bucket space and updating the related fields and flags: for example, nevacuate is set to 0, meaning relocation progress starts at 0.
The few lines that transfer h.flags deserve a closer look:

flags := h.flags &^ (iterator | oldIterator)
if h.flags&iterator != 0 {
	flags |= oldIterator
}
They use the &^ operator, called the "bit clear" (AND NOT) operator. For example:
x = 01010011
y = 01010100
z = x &^ y = 00000011
Where a bit of y is 1, the corresponding bit of z is 0; otherwise z's bit equals x's bit.
So the computed flags value has the iterator and oldIterator bits cleared. Then, if the iterator bit was set, meaning an iterator may be using the current buckets, the oldIterator bit is set instead: the buckets are about to live under the oldbuckets name, so the corresponding flag is transferred along with them.
The flag bits are as follows:
// There may be iterators that use buckets
iterator = 1
// There may be iterators that use oldbuckets
oldIterator = 2
// A coroutine is writing a key to the map
hashWriting = 4
// Equal capacity expansion (condition 2)
sameSizeGrow = 8
Take a look at the growWork() function that actually performs the relocation.
func growWork(t *maptype, h *hmap, bucket uintptr) {
// Confirm that the old bucket corresponds to the bucket in use
evacuate(t, h, bucket&h.oldbucketmask())
// Move another bucket to speed up the move process
if h.growing() {
evacuate(t, h, h.nevacuate)
}
}
h.growing() is very simple:
func (h *hmap) growing() bool {
	return h.oldbuckets != nil
}
If oldbuckets is not nil, the move is not finished and should continue.
The expression bucket&h.oldbucketmask(), as the source comment says, makes sure the bucket being evacuated is the old bucket corresponding to the bucket currently in use. oldbucketmask() returns the map's bucket mask from before the expansion.
The so-called bucket mask is ANDed with a key's computed hash to yield the bucket the key should fall into. For example, with B = 5 the mask's low five bits are 11111 and the rest are 0; ANDing it with the hash means only the hash's low five bits decide which bucket the key falls into.
Next, we focus all our attention on evacuate, the key function for relocation. The source is posted below; don't be nervous, I've added extensive comments, and you can absolutely understand it through them. I'll explain the relocation process in more detail afterwards.
The source code is as follows:
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
	// Locate the old bucket address
	b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
	// The result is 2^B; if B = 5, the result is 32
	newbit := h.noldbuckets()
	// Hash function for the key type
	alg := t.key.alg
	// If b has not been evacuated yet
	if !evacuated(b) {
		var (
			// Destination buckets of the relocation
			x, y *bmap
			// Current cell index in x and y
			xi, yi int
			// Pointer to the next key slot in x and y
			xk, yk unsafe.Pointer
			// Pointer to the next value slot in x and y
			xv, yv unsafe.Pointer
		)
		// For same-size growth the bucket number is unchanged,
		// so only x is needed
		x = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
		xi = 0
		xk = add(unsafe.Pointer(x), dataOffset)
		xv = add(xk, bucketCnt*uintptr(t.keysize))
		// For doubling growth the bucket number may change,
		// so y is prepared as well
		if !h.sameSizeGrow() {
			// The bucket number represented by y is larger by 2^B
			y = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
			yi = 0
			yk = add(unsafe.Pointer(y), dataOffset)
			yv = add(yk, bucketCnt*uintptr(t.keysize))
		}
		// Walk the old bucket, including its overflow buckets;
		// b is the old bucket address
		for ; b != nil; b = b.overflow(t) {
			k := add(unsafe.Pointer(b), dataOffset)
			v := add(k, bucketCnt*uintptr(t.keysize))
			// Traverse all the cells in the bucket
			for i := 0; i < bucketCnt; i, k, v = i+1, add(k, uintptr(t.keysize)), add(v, uintptr(t.valuesize)) {
				// tophash value of the current cell
				top := b.tophash[i]
				// If the cell is empty, it holds no key
				if top == empty {
					// Mark it as "evacuated"
					b.tophash[i] = evacuatedEmpty
					// Proceed to the next cell
					continue
				}
				// This should never happen: an unevacuated cell can only be
				// empty or hold a normal tophash (>= minTopHash)
				if top < minTopHash {
					throw("bad map state")
				}
				k2 := k
				// If the key is stored indirectly (as a pointer), dereference it
				if t.indirectkey {
					k2 = *((*unsafe.Pointer)(k2))
				}
				// x is used by default
				useX := true
				// Only doubling growth needs to pick a destination
				if !h.sameSizeGrow() {
					// Compute the hash, the same way as when the key was first written
					hash := alg.hash(k2, uintptr(h.hash0))
					// If some goroutine is iterating over the map
					if h.flags&iterator != 0 {
						// Keys that are not equal to themselves hash
						// differently on every call
						if !t.reflexivekey && !alg.equal(k2, k2) {
							// Only NaN keys of float type end up here
							if top&1 != 0 {
								// Set bit B of the hash to 1
								hash |= newbit
							} else {
								// Clear bit B of the hash
								hash &^= newbit
							}
							// Take the high 8 bits as the new tophash
							top = uint8(hash >> (sys.PtrSize*8 - 8))
							if top < minTopHash {
								top += minTopHash
							}
						}
					}
					// Whether the key stays in x or moves to y depends on
					// bit B (the oldB+1-th bit) of the new hash value;
					// read the explanation below for details
					useX = hash&newbit == 0
				}
				// If the key moves to the x part
				if useX {
					// Mark the old cell's tophash: moved to the x part
					b.tophash[i] = evacuatedX
					// If xi equals 8, the bucket is full and overflows
					if xi == bucketCnt {
						// Create a new overflow bucket
						newx := h.newoverflow(t, x)
						x = newx
						// xi restarts from 0
						xi = 0
						// xk points to where the key will be moved
						xk = add(unsafe.Pointer(x), dataOffset)
						// xv points to where the value will be moved
						xv = add(xk, bucketCnt*uintptr(t.keysize))
					}
					// Set the tophash value
					x.tophash[xi] = top
					// The key is stored as a pointer
					if t.indirectkey {
						// Copy the old key (a pointer) to the new location
						*(*unsafe.Pointer)(xk) = k2 // copy pointer
					} else {
						// Copy the old key (a value) to the new location
						typedmemmove(t.key, xk, k) // copy value
					}
					// The same for the value
					if t.indirectvalue {
						*(*unsafe.Pointer)(xv) = *(*unsafe.Pointer)(v)
					} else {
						typedmemmove(t.elem, xv, v)
					}
					// Advance to the next cell
					xi++
					xk = add(xk, uintptr(t.keysize))
					xv = add(xv, uintptr(t.valuesize))
				} else { // add the key to the y part
					// ...
					// omitted: the same operations as the x part,
					// using y, yi, yk, yv
				}
			}
		}
		// If no goroutine is iterating over the old buckets,
		// clear the old bucket to help GC
		if h.flags&oldIterator == 0 {
			b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
			// Clear only the key and value parts of the bucket;
			// keep the tophash part, which records the evacuation state
			if t.bucket.kind&kindNoPointers == 0 {
				memclrHasPointers(add(unsafe.Pointer(b), dataOffset), uintptr(t.bucketsize)-dataOffset)
			} else {
				memclrNoHeapPointers(add(unsafe.Pointer(b), dataOffset), uintptr(t.bucketsize)-dataOffset)
			}
		}
	}
	// Update the evacuation progress,
	// if the bucket just moved equals the current progress mark
	if oldbucket == h.nevacuate {
		// Advance the progress by 1
		h.nevacuate = oldbucket + 1
		// Experiments suggest that 1024 is overkill by at least an order of magnitude.
		// Put it in there as a safeguard anyway, to ensure O(1) behavior.
		// Look ahead over at most 1024 buckets
		stop := h.nevacuate + 1024
		if stop > newbit {
			stop = newbit
		}
		// Skip past buckets that have already been evacuated
		for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
			h.nevacuate++
		}
		// Now all buckets before h.nevacuate have been evacuated.
		// If every bucket has been moved
		if h.nevacuate == newbit {
			// Drop the old buckets
			h.oldbuckets = nil
			// Drop the old overflow buckets;
			// recall: [0] holds the current overflow buckets,
			// [1] holds the old overflow buckets
			if h.extra != nil {
				h.extra.overflow[1] = nil
			}
			// Clear the growing flag bits
			h.flags &^= sameSizeGrow
		}
	}
}
The code comments of the evacuate function are very clear, and it is easy to understand the entire relocation process based on the code and comments. Be patient.
The purpose of relocation is to move the old buckets to the new one. We know that in condition 1, there are twice as many new buckets as before, and in condition 2, there are the same number of new buckets.
For condition 2 (same-size growth), moving buckets from the old array to the new one is straightforward: since the number of buckets is unchanged, each bucket keeps its number. For example, old bucket 0 still lands in new bucket 0.

For condition 1, it is not so simple. The hash of each key must be recomputed to determine which new bucket it falls into. For example, if B = 5 before growth, a key's bucket is determined by the low 5 bits of its hash. After growth, B becomes 6, so one more bit is inspected: the low 6 bits decide which bucket the key falls into. This is called rehashing.

Therefore, depending on whether that extra hash bit is 0 or 1, a key's bucket number after the move is either the same as before or increased by 2^B (with B its original value).
Understanding the bucket sequence number change above allows us to answer another question: Why is the traversal map unordered?
After the map grows, keys in the same bucket are relocated, and some of them move away (their bucket number increases by 2^B). Traversal walks the buckets in order and the keys within each bucket in order. After the move, some keys have flown off to distant buckets while others stayed put, so the positions of keys change significantly. As a result, it is impossible to traverse the map in the original order.
Of course, if I had a hard-coded map and never inserted or deleted anything, each traversal should in principle return a fixed sequence of keys and values. That is true, but Go deliberately rules out this practice, because it can lead novice programmers to rely on behavior that is not guaranteed, which in some cases can be a big mistake.

Go goes even further: when traversing a map, it does not always start from bucket 0, but from a randomly chosen bucket, and within that bucket from a randomly chosen cell. This way, even with a hard-coded map that is only iterated, you are unlikely to get a fixed sequence of key/value pairs every time.
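A quick way to observe this from ordinary Go code: run the same range loop twice and record the encounter order. The orders may differ between runs (and are not guaranteed to differ within one run), so the sketch below only asserts that the same set of keys comes back; `collectKeys` is a helper name of my own, not from the runtime:

```go
package main

import (
	"fmt"
	"sort"
)

// collectKeys runs one range loop over m and returns the keys in encounter order.
func collectKeys(m map[string]int) []string {
	var keys []string
	for k := range m {
		keys = append(keys, k)
	}
	return keys
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4}
	first := collectKeys(m)
	second := collectKeys(m)
	// The two encounter orders may differ, but the key sets are identical.
	sort.Strings(first)
	sort.Strings(second)
	fmt.Println(first, second) // [a b c d] [a b c d]
}
```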
By the way, the “result of iterating a map is unordered” feature was added since Go 1.0.
Another point: if B increases by 1, there are twice as many buckets as before, and each old bucket "splits" into two new buckets.

For example, suppose B = 2 originally, and the low 3 bits of the hashes of two keys in the same bucket are 010 and 110. Since B = 2, the low two bits, 10, determine that both keys fall into bucket 2. When B becomes 3, one more bit is inspected, so 010 and 110 send the keys to bucket 2 and bucket 6, respectively.
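The arithmetic above can be sketched in a few lines of Go (the helper name `bucketIndex` is mine, not from the runtime):

```go
package main

import "fmt"

// bucketIndex returns the bucket a hash falls into when the map has 2^B
// buckets: simply the low B bits of the hash.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (1<<B - 1)
}

func main() {
	h1, h2 := uint64(0b010), uint64(0b110)
	// Before growth: B = 2, both keys land in bucket 2
	fmt.Println(bucketIndex(h1, 2), bucketIndex(h2, 2)) // 2 2
	// After growth: B = 3, one more bit is inspected,
	// so the keys split into buckets 2 and 6 (= 2 + 2^2)
	fmt.Println(bucketIndex(h1, 3), bucketIndex(h2, 3)) // 2 6
}
```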
Keep this in mind; we will use it when we discuss map iteration.
A few more key points in the move function:
One call to evacuate relocates only a single bucket. It traverses all the cells in that bucket and copies the ones with values to their new locations. The bucket may also have overflow buckets hanging off it, which need to be relocated too. Hence the two nested loops: the outer one walks the bucket and its overflow buckets, the inner one walks all the cells of each bucket. This double loop appears everywhere in the map source code and is worth understanding well.

The source mentions x and y parts, which is exactly what we said earlier: when the number of buckets doubles, the first half is called the x part and the second half the y part. The keys in one old bucket may split across two new buckets, one in the x part and one in the y part. So before moving a cell, we need to know which part its key falls into: recompute the key's hash and look at one more bit, and the destination part is determined.

There is a special case: a key whose hash is different every time it is computed. Such a key is math.NaN(), which means "not a number" and has type float64. When it is used as a map key, a problem arises during relocation: the hash computed now is not the same as the hash computed when it was inserted into the map!

One consequence, as you might expect, is that this key can never be found again by a Get operation! A lookup like m[math.NaN()] returns nothing; the key only shows up when traversing the map. And you can insert any number of math.NaN() keys into a map.

When relocation hits a math.NaN() key, only the lowest bit of its tophash decides whether it goes to the x part or the y part (when the number of buckets doubles): if the lowest bit of the tophash is 0, the key goes to the x part; if it is 1, it goes to the y part.
This is achieved by applying the tophash value to the newly computed hash value:
if top&1 != 0 {
	// The lowest bit of the tophash is 1:
	// set bit B of the newly computed hash
	hash |= newbit
} else {
	// The lowest bit is 0: clear bit B of the new hash
	hash &^= newbit
}
// If bit B of the hash is 0, the key moves to the x part
// (when B = 5, newbit = 32, i.e. binary 100000)
useX = hash&newbit == 0
In fact, such a key could be moved to either of the two buckets it splits into. Doing it this way has real benefits, which I'll explain in more detail when we talk about map iteration; for now, just remember it.
Once you have identified the target bucket to move to, the move is easier. Copy the source key/value value to the appropriate location of the destination.
The tophash of the moved cell in the old bucket is set to evacuatedX or evacuatedY, meaning the key has moved to the x or y part of the new map. The tophash in the new bucket normally takes the high 8 bits of the key's hash.
The following is a macro view of the changes before and after expansion through the figure.
Before growth, B is 2 and there are four buckets. lowbits denotes the low bits of the hash. Let's ignore the other buckets and focus on bucket 2, and assume there are too many overflow buckets, triggering same-size growth (condition 2 above).
After capacity expansion is complete, overflow Buckets disappear and keys are concentrated into one bucket, which is more compact and improves lookup efficiency.
Now suppose doubling growth is triggered instead. After growth, the keys in the old bucket split into two new buckets, one in the x part and one in the y part, depending on the low bits of their hashes. In the new map, buckets 0-3 form the x part and buckets 4-7 the y part.
Note that the two diagrams above ignore the relocation of other buckets and show the state after all buckets have been moved. In reality, we know that relocation is a gradual process and is not completed all at once. So the oldbuckets pointer still points to the old []bmap, and the tophash of each moved key is a status value indicating where it has gone.
The map of traverse
In principle, map traversal is simple: walk all the buckets and the overflow buckets hanging behind them, then walk the cells in each bucket one by one. Each bucket contains eight cells; extract the key and value from every cell that holds a key, and the process is complete.
But the reality is not so simple. Remember the expansion process we talked about earlier? The capacity expansion process is not an atomic operation and only moves two buckets at a time, so if the capacity expansion operation is triggered, the map will be in an intermediate state for a long time: some buckets have been moved to a new home, while others remain in the same place.
Therefore, traversal, if occurring during expansion, involves traversing old and new buckets, which is the difficulty.
I’ll start by writing a simple code sample that pretends not to know what function the traversal calls:
package main
import "fmt"
func main() {
ageMp := make(map[string]int)
ageMp["qcrao"] = 18
for name, age := range ageMp {
fmt.Println(name, age)
}
}
Execute command:
go tool compile -S main.go
Get the assembly command. I’m not going to go through it line by line here, but you can look at the previous articles, which are very detailed.
The key lines of assembly code are:
// ...
0x0124 00292 (test16.go:9) CALL runtime.mapiterinit(SB)
// ...
0x01fb 00507 (test16.go:9) CALL runtime.mapiternext(SB)
0x0200 00512 (test16.go:9) MOVQ ""..autotmp_4+160(SP), AX
0x0208 00520 (test16.go:9) TESTQ AX, AX
0x020b 00523 (test16.go:9) JNE 302
// ...
This makes the underlying function calls for map iteration clear: mapiterinit is called to initialize the iterator, and mapiternext is called in a loop to advance the iteration.
Iterator structure definition:
type hiter struct {
	// pointer to the key
	key unsafe.Pointer
	// pointer to the value
	value unsafe.Pointer
	// map type, including key and value size information
	t *maptype
	// map header
	h *hmap
	// the buckets pointed to at initialization time
	buckets unsafe.Pointer
	// the bmap currently being traversed
	bptr *bmap
	// keeps overflow buckets alive
	overflow [2]*[]*bmap
	// bucket number at which traversal starts
	startBucket uintptr
	// cell number at which traversal starts (each bucket has 8 cells)
	offset uint8
	// whether traversal has wrapped around to the beginning
	wrapped bool
	// size of B
	B uint8
	// current cell number
	i uint8
	// current bucket
	bucket uintptr
	// the bucket that needs checking because of growth
	checkBucket uintptr
}
mapiterinit performs the initial assignment of the hiter structure's fields.
As mentioned earlier, even traversing a hard-coded map produces a different order each time. Now we can take a closer look at how that is implemented.
// Generate a random number r
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
r += uintptr(fastrand()) << 31
}
// Which bucket to traverse from
it.startBucket = r & (uintptr(1)<<h.B - 1)
// Which cell of the bucket to start traversing
it.offset = uint8(r >> h.B & (bucketCnt - 1))
For example, suppose B = 2. Then uintptr(1)<<h.B - 1 is 3, whose low 8 bits are 0000 0011; ANDing r with it yields a bucket number in 0~3. Similarly, bucketCnt - 1 = 7, whose low 8 bits are 0000 0111; shifting r right by B (2) bits and ANDing with 7 yields a cell number in 0~7.
Thus, the mapiternext function will iterate from the it.startBucket cell with the it.offset number, and extract the key and value from it until it returns to the starting bucket to complete the iterate process.
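Plugging concrete numbers into the two expressions is an easy sanity check. The sketch below mirrors the computation (the function name `startPos` and the example value of r are mine, not from the runtime):

```go
package main

import "fmt"

const bucketCnt = 8 // cells per bucket, as in the runtime

// startPos mimics how mapiterinit derives the starting bucket and cell
// from a random value r, for a map with 2^B buckets.
func startPos(r uint64, B uint8) (startBucket, offset uint64) {
	startBucket = r & (1<<B - 1)        // low B bits pick the bucket
	offset = (r >> B) & (bucketCnt - 1) // next 3 bits pick the cell
	return
}

func main() {
	// B = 2: four buckets; suppose the random value is 14 (binary 1110)
	b, off := startPos(14, 2)
	fmt.Println(b, off) // 2 3
}
```

So with r = 14, iteration would begin at cell 3 of bucket 2.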
The source code section is easier to read, especially after you understand the previous comments. So, I’m going to go through the entire traversal process graphically, hopefully in a way that’s easy to understand.
Suppose we have a map like the one shown below, where B = 1 starts with two buckets, and then a capacity expansion is triggered (not to go into the capacity expansion condition here, just a setting), and B becomes 2. In addition, the contents of bucket 1 are moved to the new bucket, and Bucket 1 is split into bucket 1 and bucket 3. Bucket 0 has not been moved. The oldbuckets hang on the *oldbuckets pointer, and the new buckets hang on the *buckets pointer.
At this point, we iterate over the map. Assume that startBucket = 3 and offset = 2 after initialization. Thus, the starting point of the traversal would be cell number 2 of Bucket number 3, and the following diagram is the state in which the traversal begins:
Red indicates the start position, and the sequence of bucket traversal is 3 -> 0 -> 1 -> 2.
Because bucket 3 corresponds to the old bucket 1, check whether the old bucket 1 has been moved. The judgment method is:
func evacuated(b *bmap) bool {
h := b.tophash[0]
return h > empty && h < minTopHash
}
If the value of b.tophash[0] is within the flag range, that is, in the open interval (0, 4), the bucket has been moved.
empty = 0
evacuatedEmpty = 1
evacuatedX = 2
evacuatedY = 3
minTopHash = 4
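Putting the constants and the check together in a standalone sketch (mirroring the runtime values shown above):

```go
package main

import "fmt"

// tophash flag values, as in the runtime
const (
	empty          = 0
	evacuatedEmpty = 1
	evacuatedX     = 2
	evacuatedY     = 3
	minTopHash     = 4
)

// evacuated reports whether a bucket whose first tophash is h has been
// moved: true exactly when h lies in the open interval (0, 4).
func evacuated(h uint8) bool {
	return h > empty && h < minTopHash
}

func main() {
	fmt.Println(evacuated(empty))          // false: an untouched empty cell
	fmt.Println(evacuated(evacuatedX))     // true: moved to the x part
	fmt.Println(evacuated(minTopHash + 9)) // false: a normal tophash value
}
```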
In this case, old bucket 1 has already been moved, so its tophash[0] is in the range (0, 4), and we just iterate over the new bucket 3.
Iterating through the cells of bucket 3, we find the first non-empty key: element E. At this point the mapiternext function returns, and the traversal result so far holds a single element:

Since the returned key is not empty, mapiternext keeps being called.

Resuming from where we left off, we find elements F and G in the overflow bucket of new bucket 3.
Traversal result sets are thus augmented:
After traversing new bucket 3, we move on to new bucket 0. Bucket 0 corresponds to old bucket 0, which has not been moved, so traversing new bucket 0 turns into traversing old bucket 0. Do we simply take all the keys out of old bucket 0?

It's not that simple. Recall that old bucket 0 will split into two buckets after the move: new bucket 0 and new bucket 2. What we are traversing is new bucket 0 (note: via the *buckets pointer, i.e. the new buckets). So we take, from old bucket 0, only the keys that will be assigned to new bucket 0 after the split.

Therefore, only keys with lowbits == 00 enter the traversal result set:
As before, we continue through the new bucket 1 and discover that the old bucket 1 has moved, so we only need to iterate through the existing elements of the new bucket 1. The result set becomes:
Continuing with new bucket 2: it comes from old bucket 0, so we need the keys from old bucket 0 that will split into new bucket 2, that is, the keys with lowbits == 10.
Thus, traversing the result set becomes:
Finally, when you continue traversing to the new bucket number 3, you find that all buckets have been traversed and the entire iteration is complete.
By the way, if you run into a key like math.NaN(), the handling is similar. The destination still depends on which bucket it goes to during the split, except that only the lowest bit of its tophash is examined: if it is 0, the key belongs to the x part; if it is 1, to the y part. That decides whether the key is taken out and put into the traversal result set.
The core of the map traversal is to understand that the old bucket splits into two new buckets when doubled. The traversal operation is performed according to the sequence number of the new bucket. If the old bucket does not move, the old bucket will find the key to move to the new bucket in the future.
The assignment of the map
As you can see from the assembly, mapassign is called when a key is inserted into or modified in a map.
In fact, the syntax for inserting or modifying keys is the same, except that the key for the former operation does not exist in the map, while the key for the latter operation does exist in the map.
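This is observable from ordinary Go code: assigning to an absent key grows the map, while assigning to an existing key only overwrites the value:

```go
package main

import "fmt"

func main() {
	m := make(map[string]int)
	m["qcrao"] = 18 // key absent: an insert, len goes 0 -> 1
	m["qcrao"] = 19 // key present: an update, len stays 1
	fmt.Println(len(m), m["qcrao"]) // 1 19
}
```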
Depending on the key type, the compiler optimizes mapassign into a series of "fast" variants:
Key type | insert |
---|---|
uint32 | mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer |
uint64 | mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer |
string | mapassign_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer |
We’ll just look at the most general assignment function, mapassign.
Overall, the process is pretty simple: compute the hash value of the key, follow the previous process based on the hash value, find the position to assign (either inserting a new key or updating an old key), and assign the corresponding position.
The outer loop walks the bucket and its overflow buckets; the inner loop walks the cells of each bucket. Due to space limits, I won't show the commented code for this part; if you are interested, go read the source — after this article you should have no trouble understanding it.
I’ll mention a few important things about this process.
The function first checks the map's flag bits. If the hashWriting bit of flags is 1, some other goroutine is writing the map, and the function panics. This also shows that maps are not safe for concurrent use by goroutines.
Growth is incremental. If the map is in the middle of growing, then after the key is located to a bucket, the migration of the corresponding old bucket must be completed first: the keys in that old bucket are moved (split into two new buckets) before the insert or update proceeds in the new bucket.

This relocation is performed at the front of the function. Only after it completes can we safely locate the key's position in the new bucket and carry on with the subsequent steps.
Next comes locating the position where the key should be placed: finding the key its "own place", so to speak. Two pointers are prepared: one (inserti) points to the slot in the tophash array where the key's tophash will go, and the other (insertk) points to the cell where the key itself will go (the value position is offset from it by the length of eight keys). The index in the tophash array determines the key's position within the bucket (at most eight keys per bucket), and the value slot "skips over" the eight key slots.

During the loop, inserti and insertk are set to the first free cell found. If the key is not found anywhere in the map, this is a new insertion, and the key's final location is that first free slot.

If the bucket and all its overflow buckets are full (inserti and insertk remain nil), a new overflow bucket is hung at the end of the chain, and the key is placed there. Of course, an overflow bucket may itself be followed by yet another overflow bucket; this simply means too many keys have hashed into this bucket.
Before placing a key, check the map status to see if it needs to be expanded. If the expansion conditions are met, an expansion operation is triggered.
After that, the whole key-locating process has to be performed again, because after growth the distribution of keys has changed.
Finally, the map-related values are updated. If a new key is inserted, the count value of the map element field is increased by 1. The hashWriting flag set at the beginning of the function is cleared.
In addition, there is an important subtlety. Saying that we "find the location of the key and perform the assignment" is not quite accurate: the mapassign prototype does not take a value parameter, so when does the assignment actually happen?
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
The answer lies in the assembly. I'll give it directly; study the details yourself if you are interested. mapassign returns a pointer to the slot that holds the key's value; with that address, assigning the value is easy.
The deletion of the map
The underlying write execution function is mapdelete:
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer)
Depending on the key type, the delete operation is optimized to a more specific function:
Key type | delete |
---|---|
uint32 | mapdelete_fast32(t *maptype, h *hmap, key uint32) |
uint64 | mapdelete_fast64(t *maptype, h *hmap, key uint64) |
string | mapdelete_faststr(t *maptype, h *hmap, ky string) |
Of course, we only care about the general mapdelete function. It first checks the hashWriting bit of h.flags; if it is 1, it panics, because that means some other goroutine is writing the map at the same time.
Compute the hash of the key to find the falling bucket. Check the map. If the map is being expanded, a relocation operation is triggered.
Delete operation is also a two-layer loop, the core is still to find the specific location of the key. The search process is similar, looking for each cell in a bucket.
After finding the corresponding position, “clear” the key or value:
// clear the key
if t.indirectkey {
*(*unsafe.Pointer)(k) = nil
} else {
typedmemclr(t.key, k)
}
// Clear value to zero
if t.indirectvalue {
*(*unsafe.Pointer)(v) = nil
} else {
typedmemclr(t.elem, v)
}
Finally, subtract the count value by 1 and set the tophash value at the corresponding position to Empty.
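From the caller's side, the behavior looks like this: delete shrinks the element count by one, and deleting a missing key is a no-op:

```go
package main

import "fmt"

func main() {
	m := map[string]int{"a": 1, "b": 2}
	delete(m, "a")       // key present: count drops to 1
	delete(m, "missing") // key absent: a no-op, no panic
	_, ok := m["a"]
	fmt.Println(len(m), ok) // 1 false
}
```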
This part of the source code is also relatively simple; if you are in the mood, go read it directly.
Advanced topics about map
Can we delete a key while traversing?
Map is not a thread-safe data structure. Reading and writing a map at the same time is an undefined behavior. If it is detected, it will panic.
In general, this can be solved with a read-write lock: sync.RWMutex.
Call RLock() before reading and RUnlock() after reading. Call Lock() before writing, and Unlock() after writing.
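A minimal sketch of that pattern (the type and method names are mine, not from any library):

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMap wraps a map with a sync.RWMutex so that readers can proceed
// concurrently while writers get exclusive access.
type SafeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

func NewSafeMap() *SafeMap { return &SafeMap{m: make(map[string]int)} }

func (s *SafeMap) Get(k string) (int, bool) {
	s.mu.RLock() // shared lock for readers
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

func (s *SafeMap) Set(k string, v int) {
	s.mu.Lock() // exclusive lock for writers
	defer s.mu.Unlock()
	s.m[k] = v
}

func main() {
	sm := NewSafeMap()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			sm.Set("k", n) // concurrent writes are now safe
		}(i)
	}
	wg.Wait()
	v, ok := sm.Get("k")
	fmt.Println(ok, v >= 0 && v < 10) // true true
}
```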
In addition, sync.Map is a thread-safe Map and can also be used. I’m not going to talk about how it works.
Can the key be a float?
Grammatically, yes. In Go, any comparable type can be used as a key — that is, every type except slices, maps, and functions. This includes: booleans, numbers, strings, pointers, channels, interface types, structs, and arrays containing only the above types. What these types have in common is support for the == and != operators: if k1 == k2, then k1 and k2 are considered the same key. For structs, every field must compare equal for two values to count as the same key.
By the way, any type can be a value, including the map type.
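A small illustration of both points: a struct key works when all its fields are comparable, and a map can itself be a value type:

```go
package main

import "fmt"

// point is comparable: both of its fields support ==
type point struct{ x, y int }

func main() {
	hits := make(map[point]int)
	hits[point{1, 2}]++
	hits[point{1, 2}]++ // identical field values: the same key
	hits[point{3, 4}]++

	// Any type can be a value, including another map
	nested := map[string]map[string]int{
		"outer": {"inner": 42},
	}

	fmt.Println(hits[point{1, 2}], len(hits), nested["outer"]["inner"]) // 2 2 42
}
```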
Here’s an example:
package main

import (
	"fmt"
	"math"
)

func main() {
	m := make(map[float64]int)
	m[1.4] = 1
	m[2.4] = 2
	m[math.NaN()] = 3
	m[math.NaN()] = 3
	for k, v := range m {
		fmt.Printf("[%v, %d] ", k, v)
	}
	fmt.Printf("\nk: %v, v: %d\n", math.NaN(), m[math.NaN()])
	fmt.Printf("k: %v, v: %d\n", 2.400000000001, m[2.400000000001])
	fmt.Printf("k: %v, v: %d\n", 2.4000000000000000000000001, m[2.4000000000000000000000001])
	fmt.Println(math.NaN() == math.NaN())
}
Output from the program:
[2.4, 2] [NaN, 3] [NaN, 3] [1.4, 1]
k: NaN, v: 0
k: 2.400000000001, v: 0
k: 2.4, v: 2
false
The example defines a map with float64 keys and inserts four keys into it: 1.4, 2.4, NaN, NaN.

All four keys are printed. If you know that NaN != NaN, this is not surprising: since the two NaNs do not compare equal, from the map's point of view they are two different keys.

Then we query several keys and find that NaN does not exist and 2.400000000001 does not exist, while 2.4000000000000000000000001 does exist.

Kind of weird, isn't it?
Then, through compilation, I discovered the following facts:
Looking at the compiled output, the following becomes clear: when a float64 is used as a key, it is handled through its uint64 bit representation.

The Float64frombits function shows the correspondence between the two:
// Float64frombits returns the floating point number corresponding
// to the IEEE 754 binary representation b.
func Float64frombits(b uint64) float64 { return *(*float64)(unsafe.Pointer(&b)) }
That is, floating point numbers are represented in the format specified by IEEE 754. Take an assignment statement as an example:
0x00bd 00189 (test18.go:9) LEAQ "".statictmp_0(SB), DX
0x00c4 00196 (test18.go:9) MOVQ DX, 16(SP)
0x00c9 00201 (test18.go:9) PCDATA $0, $2
0x00c9 00201 (test18.go:9) CALL runtime.mapassign(SB)
“”. Statictmp_0 (SB) variable looks like this:
"".statictmp_0 SRODATA size=8
0x0000 33 33 33 33 33 33 03 40
"".statictmp_1 SRODATA size=8
0x0000 ff 3b 33 33 33 33 03 40
"".statictmp_2 SRODATA size=8
0x0000 33 33 33 33 33 33 03 40
Let’s output something more:
package main
import (
"fmt"
"math"
)
func main() {
	m := make(map[float64]int)
	m[2.4] = 2
	fmt.Println(math.Float64bits(2.4))
	fmt.Println(math.Float64bits(2.400000000001))
	fmt.Println(math.Float64bits(2.4000000000000000000000001))
}
4612586738352862003
4612586738352864255
4612586738352862003
Converted to hexadecimal:
0x4003333333333333
0x4003333333333BFF
0x4003333333333333
Compare these with the previous "".statictmp_0: 2.4 and 2.4000000000000000000000001 yield the same result after passing through math.Float64bits(). Naturally, from the map's point of view they are the same key.
Now look at NaN (not a number):
// NaN returns an IEEE 754 ``not-a-number'' value.
func NaN() float64 { return Float64frombits(uvnan) }
uvnan is defined as:
uvnan = 0x7FF8000000000001
NaN() calls Float64frombits directly, passing in the fixed constant 0x7FF8000000000001, to produce a NaN value. Since NaN always comes from the same constant, why does the map treat it as a different key on every insertion?
This is determined by the hash function of the type. For example, for 64-bit floating-point numbers, its hash function is as follows:
func f64hash(p unsafe.Pointer, h uintptr) uintptr {
	f := *(*float64)(p)
	switch {
	case f == 0:
		return c1 * (c0 ^ h) // +0, -0
	case f != f:
		return c1 * (c0 ^ h ^ uintptr(fastrand())) // any kind of NaN
	default:
		return memhash(p, h, 8)
	}
}
The second case, f != f, is for NaN, and a random number is mixed into the hash there.
In this way, all the puzzles were solved.
Due to the nature of NaN:

NaN != NaN
hash(NaN) != hash(NaN)

looking up a NaN key in a map finds nothing, and if you add four NaNs to it, a range loop will yield four NaNs.
A float can be used as a key, but its precision issues can lead to weird problems like these, so use it with caution.
Conclusion
At the time of writing, there were some questions I could not find answers to in Chinese-language blogs. Of course, the source code can answer any question, but you cannot just dive into its details; you need a general understanding first.

So I started searching for English articles about the source code; there are not many. But I found one very good article, listed first in the references, which leads the reader step by step through optimizations, finally achieving random extraction of a key from a map. I recommend reading it — it's wonderful, especially once you know the detailed processes of map traversal and expansion.
To sum up, in Go language, map is realized through hash lookup table, and hash conflicts are solved by linked list method.
The hash of the key is used to scatter the key into different buckets with eight cells in each bucket. The low hash value determines the bucket number, and the high hash value identifies different keys in the same bucket.
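This division of labor between the two ends of the hash can be sketched as follows (including the minTopHash adjustment described earlier; the helper name `split` is mine):

```go
package main

import "fmt"

const minTopHash = 4 // tophash values below this are reserved as flags

// split extracts the bucket number (low B bits) and the tophash
// (high 8 bits, bumped past the reserved flag range) from a 64-bit hash.
func split(hash uint64, B uint8) (bucket uint64, top uint8) {
	bucket = hash & (1<<B - 1)
	top = uint8(hash >> 56)
	if top < minTopHash {
		top += minTopHash // keep real tophashes out of the flag range
	}
	return
}

func main() {
	bucket, top := split(0xAB00000000000007, 5)
	fmt.Println(bucket, top) // 7 171
}
```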
When too many keys are added to a bucket, resulting in too many elements, or too many buckets overflow, capacity expansion is triggered. Capacity expansion includes equal capacity expansion and double capacity expansion. After capacity expansion, keys in one bucket are split into two buckets and are redistributed to two buckets.
The capacity expansion process is gradual to prevent performance problems caused by excessive key relocation during capacity expansion. Capacity expansion is triggered when new elements are added, and bucket relocation occurs during assignment and deletion, with a maximum of two buckets being moved at a time.
One of the core aspects of search, assignment, and delete is how to locate the key. Once understood, the source code for map is ready to read.
Finally, if this article is helpful to you, please help me to share it, or click on it, thank you!
Finally, click to read the original article and you may witness a thousand stars project from scratch.
References

[English: how to randomly pick a key from a map — wonderful] lukechampine.com/hackmap.htm…
[Wikipedia on maps] en.wikipedia.org/wiki/Associ…
[sync.Map source code analysis] github.com/Chasiny/Blo…
[Flowcharts of various map operations] www.jianshu.com/p/aa0d4808c…
[Map source code analysis] www.twblogs.net/a/5bd78d5d2…
[Cao Da's article on map — needs no explanation] github.com/cch123/gola…
[English, with diagrams] www.ardanlabs.com/blog/2013/1…
[English: map implementation compared with Java and C++] dave.cheney.net/2018/05/29/…
[English: why Go maps are sensitive to data races] dave.cheney.net/2015/12/07/…
[Golang Blog on maps] blog.golang.org/go-maps-in-…
[randomMap, open source] github.com/lukechampin…
hacpai.com/article/153…
[Night-reading issue] github.com/developer-l…
draveness.me/golang-hash…
[Expansion process, with diagrams] my.oschina.net/renhc/blog/…
[Operations] juejin.cn/post/684490…
[English] www.digitalocean.com/community/t…
gocn.vip/article/170…
[Essay: whether a key can be deleted while traversing] cloud.tencent.com/developer/a…
[Programming for Faith: Golang range] draveness.me/golang-for-…
[Difference between slice and map as parameters] stackoverflow.com/questions/4…
[Go official blog about maps] blog.golang.org/go-maps-in-…
[Go comparable types] golang.org/ref/spec#Co…
[Key types] lanlingzi.cn/post/techni…
[Hash function performance comparison] aras-p.info/blog/2016/0…
[Hash function selection, C++/Java comparison] studygolang.com/articles/15…
[Slice and map as function parameters] stackoverflow.com/questions/4…
[Fried Fish blog, map part 1] github.com/EDDYCJY/blo…
[Fried Fish blog, map part 2] github.com/EDDYCJY/blo…
[Definition of a hash function] zhangshuai.ren/2018/05/16/…
[How to compare two maps for equality] golangbot.com/maps/
[NaN hash] research.swtch.com/randhash
[On concurrency safety] zjykzk.github.io/post/cs/gol…