The data structure
The go Map data structure looks something like this
There are two main structures, HMAP and BMAP
hmap
An HMAP is a structure that represents a map.
type hmap struct {
count int
flags uint8
B uint8
noverflow uint16
hash0 uint32
buckets unsafe.Pointer
oldbuckets unsafe.Pointer
nevacuate uintptr
extra *mapextra
}
Copy the code
Analyze the effects of these variables separately
- Count Number of elements
- Flags Indicates the status, such as being written, traversed, etc
- The logarithm of the number of buckets in B, so the number of buckets is 2 to the B
- Hash0 hash seed, increasing the randomness of the hash function
- Buckets point to a pointer to a BMAP array
- Oldbuckets are Pointers to old Bmap arrays, and are related to capacity expansion, as described below
- Nevacute indicates the expansion progress
bmap
A bmap is a bucket, and the HMAP buckets refer to an array of Bmaps.
type bmap struct {
tophash [bucketCnt]uint8
}
Copy the code
It may seem rare, but a new structure is actually created dynamically during compilation
type bmap struct {
topbits [8]uint8
keys [8]keytype
values [8]valuetype
pad uintptr
overflow uintptr
}
Copy the code
Explain these variables
- Topbits is the tophash, which takes the higher eight bits of the hash value
- Keys, values are all arrays of length 8, which means you can put up to 8 kV in a bucket
- Pad alignment fill
- Overflow bucket can be understood as a pointer to a new bucket, that is, the linked list method
You can see that the keys are put together, and the values are put together, just to save space
positioning
Look at the picture
Assuming that the B of hMAP is 5, the last five bits determine which BMAP bucket to put in.
In the insert case, the bucket is traversed, the first space is found and inserted, setting the tophash value to the first eight bits of the hash value.
In the case of a lookup, the bucket is traversed and the first eight bits of the hash value are compared to the tophash of each cell. If it is not found in the overflow bucket, and overflow is not empty, the function continues to search for overflow bucket until it is found or all cells are searched, and the function returns a pointer to h[key]. That will return a zero value of the key type, not nil.
For search, after obtaining the hash value, we only use the B bit to locate the bMAP bucket, and then we need to do an inner loop to find the corresponding kV, or if not, we need to overflow the bucket to find it.
As opposed to a Java Hashmap, this eight-length array is the equivalent of a linked list, which is stored in a Bmap when hash collisions occur.
capacity
When the number of elements increases, the number of collisions will increase, and the value in bMAP will be more, and the search efficiency will be lower. Therefore, it is necessary to expand the size to a certain extent.
One indicator is needed, namely the load factor.
The load factor rule for GO is
loadFactor := count / (2^B)
Copy the code
That is, the number of elements divided by the number of buckets. The default is 6.5.
Java’s Hashmap has a default load factor of 0.75, so Go is more likely to trade space for time than Java.
Expansion can be triggered in two cases:
- The load factor exceeds the default value
- The number of overflow buckets is too large
The first, which is easy to understand, is expansion caused by too many normal elements
In the second case, because of the continuous addition and deletion of elements, many buckets overflow and few elements do not meet the default value of the load factor, but the efficiency is very low
In these two cases, the expansion strategy is not different, I explain each
Load factor exceeds default value
In this case, the simplest expansion is to double the size of the bucket by adding 1 to B. Instead of transferring buckets directly, Go uses incremental scaling similar to redis, which explains the oldbucket role in HMAP.
After capacity expansion, olbucket will be pointed to the array first, and growwork() method will be called every time growwork() is inserted, modified or deleted to try to migrate. After the migration, the sequence number will remain the same.
The number of overflow buckets is too large
This situation is similar to Java’s HashMap expansion, which involves changing the position of the bucket on a linked list to reduce the length of the list and improve efficiency.
B+1 to double the capacity of the bucket. Determine the position of the bucket by determining whether the added bit is 1.
Concurrency issues
Map is not a thread-safe data structure. Reading and writing a map at the same time is an undefined behavior that, if detected, causes a direct panic.