Go Map is a common data structure. The underlying map is generally based on the open addressing method and the separated linking method. When the former is in conflict, rehash is usually performed; when the latter is in conflict, elements are added to the linked list.

The map read and write data of the separated linking method will look for the same key value in the corresponding bucket linked list, and can be returned or updated if it is found. Could not find an element to add to the end or head of the list when updating. If bucket lists are too long, they can also be converted to trees to provide performance. For example, a Java8 map becomes a red-black tree when the list number is greater than 8. This is not the case with the Go Map implementation.

Next, the implementation principles of GO Map and GO Map are analyzed first, and then the differences, advantages and disadvantages of the two implementations are compared.

1 go map

Go initializes the map using make. The underlying call to the makemap function returns a pointer of type:

// file: runtime/hashmap.go 
func makemap(t *maptype, hint64, h *hmap, bucket unsafe.Pointer) *hmap
Copy the code

Note that go map is not thread-safe. If you need a thread-safe map, use sync.map. When a map reads and writes concurrently, panic is reported. This is because the hasWriting flag is checked during reading. If it is set to indicate that there is a write operation, an exception will be thrown.

Note that go does not traverse the map elements in a fixed order, because the map randomly selects a starting position during the traverse. Why does GO do this? Since the Go Map is incremental in rehash, the data may be at the old or the new buckets when it is read, so it is simply iterated from a random location in order to avoid misunderstanding to the user.

1.1 Map Data structure

type hmap struct {
    count     int    // Number of elements in the current map
    flags     uint8
    B         uint8   // The size of the bucket to the power of 2, for example, B=8 means that the bucket size is 2^8=256
    noverflow uint16
    hash0     uint32  // Hash seed, initialized when creating the map

    buckets    unsafe.Pointer
    oldbuckets unsafe.Pointer   // Buckets are pointed to the previous buckets field
    nevacuate  uintptr // The number of buckets that have been moved to a new map during capacity expansion

    extra *mapextra
}

type mapextra struct {
    overflow    *[]*bmap
    oldoverflow *[]*bmap
    nextOverflow *bmap
}
Copy the code

Each bucket in an Hmap is a BMAP, and each Bmap stores eight key/value pairs. When a bucket stores more than eight data pairs, the memory space corresponding to map.mapextra.overfolow is used. CreateOverflow creates a new space if memory is low. Bmap data structure is as follows:

type bmap struct {
    tophash [bucketCnt]uint8
}
Copy the code

The definition of Bmap in GO is shown above, but the runtime is not just the definition above. Why is this? Because go Map supports different types of keys/values, and Bmap stores a fixed 8 keys/values, the amount of memory occupied by Bmap needs to be inferred during compilation. During the runtime, other fields of BMAP are read and written by memory pointer operations (which is one of the reasons for the bad readability of go Map source code). The bMAP runtime structure is as follows:

type bmap struct {
    topbits  [8]uint8
    keys     [8]keytype
    values   [8]valuetype
    pad      uintptr
    overflow uintptr
}
Copy the code

Why is the Topbits field defined in bmap? This is a quick way to determine if hashes are equal, but if they are not equal, they are not equal.

1.2 Read and Write Process

Map read and write methods refer to mapAccess1 and mapassign. Mapaccess1 and mapAccess2 correspond to whether map[key] returns OK.

1.2.1 Write Operations

When the map[key] assignment operation is performed, the runtime.mapassign operation is converted to runtime.mapassign, which computes the hash value and sets the map write flag hashWriting. It is used to quickly check the topbits field. If the top hash is equal and the key is equal, the map is updated. If no element is found after traversal (including traversal of overflow array), inserti writes the element to the free (possibly deleted) memory recorded during the first traversal. If the traversal is complete and there is no free space, the newoverflow operation will be performed. The core code is as follows.

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    hash := t.hasher(key, uintptr(h.hash0)) // Computes the hash value
    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write.
    h.flags ^= hashWriting
again:
    // Hash maps a bucket
    bucket := hash & bucketMask(h.B)
    if h.growing() {
        growWork(t, h, bucket)
    }
    // Force bmap to compute the top hash
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    top := tophash(hash)

    var inserti *uint8
    var insertk unsafe.Pointer
    var elem unsafe.Pointer
bucketloop:
    for {
        for i := uintptr(0); i < bucketCnt; i++ {
            ifb.tophash[i] ! = top {// inserti is the free memory space to be inserted
                if isEmpty(b.tophash[i]) && inserti == nil {
                    inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                }
                if b.tophash[i] == emptyRest { // No data exists after data ends
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if! t.key.equal(key, k) {continue // unequal record traversal
            }
            // already have a mapping for key. Update it.
            // Data has been updated
            if t.needkeyupdate() {
                typedmemmove(t.key, k, key)
            }
            elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            goto done
        }
        ovf := b.overflow(t) // Walk through the next Overflow array
        if ovf == nil {
            break
        }
        b = ovf
    }

    if inserti == nil {
        // No free space is found
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        elem = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    typedmemmove(t.key, insertk, key)
    *inserti = top
    h.count++
done:
    h.flags &^= hashWriting
    return elem
}
Copy the code

1.2.2 Read Operation

When the value:=map[key] operation is performed, go converts it to runtime. mapAccess1. The hash bucket search and traversal operations are the same as the map write operations.

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ifmsanenabled && h ! =nil {
        msanread(key, t.key.size)
    }
    ifh.flags&hashWriting ! =0 {
        // Cannot read and write at the same time
        throw("concurrent map read and map write")
    }
    hash := t.hasher(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
bucketloop:
    / / traverse the overflow
    for; b ! =nil; b = b.overflow(t) {
        // Iterate over a single Bmap structure for up to 8 data
        for i := uintptr(0); i < bucketCnt; i++ {
            // Determine the top hash. The top hash indicates the data with the highest 8 bits
            ifb.tophash[i] ! = top {if b.tophash[i] == emptyRest {
                        break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            // k is equal
            if t.key.equal(key, k) {
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {
                    e = *((*unsafe.Pointer)(e))
                }
                return e
            }
        }
    }
    // Return its 0 value not found
    return unsafe.Pointer(&zeroVal[0])}Copy the code

Note that the mapAccess2 implementation is almost the same as mapAccess1, except that more returns true than false exist.

The structure of map is bucket array + map.key/value array/linked list. If it is a delete operation, the physical deletion will not be actually carried out, but the deletion flag will be set, and it will be reused in the next update operation.

As the map data increases, performance slows down, rehash is triggered when the load factor exceeds 6.5 or too many overflow overflow buckets are used, but since the Go language hash expansion is not an atomic process, Therefore, runtime.mapassign also needs to check whether the current hash has been expanded to avoid confusion caused by secondary expansion.

If the capacity expansion is triggered by too many overflow buckets, the map will expand the capacity by the same amount. When data is continuously written and deleted, if it does not reach the threshold, memory overflow will be slow (logical deletion, and continued writing does not reuse the deleted space). Therefore, sameSizeGrow is introduced to solve this problem by reusing the existing hash expansion mechanism. Once too many overflowing buckets appear in the hash, it creates new buckets to hold data, and garbage collection cleans up the old overflowing buckets and frees up memory.

2 java map

Java Map is HashMap. Jdk1.8 optimizes the underlying implementation of HashMap, such as the introduction of red-black tree data structure and capacity expansion optimization. The underlying implementation of HashMap is based on array + linked list. The array is equivalent to the bucket array of go Map, and the linked list is equivalent to the BMAP linked list of Go Map (a Bmap stores 8 key/value pairs).

A HashMap is also thread-unsafe and can be used to ensure thread-safety by using ConcurrentHashMap, which has bucket granularity for concurrent locks, whereas the Go thread-safe Sync.map has only one lock and is relatively low in performance.

When a HashMap writes data, it also maps the key hash to a bucket, then iterates over the data, creating nodes to add to the list if none exists, or updating the data if it already exists. The process for reading data is similar, except that there is no data that is returned as null in the Map. Note that the base type in the Java Map is its wrapper type key/value.

Unlike Go Map, which is incremental, Java rehash is done in one go.

3 Compare go Map with Java Map

There are many comparisons between the go Map and Java Map implementations, as shown below:

  • Basic implementation: all are based on hash table separated link method implementation, but go Map is array + BMAP linked list (Bmap itself contains 8 key/value pairs)
  • Load factor: Go Map 6.5, Java Map 0.75, essentially the two are similar, because go Bmap can store 8 KV pairs (a single Bmap can be considered as a chain of nodes), 6.5/8=0.8, similar to 0.75.
  • Capacity expansion mechanism: Go Map is an incremental Rehash, and some data may be moved during each set operation, while Java Map is completed at one time. This implementation performance of Go is relatively better, but the performance may not be obvious because the rehash is not frequently developed in daily life.
  • Memory usage: Go Map applies for dimensions at least bMAP size at one time, while Java Map can apply for dimensions with a single key/value. Go Map data deletes are logical deletes that do not free up memory in a timely manner, but may be reused soon, whereas Java is available on demand.
  • Code readability: Java Map is better at this point. Go has a lot of pointer manipulation that looks like a nod.
  • Performance comparison: Because the underlying implementation principles of the two systems are similar, there is no difference in read and write performance. However, there may be some differences.

In daily business development, general performance problems do not occur in language or framework, but mostly in IO operations and business complexity processing. We should pay more attention to whether business modules are cohesive and boundaries are clearly divided, rather than performance differences of framework components (unless there are exponential differences and high QPS).

Here is an example of go Map and Java HashMap to see the write performance of both (note that the larger map is initialized in advance to avoid rehash effects on test results) :

// go
func main(a) {
   N := 1000000
   result := int64(0)

   for i := 0; i < 5; i++ {
      m1 := make(map[int]int, N)
      t1 := time.Now()
      for j := 0; j < N; j++ {
         m1[j] = j
      }
      cost := time.Now().Sub(t1).Milliseconds()
      fmt.Printf("%v: put map duration: %d ms\n", i+1, cost)

      result += cost
   }
   fmt.Printf("put map duration: %d ms\n", result/5)}// Output the result
1: put map duration: 91 ms
2: put map duration: 104 ms
3: put map duration: 94 ms
4: put map duration: 85 ms
5: put map duration: 87 ms
put map duration: 92 ms

Copy the code
// java
public static void main(String[] args) {
    int N = 1000000;
    long result = 0;

    for (int i = 0; i < 5; i++) {
        HashMap<Integer, Integer> map = new HashMap<>(N);
        long start = System.currentTimeMillis();
        for (int j = 0; j < N; j++) {
            map.put(j, j);
        }
        long cost = System.currentTimeMillis() - start;
        System.out.printf("%d: put map duration: %d ms\n", i, cost);

        result += cost;
    }

    System.out.printf("put map duration: %d ms\n", result / 5);
}

// Output the result
0: put map duration: 83 ms
1: put map duration: 82 ms
2: put map duration: 19 ms
3: put map duration: 39 ms
4: put map duration: 18 ms
put map duration: 48 ms
Copy the code

Note that due to environmental factors, the data will be slightly different each time, which is in line with expectations.

According to the test results, Java map performance is better than Go Map. Why? In Java, the loading operation should be slower. Does it have something to do with the excessive pointer operation in the test environment or GO? I don’t know

References:

  1. Draveness. Me/golang/docs…