The last interview asked a series of questions about HashMap, such as whether the underlying implementation was thread-safe. There was also talk of expansion. Unable to answer, I hereby study record.

The underlying implementation

In JDK 1.8 it is arrays + linked lists + red-black trees, while in JDK 1.7 there is no red-black tree

  • Arrays hold references to data

    The source code is to implement the array. Changed from Entry to Node in version 1.8.

    Version 1.7 Version 1.8
    transient Entry[] table transient Node<K, V>[] table

    The index of the array is the hashcode of the Key.

  • Linked lists and red-black trees are designed to resolve hashCode conflicts

    • When the list length is greater than 8 and the array size is greater than 64, the list structure is converted to a red-black tree structure.

        static final int TREEIFY_THRESHOLD = 8;
        static final int MIN_TREEIFY_CAPACITY = 64;
      Copy the code

      When the array size is less than 64, only expansion occurs and it does not turn into a red-black tree

    • When the length is less than or equal to 6, the red-black tree structure is converted to a linked list structure.

        static final int UNTREEIFY_THRESHOLD = 6;
      Copy the code
    • With linked lists, why are there red black trees?

      Is the optimization of the query, linked list search time complexity is O(n). And red – black trees have the characteristics of rapid addition, deletion, check and change. This can solve the problem of slow operation when the linked list is too long.

In the transition between the list and the red-black tree, we talked about array size, list length, and expansion. Is a HashMap expansion similar to an array expansion?

Expansion mechanism

If the capacity needs to be expanded, the original capacity is no longer sufficient. So how much capacity does HashMap initialize? First, a HashMap can specify the size of the initialization capacity, and then the next part is whether the initialization is customized or not.

Initial capacity of HashMap

  • When the capacity is not specified, the initial capacity is 16, the source code is as follows.

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

  • JDK 1.8 and JDK 1.7 also set the timing differently when specifying the capacity.

    In JDK 1.7, the capacity is not set until the first PUT operation. In JDK 1.8, when the constructor is called, the capacity is set.

So why is it possible to customize an initial capacity when there is a default one? In addition, the Alibaba Java development manual also recommends that the collection initialization size be determined when the collection is initialized.

The main reason is that when the initial capacity is far less than the capacity required by the data, the HashMap needs to be continuously expanded. The expansion mechanism of the HashMap determines that the hash table needs to be rebuilt each time the capacity is expanded, which seriously affects the performance.

So if you specify the initialization size, what is appropriate?

When the initial capacity is specified, there is an expansion of the input value, which is raised to the nearest power of 2.

Specific reference blog.csdn.net/moakun/arti… (I will delete it if I can’t quote it)

There is a reference to the loader factor, which defaults to 0.75.

Static final float DEFAULT_LOAD_FACTOR = 0.75f;Copy the code

This load factor is also the determinant of the threshold in the HashMap expansion mechanism. The blogs I see now are based on a balance between space utilization and time efficiency with 0.75. There is also an explanation from Linn_cn’s article published on March 18, 2020.03.18, with a link attached at the bottom.

When the capacity expansion factor is set relatively large, the threshold of capacity expansion becomes high and the frequency of capacity expansion becomes low. However, at this time, the probability of Hash conflicts will increase. When there are too many conflicting elements, it will increase the search cost to change into linked lists or red-black trees. If the expansion factor is too small, capacity expansion is frequently triggered, occupying large space, for example, recalculating Hash, resulting in high operation performance.Copy the code

The threshold value is load factor x maximum capacity. When the number of elements exceeds the threshold, expansion is triggered. Expansion has two steps

1. Resize (): creates a new empty array twice as long. 2. Rehash (): iterates over the original array to rehash all nodes into the new array.Copy the code

Advantages and disadvantages of HashMap

  • Advantages:

    1. Query: time complexity of O(1)
    2. Dynamic storage
  • Disadvantages:

    1. During capacity expansion, the hash table needs to be rebuilt, which affects performance.
    2. Not thread-safe.

The HashMap thread is not safe

There is no locking mechanism in the implementation of HashMap, so it is not thread-safe per se.

  • The solution

    1. There is another data structure: ConcurrentHashMap. As far as I can see, there are few application scenarios mentioned in most articles on the Internet.
    2. Replace it with HashTable. However, HashTable is mostly deprecated and has low performance.
    3. Use the synchronizedMap method of the Collections class.

Put method

  1. Check if there is any space left in the array

    • A: 2.
    • Expansion and no:
  2. Evaluates the Hashcode based on the Key to determine the array position

  3. Check for content in the array location

    • Compare key and hashcode values (equals, ==)

      • Same: replace the original content

      • Not the same as:

        • Whether the array position corresponds to a red-black tree or a linked list

          • Red black tree: 4.
          • Linked list: Loop to next for null nodes or find nodes with the same hash and key

          After putting it in, check if the list length has reached the conversion value with the red-black tree structure, 8, and compare the array length with 64.

    • No: 4.

  4. Put in key-value pairs

The Get method

  1. Check whether the array is null or not

    • Is that the return null
    • No: 2.
  2. Determine if the header is the target value and compare the hashcode and key values

    • If yes, return header node
    • No: 3.
  3. Check whether the structure is a red-black tree

    • Is: getTreeNode
    • If no, it is a linked list and can be traversed

Reference: juejin. Cn/post / 684490…