preface

What most impresses an interviewer is often a candidate's command of details: the depth of that command decides whether the impression left is "solid" or "shaky", and a candidate who can actively extrapolate from and summarize the technical details they understand has a major highlight going for them in the interview. HashMap is a seemingly simple data structure with a great many technical details inside it, and in a demanding interview there are plenty of details worth digging into even without asking a single question about red-black trees (which were introduced in Java 8 to handle hash collisions in extreme cases).

Reading the book thin

In Java 7 the implementation of HashMap was over 1,000 lines, and in Java 8 it grew to over 2,000. Although that line count is not huge, the code is dense with bit operations and other details that make it look complicated and hard to follow. But if we step back, the data structure underneath HashMap is very basic, and we can start from a picture like this in our heads:

This diagram captures the most basic points of a HashMap:

  1. The basic data structure of Java's HashMap implementation is an array; each key->value pair is wrapped in an Entry object, and the Entry objects are stored in that array
  2. An element's position in the array is determined by key.hashCode(); if two keys map to the same slot — a hash collision — their Entry objects are chained together in a linked list at that slot
  3. Calling HashMap.get() first computes the key's hash, locates the slot in the array, then walks the linked list at that position comparing keys to find the corresponding value.

Of course there are two things that are missing from this picture:

  1. To keep reads efficient, when the number of entries in the HashMap exceeds the bucket array size times the load factor, the HashMap expands (resizes) to reduce hash collisions — we will see this later in the code
  2. In Java 8, if the linked list at one position of the bucket array grows beyond a fixed threshold, the list is (under certain conditions) converted into a red-black tree.

On the whole, there are four points that matter most in HashMap: initialization, addressing (the hash method), storage (the put method), and expansion (the resize method). Once you understand the principle behind these four and when each is invoked, you understand the design of the whole HashMap.

Reading the book thick

With the overall architecture of HashMap in mind, we can try to answer the questions below; if any of them leaves us unsure, we need to dig deeper into the code.

  1. Why is the length of the bucket array inside HashMap always a power of 2?
  2. How big is HashMap's default bucket array?
  3. When does HashMap allocate memory for the bucket array?
  4. When does HashMap expand its capacity?
  5. When does the linked list of elements in a bucket become a red-black tree, when does it turn back into a list, and why?
  6. Why did Java 8 introduce the red-black tree, and what problem scenario does it solve?
  7. How does HashMap handle key-value pairs whose key is null?

new HashMap()

In JDK 8, calling new HashMap() allocates no heap memory for the array; it only validates the parameters and initializes a few fields:

```java
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```

tableSizeFor returns the smallest power of 2 greater than or equal to cap. Suppose n = cap - 1 has the binary form 000001xxxxxx, where each x is 0 or 1.

After `n |= n >>> 1;` the top two bits of n are 1 (1 OR anything is 1). After `n |= n >>> 2;` the top four bits are 1, and each further step doubles the run of leading 1s. By the time `n |= n >>> 16;` has executed, every bit below the highest set bit of n is 1, i.e. n = 2^x - 1 for some x determined by the input. If MAXIMUM_CAPACITY is not exceeded, n + 1 — a power of 2 — is returned. So tableSizeFor guarantees that the smallest power of 2 not less than its argument is returned.
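To sanity-check the bit-smearing argument, here is a small standalone sketch (the class name is ours; MAXIMUM_CAPACITY is copied from the JDK, 1 << 30) that reimplements tableSizeFor and prints a few results:

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // same value as the JDK

    // Same bit-smearing trick as java.util.HashMap.tableSizeFor
    static int tableSizeFor(int cap) {
        int n = cap - 1;  // so an exact power of 2 is not doubled
        n |= n >>> 1;     // top 2 bits below the highest 1 become 1
        n |= n >>> 2;     // top 4 bits
        n |= n >>> 4;     // top 8 bits
        n |= n >>> 8;     // top 16 bits
        n |= n >>> 16;    // all lower bits are now 1: n = 2^x - 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1));   // 1
        System.out.println(tableSizeFor(15));  // 16
        System.out.println(tableSizeFor(16));  // 16
        System.out.println(tableSizeFor(17));  // 32
        System.out.println(tableSizeFor(100)); // 128
    }
}
```

Note how 16 stays 16 (thanks to the initial `cap - 1`) while 17 is rounded up to 32.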

The initialization code in JDK 7 is similar; it lives in the inflateTable method:

```java
// Called on the first put to allocate the table
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
```

Here we also answer the question posed at the beginning:

When does a HashMap allocate memory for the bucket array? The answer: on the first put — that is how it is implemented in both Java 8 and Java 7. We can also see that in both versions the bucket array size is a power of 2; the reason will become clear later.

hash

Within HashMap's data structure, the hash function is responsible for addressing, and its quality has a huge impact on the performance of the whole HashMap. What makes a good hash function here?

  • The computed hash values are well distributed, effectively reducing hash collisions
  • It can be computed very quickly, because HashMap calls the hash method on every get and put

Here is the implementation in Java 8:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

Why does it XOR the hash code with its own high 16 bits, `(h = key.hashCode()) ^ (h >>> 16)`?

We know the hash value is used to determine the key's position in the bucket array. In the JDK, for performance, the index is computed like this:

```java
index = (table.length - 1) & hash(key);
```

table.length is a power of 2, something like 0b0100000; subtract one and you get 0b0011111, which makes the bitwise AND a very fast way to compute the index. So here is the first answer to why the bucket array length inside HashMap is always a power of 2: addressing can be done with a cheap bit operation instead of a modulo.

Returning to the topic of this section: since the computed hash is always ANDed with table.length - 1, only its low bits take part in addressing, which raises the probability of collisions. Therefore the high 16 bits are XORed into the low 16 bits, so that the low bits retain some of the high-bit information and hash collisions are reduced.
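As a sketch of why the XOR matters, consider two hash codes that differ only in their high 16 bits (hypothetical values chosen for illustration). Without spreading they land in the same bucket of a 16-slot table; with the Java 8 style spread they do not:

```java
public class HashSpreadDemo {
    // Same spreading as java.util.HashMap.hash in Java 8
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // bucket array length, a power of 2
        // Two hash codes differing only in the high 16 bits
        int h1 = 0x0001_0005;
        int h2 = 0x0002_0005;

        // Without spreading, only the low bits survive the mask:
        System.out.println((n - 1) & h1); // 5
        System.out.println((n - 1) & h2); // 5  -> collision

        // With spreading, high-bit information reaches the low bits:
        System.out.println((n - 1) & spread(h1)); // 4
        System.out.println((n - 1) & spread(h2)); // 7  -> no collision
    }
}
```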

Let's look at the implementation of hash in Java 7:

```java
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```

In Java 7, to avoid losing the high bits of the hash, a more elaborate sequence of XORs is performed, but the starting point is the same: let the low bits of the hash retain part of the high-bit information, so as to reduce hash collisions.

put

The idea behind the put method in Java 8 is as follows:

  1. Call the key's hashCode method, compute the hash, and derive the array index from it
  2. If the bucket array is found to be null, call resize() to initialize it
  3. If no hash collision occurs, place the node directly into the corresponding bucket
  4. If a hash collision occurs and the node already exists, replace the corresponding value
  5. If a hash collision occurs and the bucket stores a tree structure, mount the node onto the tree
  6. If after the collision it is a linked list, append to the end of the list; if the list length exceeds TREEIFY_THRESHOLD (default 8), convert the list into a tree structure
  7. After the data is put, if the size of the HashMap exceeds the threshold, resize

The specific code and comments are as follows:

```java
public V put(K key, V value) {
    // Call the hash method on the key
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        // First put: resize() initializes the bucket array
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        // No collision: place the node directly into the bucket
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Collision, and the node already exists: replace below
            e = p;
        else if (p instanceof TreeNode)
            // The bucket stores a red-black tree: insert into the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // The bucket stores a linked list: append at the tail
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // List too long: convert it to a red-black tree
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // The node already exists: break out of the loop
                    break;
                // Otherwise move the pointer forward and keep looping
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        // Size exceeded the threshold: expand
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
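These steps have observable consequences at the API level; here is a quick sanity check against the standard java.util.HashMap:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();

        // First put of a key: no existing mapping, so put returns null
        System.out.println(map.put("a", 1)); // null

        // Second put of the same key: the value is replaced and the
        // old value is returned (step 4 above)
        System.out.println(map.put("a", 2)); // 1

        System.out.println(map.get("a"));    // 2
        System.out.println(map.size());      // 1
    }
}
```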

The put method in Java 7 is much simpler by comparison:

```java
public V put(K key, V value) {
    // A null key is handled by putForNullKey
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K, V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key
                || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K, V> e = table[bucketIndex];
    // ① The new entry is inserted at the head of the list
    table[bucketIndex] = new Entry<K, V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length); // ②
}
```

There is a small detail here: HashMap allows putting key-value pairs whose key is null, and such pairs are always placed in bucket 0 of the bucket array.
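This is easy to verify against the public API (the bucket index itself is not observable from the outside, but the null key round-trips):

```java
import java.util.HashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        // A null key is legal; Java 8 maps it to hash 0,
        // so it lands in bucket 0 of the table
        map.put(null, "value-for-null");
        System.out.println(map.get(null));         // value-for-null
        System.out.println(map.containsKey(null)); // true
    }
}
```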

resize()

resize is the most complex part of the whole HashMap. When a put pushes the number of entries past the threshold, the map must expand, which means the bucket array length changes. As analyzed earlier, HashMap addresses entries via index = (table.length - 1) & hash(key), so when table.length changes, the index of most entries changes too, and every entry has to be moved to its new position.

This is where the second advantage of a power-of-2 bucket array length comes in: when the capacity doubles, only about half of the elements in each bucket need to move to a new bucket; the other half stay where they are, and the two outcomes can be seen as equally likely.

Concretely, whether an entry moves depends on a single bit of its hash: if the bit corresponding to the old capacity is 0, the entry's bucket index is unchanged after expansion; if it is 1, the new index is simply the old index plus the old capacity, so the new address can be computed very quickly.
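A minimal sketch of this split rule, with hypothetical hash values: for oldCap = 16, testing the single bit (hash & oldCap) tells us whether an entry stays at index j or moves to j + oldCap after doubling:

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // Two hypothetical hashes sharing bucket 5 in the old table
        int h1 = 0b0_0101; //  5: the oldCap bit is 0
        int h2 = 0b1_0101; // 21: the oldCap bit is 1

        // Old table: both land in bucket 5
        System.out.println((oldCap - 1) & h1); // 5
        System.out.println((oldCap - 1) & h2); // 5

        // New table: h1 stays at 5, h2 moves to 5 + oldCap = 21
        System.out.println((newCap - 1) & h1); // 5
        System.out.println((newCap - 1) & h2); // 21

        // The decision bit the JDK actually tests during resize:
        System.out.println((h1 & oldCap) == 0); // true  -> stays
        System.out.println((h2 & oldCap) == 0); // false -> moves
    }
}
```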

Here is the implementation in Java 8:

```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Below the maximum: double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // A single node: rehash it directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Split the list into a "low" half that stays put
                    // and a "high" half that moves to j + oldCap
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        // The low half keeps the old bucket index
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        // New bucket index = old index + oldCap
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

The resize method in Java 7 is much simpler:

  1. After basic validation, allocate (new) a bucket array of the size given by the input parameter
  2. Transfer the elements of the old buckets into their new positions, re-indexed by the new bucket array length
```java
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    boolean oldAltHashing = useAltHashing;
    useAltHashing |= sun.misc.VM.isBooted() &&
            (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean rehash = oldAltHashing ^ useAltHashing;
    transfer(newTable, rehash);
    table = newTable;
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K, V> e : table) {
        // Walk each bucket's list, inserting every node at the head
        // of its new bucket (which reverses the list order)
        while (null != e) {
            Entry<K, V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
```

conclusion

After looking at the implementation of HashMap in Java 8 and Java 7, let’s answer some of the questions raised above:

  1. Why is the bucket array length inside the HashMap always an integer power of 2

    A: This has two advantages. First, it allows fast addressing with a bit operation such as (table.length - 1) & hash(key). Second, when the HashMap expands, it guarantees the elements of one bucket are spread evenly into the new buckets: after expansion roughly half the elements of a bucket stay at the original index, while the other half move to a new bucket.

  2. The default size of the bucket array for the HashMap

    A: The default is 16, and even if a size that is not a power of 2 is specified, the HashMap will round it up to the nearest power of 2 when initializing the bucket array.

  3. When does a HashMap open a bucket array to occupy memory

    Answer: On the first put — via the resize method in Java 8 (inflateTable in Java 7)

  4. When will HashMap be expanded?

    A: When the number of elements in the HashMap exceeds the threshold. The threshold is calculated as capacity * loadFactor, and the default loadFactor in HashMap is 0.75

  5. When does the list of elements in a bucket become a red-black tree, when does it go back to the list, and why?

    A: When the length of the linked list in one bucket reaches the treeify threshold (8) — and the bucket array is already large enough (at least MIN_TREEIFY_CAPACITY, 64; otherwise the map simply resizes instead) — the list is converted into a red-black tree. Conversely, when the number of elements in the bucket drops to 6 or fewer, the red-black tree is converted back into a linked list. The two thresholds differ in order to avoid frequent conversions back and forth between red-black tree and linked list, which would cost performance

  6. Why were red-black trees introduced in Java 8, and what scenarios were they designed to solve?

    A: The red-black tree was introduced to avoid scenarios in which hashing degenerates and the read/write performance of the HashMap deteriorates dramatically. Under normal circumstances the red-black tree is rarely used. But in extreme cases — say, a client supplies a hashCode implementation as bad as the one below — the tree guarantees that read/write complexity is no worse than O(log N):

```java
public int hashCode() {
    return 1;
}
```
  7. How does HashMap handle key-value pairs with null keys?

    A: It is placed in the bucket at index 0 of the bucket array
