preface

What most impresses an interviewer in a job interview is actually the candidate's grasp of details; the depth of that understanding determines whether the impression left is "solid foundation" or "weak foundation". If a candidate can actively extrapolate from and summarize the technical details they understand, it is undoubtedly a major highlight of the interview. HashMap is a seemingly simple data structure that hides a great many technical details, and in a senior interview there is plenty worth digging into even without asking a single question about red-black trees (which were introduced in Java 8 to handle hash collisions in extreme cases).

In Java 7, the implementation of HashMap was over 1,000 lines; in Java 8 it grew to over 2,000. Although that line count is not huge, the code is full of bit operations and other details that make it look complicated and hard to understand. But if we step back, the data structure of HashMap is very basic: an array of buckets in which colliding entries are chained together.

The most basic points in a HashMap:

1. The basic data structure underlying HashMap in Java is an array. Each key -> value pair is wrapped in an Entry object (Node in Java 8) and stored in this array; entries that collide are chained together as a singly linked list

2. The position of an element in the array is determined by key.hashCode(). If two keys map to the same array index, a hash collision has occurred, and the Entry objects for the two keys are stored at that index as a linked list

3. When we call hashMap.get(), we calculate the hash of the key, find the corresponding position in the array, and then traverse the linked list there for the corresponding value.
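To make that picture concrete, here is a minimal, simplified sketch of the structure described above. It is not the JDK source, just an illustration of the array-plus-chained-nodes layout:

class SimpleHashMap<K, V> {
    // One entry in a bucket; colliding entries are chained via `next`
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next; // singly linked list of colliding entries

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    Node<K, V>[] table; // the bucket array; its length is always a power of 2
}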

Two things are missing:

1. In order to keep reads efficient, when the number of elements stored in the HashMap exceeds the size of the bucket array multiplied by the load factor, the whole HashMap is expanded to reduce hash collisions. The details are described in the code later

2. In Java 8, if the length of the linked list at one position of the bucket array exceeds a fixed threshold, the list is, under certain conditions, converted into a red-black tree.

On the whole, there are four key points in the whole HashMap: initialization, data addressing (the hash method), data storage (the put method), and expansion (the resize method). As long as you understand the principles behind these four points and when they are invoked, you will understand the design of the whole HashMap.

With an understanding of the overall architecture of HashMap, we can try to answer some of the following questions, and if we are confused about any of them, we need to dig deeper into the code and read more.

1. Why is the bucket array length inside the HashMap always an integer power of 2?

2. How big is the default bucket array of a HashMap?

3. When does a HashMap allocate memory for its bucket array?

4. When will HashMap be expanded?

5. When does the bucket’s element list turn into a red-black tree, when does it turn back to the list, and why?

6. Why are red-black trees introduced in Java 8? What scenarios are they designed to solve?

7. How does HashMap handle key-value pairs with null keys?

new HashMap()

In JDK 8, when new HashMap() is called, no heap memory is allocated for the bucket array; only some parameter validation is done and a few fields are initialized:

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                            initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                            loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

tableSizeFor is used to find the smallest integer power of 2 greater than or equal to cap. Assume n is 000001xxxxxx in binary, where each x is either 0 or 1.

After n |= n >>> 1 executes, the two highest significant bits of n are both 1 (1 OR anything is 1), so n becomes 0000011xxxxx.

After the second line, n |= n >>> 2, the four highest significant bits of n are 1, and so on, until n |= n >>> 16 has executed; at that point every bit below the highest 1 is also 1, that is, n = 2^x - 1 for some x that depends on the input. If MAXIMUM_CAPACITY is not exceeded, n + 1, a positive integer power of 2, is returned. The initial n = cap - 1 handles the case where cap is already a power of 2. So tableSizeFor guarantees that the smallest positive integer power of 2 greater than or equal to its argument is returned.
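A concrete trace makes this easier to follow; here is how the shifts play out for cap = 17 (an arbitrary illustrative value):

// Trace of tableSizeFor(17):
int n = 17 - 1;  // n = 0b10000 (16)
n |= n >>> 1;    // n = 0b11000 (24)
n |= n >>> 2;    // n = 0b11110 (30)
n |= n >>> 4;    // n = 0b11111 (31)
n |= n >>> 8;    // n = 0b11111 (no change: the low bits are already all 1)
n |= n >>> 16;   // n = 0b11111
// return n + 1 -> 32, the smallest power of 2 >= 17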

The initialization code in JDK 7 is similar; it lives in the inflateTable method:

// Called on the first put to allocate the table
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}

Here we also answer the question posed at the beginning:

When does a HashMap allocate memory for its bucket array? The answer is: on the first put; this is how it is implemented in both Java 8 and Java 7. Here we can also see that the size of the bucket array is a positive integer power of 2 in both implementations; you will see why later.
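A quick way to observe this lazy allocation is to peek at the private table field with reflection. This is a demo only; the field name matches the JDK 8 source, and on JDK 16+ running it requires --add-opens java.base/java.util=ALL-UNNAMED:

import java.lang.reflect.Field;
import java.util.HashMap;

public class LazyTableDemo {
    public static void main(String[] args) throws Exception {
        HashMap<String, String> map = new HashMap<>(17); // threshold becomes 32

        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);

        System.out.println(tableField.get(map)); // null: no bucket array yet
        map.put("k", "v");                       // first put triggers resize()
        System.out.println(((Object[]) tableField.get(map)).length); // 32
    }
}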

hash

In a HashMap, the hash function is responsible for addressing, so its quality has a huge impact on the performance of the whole map. What makes a good hash function?

The computed hash values are well distributed, which effectively reduces hash collisions

It can be computed quickly, because a HashMap calls the hash method on every get and put

Here is the implementation in Java 8:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Note the expression (h = key.hashCode()) ^ (h >>> 16): the hashCode is XORed with a copy of itself shifted right by 16 bits. Why is that?

We know that the hash function is used to determine the position of the key in the bucket array. In the JDK, for better performance, it is usually written like this:

index = (table.length - 1) & hash(key);

table.length is a positive integer power of 2, something like 000100000 in binary. Subtracting 1 from it gives 000011111, and ANDing with such a mask is a very efficient way to address. Why is the bucket array length inside the HashMap always an integer power of 2? One of the benefits is exactly this: fast addressing through bitwise operations.

Returning to the topic of this section: since every computed hash value is ANDed with table.length - 1, only its low bits take part in addressing, which increases the probability of collisions. Therefore the high 16 bits and the low 16 bits are XORed together, so the low 16 bits retain part of the high-bit information and hash collisions are reduced.
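To see why this matters, here is a small demonstration with two made-up hashCodes that differ only in their high 16 bits; without the spread they collide, with it they do not:

public class HashSpreadDemo {
    public static void main(String[] args) {
        int length = 16;      // bucket array length (a power of 2)
        int h1 = 0x0001_0001; // two hashCodes that differ
        int h2 = 0x0002_0001; // only in their high 16 bits

        // Without the spread, both land in the same bucket:
        System.out.println((length - 1) & h1); // 1
        System.out.println((length - 1) & h2); // 1

        // With the Java 8 spread, the high bits now influence the index:
        System.out.println((length - 1) & (h1 ^ (h1 >>> 16))); // 0
        System.out.println((length - 1) & (h2 ^ (h2 >>> 16))); // 3
    }
}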

Let’s look at the implementation of hash in Java 7:

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

In Java 7, to avoid losing the high-bit information of the hash, a more elaborate sequence of shifts and XORs is performed, but the basic starting point is the same: let the low bits of the hash value retain part of the high-bit information to reduce hash collisions.

put

The idea behind the put method in Java 8 is as follows:

1. Call the key's hashCode method to compute the hash value, then compute the array index

2. If the current bucket array is null, call the resize() method to initialize it

3. If no hash collision occurs, place the new node directly into the bucket

4. If a hash collision occurs and the node already exists, replace the corresponding value

5. If a hash collision occurs and the bucket holds a tree structure, insert the node into the tree

6. If the bucket holds a linked list after the collision, append to the end of the list; if the list length reaches TREEIFY_THRESHOLD, which is 8 by default, convert the list into a tree structure

7. After the data is put, if the total number of entries exceeds threshold, resize is required

The specific code and comments are as follows:

public V put(K key, V value) {
    // Call the hash method first
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        // First put: resize() initializes the bucket array
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        // No collision: place the new node directly into the bucket
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Hash collision and the node already exists: remember it for replacement
            e = p;
        else if (p instanceof TreeNode)
            // The bucket holds a tree structure
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Hash collision, the bucket holds a linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // The node already exists: break out of the loop
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key: replace the value
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        // The threshold is exceeded: expand with resize()
        resize();
    afterNodeInsertion(evict);
    return null;
}

The put method in Java 7 is much simpler by comparison:

public V put(K key, V value) {
    // A null key is handled separately by putForNullKey
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K, V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K, V> e = table[bucketIndex];
    // ① insert the new entry at the head of the list
    table[bucketIndex] = new Entry<K, V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length); // ② double the capacity when the threshold is reached
}

There is a small detail here: HashMap allows putting a key-value pair whose key is null, and such pairs are always placed in bucket 0 of the bucket array.
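A quick illustration; in Java 8, hash(null) returns 0, so (n - 1) & 0 is always index 0, and Java 7 routes the same case through putForNullKey:

HashMap<String, String> map = new HashMap<>();
map.put(null, "value-for-null-key"); // stored in bucket 0
System.out.println(map.get(null));   // prints "value-for-null-key"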

resize()

resize is the most complex part of the whole HashMap. When a put pushes the number of entries past the threshold, the map must be expanded, which means the size of the bucket array changes. As analyzed above, HashMap addressing is done with index = (table.length - 1) & hash(key), so when table.length changes, every element has to be moved to its position in the new table.

This is where the second advantage of keeping the bucket array length a positive integer power of 2 comes in: when the capacity doubles, only about half of the elements in each bucket need to move to a new bucket; the other half stay in the original bucket, and the two outcomes are equally likely.

Specifically, if the bit of the hash at the position of the old capacity is 0 (that is, e.hash & oldCap == 0), the element's index is unchanged after expansion; otherwise that bit becomes the new highest index bit, and the new index is simply the old index plus oldCap, which can be computed very quickly.
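A worked example with illustrative values makes the split concrete:

// oldCap = 16 (0b10000), newCap = 32 (0b100000)
int oldCap = 16;
int h1 = 0b0_0101; //  5: (h1 & oldCap) == 0  -> index unchanged
int h2 = 0b1_0101; // 21: (h2 & oldCap) == 16 -> index moves up by oldCap
// Before the resize: h1 & 15 == 5 and h2 & 15 == 5  (same bucket)
// After the resize:  h1 & 31 == 5 and h2 & 31 == 21 == 5 + 16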

Here is the implementation in Java 8:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Already at the maximum: stop resizing and just raise the threshold
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Below the maximum: double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Iterate over the old bucket array and rehash its entries
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // A single node: recompute its index directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // A tree: split it between the two new buckets
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // A linked list: split it into a "low" and a "high" list
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        // Low list: bucket position unchanged
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        // High list: new bucket position = old position + oldCap
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

The resize method in Java 7 is much simpler:

1. After basic verification, create a new bucket array of the specified size

2. The elements in the bucket are placed in new positions based on the length of the new bucket array

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    boolean oldAltHashing = useAltHashing;
    useAltHashing |= sun.misc.VM.isBooted() &&
            (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean rehash = oldAltHashing ^ useAltHashing;
    transfer(newTable, rehash);
    table = newTable;
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // Walk each list from its head; entries are re-inserted at the head of
    // the new bucket, so each list ends up reversed in newTable
    for (Entry<K, V> e : table) {
        while (null != e) {
            Entry<K, V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

conclusion

After looking at the implementation of HashMap in Java 8 and Java 7, let’s answer some of the questions raised above:

1. Why is the bucket array length inside the HashMap always an integer power of 2?

A: There are two benefits. First, addressing can be done quickly with bitwise operations such as (table.length - 1) & hash(key). Second, when the HashMap is expanded, the elements of each bucket are hashed evenly into the new buckets: roughly half of them keep their original index, and the other half move to original index + oldCap.

2. What is the default bucket array size for a HashMap?

A: The default is 16. Even if the specified initial size is not an integer power of 2, the HashMap rounds it up to the smallest integer power of 2 that is greater than or equal to it before initializing the bucket array.

3. When does a HashMap allocate memory for its bucket array?

Answer: On the first put, via the resize method in Java 8 and via inflateTable in Java 7

4. When will HashMap be expanded?

A: When the number of elements in the HashMap exceeds threshold, which is calculated as capacity * loadFactor; the default loadFactor of a HashMap is 0.75
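For example, with the default capacity of 16 and load factor 0.75, the threshold is 12:

HashMap<Integer, Integer> map = new HashMap<>(); // capacity 16, threshold 16 * 0.75 = 12
for (int i = 1; i <= 13; i++) {
    map.put(i, i); // the 13th put makes size > threshold, triggering resize() to capacity 32
}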

5. When does the bucket’s element list turn into a red-black tree, when does it turn back to the list, and why?

A: When the number of elements in one bucket reaches 8 (TREEIFY_THRESHOLD), and the bucket array length is at least 64 (MIN_TREEIFY_CAPACITY; otherwise the map is resized instead), the bucket's linked list is converted into a red-black tree. Conversely, when the number of elements in the bucket drops to 6 (UNTREEIFY_THRESHOLD) or fewer, the red-black tree is converted back into a linked list. The gap between the two thresholds avoids frequent conversion back and forth between red-black tree and linked list, which would cost performance
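The corresponding constants, as defined in the Java 8 java.util.HashMap source:

static final int TREEIFY_THRESHOLD = 8;     // list -> tree when a bin reaches this length
static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list when a bin shrinks to this length
static final int MIN_TREEIFY_CAPACITY = 64; // below this table size, resize instead of treeifying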

6. Why are red-black trees introduced in Java 8? What scenarios are they designed to solve?

A: Red-black trees were introduced to handle scenarios in which hashing degrades badly and the read/write performance of the HashMap would otherwise deteriorate dramatically; under normal circumstances the trees are rarely used. In extreme scenarios, for example when a client supplies a hashCode method with terrible distribution like the one below, the tree keeps the read/write complexity of the HashMap no worse than O(log N):

public int hashCode() {
    return 1; // every key lands in the same bucket
}

7. How does HashMap handle key-value pairs with null keys?

A: They are placed in the bucket at index 0 of the bucket array
