Warning from HashMap: This is article 56 of the Java Programmer progression column. The other day, xiao Er came to me with his head down and complained to me, “Lao Wang, a student Xiao Mo asked me about ‘the expansion mechanism of HashMap’. I mumbled for a long time and didn’t explain it to him clearly. At the end of the talk, MY heart was broken and I almost cried!”

I comforted the boy for a long time before his excitement calmed down. I told him that the HashMap expansion mechanism was inherently difficult to understand, especially with the addition of red-black trees in JDK8. We’re going to start with JDK7, and then we’re going to add the red-black tree part to make a lot of sense.

Small 2 this just suddenly understand, admire the place nodded.

If you have a GitHub account, please remember to arrange a wave of stars. The open source tutorial “Java Programmer’s Path to Progress” has 244 stars on GitHub, ready to hit 1000, please.

GitHub address: github.com/itwanger/to…

Read online address: itwanger. Gitee. IO/tobebetterj…


You know that arrays cannot change their size once they are initialized, so arrayLists are dynamic arrays that can automatically expand.

The underlying HashMap also uses arrays. You keep adding elements to a HashMap, and when the array can’t hold any more elements, you need to expand the array to fit more elements.

Of course, arrays cannot be automatically expanded, so to do so, you need to create a larger array and copy the elements of the smaller array.

JDK 8 contains a red-black tree, which is used to create a HashMap. JDK 7 contains a red-black tree, which is used to create HashMap.

Resize method source code:

// newCapacity indicates the newCapacity
void resize(int newCapacity) {
    // Small array, temporary over
    Entry[] oldTable = table;
    // Capacity before capacity expansion
    int oldCapacity = oldTable.length;
    // MAXIMUM_CAPACITY is the maximum capacity, 2 ^ 30 = 1<<30
    if (oldCapacity == MAXIMUM_CAPACITY) {
        // Adjust the capacity to Integer's maximum value 0x7FFFFFFf (hexadecimal) =2 to the 31th power -1
        threshold = Integer.MAX_VALUE;
        return;
    }

    // Initialize a new array.
    Entry[] newTable = new Entry[newCapacity];
    // Move elements from a small array to a large array
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    // Reference the new large array
    table = newTable;
    // Recalculate the threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
Copy the code

The left shift (<<) appears in the code comment, here is a brief introduction:

a=39
b = a << 2
Copy the code

Decimal 39, expressed in 8-bit binary, is 00100111, moved two bits to the left is 10011100 (supplemented by zeros), and converted to decimal is 156.

Shift operations can often be used instead of multiplication and division. For example, moving 0010011 (39) two Spaces to the left gives 10011100 (156), making it exactly four times as large.

In fact, the binary number doubles, quadruples, and quadruples as you move it to the left.

The transfer method is used to transfer the elements of a small array into a new array.

void transfer(Entry[] newTable, boolean rehash) {
    // New capacity
    int newCapacity = newTable.length;
    // Iterate over the small array
    for (Entry<K,V> e : table) {
        while(null! = e) {// Different values on the same key
            Entry<K,V> next = e.next;
            // Whether to recalculate the hash
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Calculate the index of the element in the array based on the size of the array and the hash of the key
            int i = indexFor(e.hash, newCapacity);

            // New elements in the same position are placed at the head of the list
            e.next = newTable[i];

            // Put it on the new array
            newTable[i] = e;

            // The next element on the liste = next; }}}Copy the code

E.next = newTable[I], that is, the use of single linked list header insertion method, the same position of the new element will always be placed at the head of the list; Unlike JDK 8, elements placed first on an index will end up at the end of the list (if a hash conflict occurs).

Elements on the same list in the old array may be recalculated into different positions in the new array (look closely below to explain this).

Suppose the hash algorithm (as discussed in the previous section, click on the link to revisit it) is simply a hash of the key (an int value) and modulo of the array size (i.e. HashCode % table.length).

Continuing with the hypothesis:

  • Array table has a length of 2
  • The key hashes are 3, 7, 5

After modulo operation, all hashing conflicts are on table[1] because the remainder is 1. Then the appearance before expansion is shown in the figure below.

The size of the small array is 2, and keys 3, 7, and 5 are all on the linked list of table[1].

Assume that the loadFactor is 1, that is, the table is expanded when the actual size of an element is greater than the actual size of the table.

The enlarged large array has a capacity of 4.

  • Key 3 is moulded (3%4) and then 3 is placed on table[3].
  • After key 7 is moulded (7%4), 3 is placed at the head of the linked list on table[3].
  • Key 5 is 1 after taking mold (5%4) and put on table[1].

As we expected, the expanded 7 would still be at the end of the 3 list, but in reality? Seven goes to the top of three. What optimizations have been made in JDK 8 to address this situation in JDK 7?

Look at the graph below.

N is the length of the table. The default value is 16.

  • N -1 = binary 0000 1111 (1X 202^020+1X 212^121+1X 222^222+1X 232^323=1+2+4+8=15);
  • The last eight bits of the key1 hash are 0000 0101
  • The last 8 bits of the key2 hash are 0001 0101 (different from key1)
  • A hash conflict occurred after the and operation, and the indexes are all on (0000 0101).

After capacity expansion, the value is 32.

  • N -1 is the binary 0001 1111 (1X 202^020+1X 212^121+1X 222^222+1X 232^323+1X 242^424=1+2+4+8+16=31). Before expansion, it is 0000 1111.
  • The low value of the key1 hash is 0000 0101
  • The low level of the key2 hash is 0001 0101 (different from key1)
  • Key1 is indexed to 0000 0101 after the and operation.
  • The key2 index is 0001 0101 after the and operation.

The new index changes like this:

  • The original index was 5 (00101).
  • The original capacity was 16
  • The capacity after expansion is 32
  • The expanded index is 21 (10101), which is 5+16, which is the original index + the original capacity

In other words, JDK 8 does not need to recompute the hash as JDK 7 does. It only needs to see if the new bit is 1 or 0, which means the index is the same, or 1, which means the index is “old + old”.

In JDK 8, this design is very clever, which not only saves the time of recalcuating the hash, but also, because the new bit is random, the expansion process can evenly distribute the previous nodes to the new location.

Woc, let’s just say HashMap authors Doug Lea, Josh Bloch, Arthur Van Hoff, Neal Gafter really did a great job.

JDK 8 expansion source code:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null)?0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If it exceeds the maximum value, it will no longer be expanded
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // If the value is not higher than the maximum value, the value is doubled
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Calculate the new resize upper limit
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if(oldTab ! =null) {
        // Copy the small array to the large array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if((e = oldTab[j]) ! =null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // List optimizes code blocks for rehash
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                    // The original index
                    if(loTail ! =null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Index + original capacity
                    if(hiTail ! =null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Copy the code

Reference links:

zhuanlan.zhihu.com/p/21673805


Java programmer advanced path, humorous, easy to understand, extremely friendly and comfortable for Java beginners 😘, including but not limited to Java syntax, Java collection framework, Java IO, Java concurrent programming, Java virtual machine and other core knowledge points.

Github.com/itwanger/to…