One, foreword

Compared with earlier JDK versions, JDK 1.8 greatly optimizes HashMap. One of the most important changes is that the elements in a bucket can be stored in a red-black tree instead of a linked list once the bucket grows large enough. In short, the goal is to make lookups within a bucket faster and improve overall performance while remaining safe and functional.

Two, HashMap data structure

In JDK 1.8, a HashMap is backed by an array of buckets; each bucket stores its elements as a linked list, and once a bucket grows large enough it is converted to a red-black tree.

Three, HashMap source code analysis

3.1 Inheritance relationship of classes

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

As you can see, HashMap extends AbstractMap and implements the Map, Cloneable, and Serializable interfaces. The Map interface defines a common set of map operations. The Cloneable interface indicates that the object can be cloned; HashMap implements a shallow copy, so the copy shares the same key and value objects as the original, and mutating those objects is visible through both maps. The Serializable interface means HashMap supports serialization, that is, a HashMap object can be saved to disk and its state restored later.
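For instance, here is a minimal sketch of that shallow-copy behavior (the class and variable names are just for illustration):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class CloneDemo {
    public static void main(String[] args) {
        HashMap<String, List<String>> original = new HashMap<>();
        List<String> colors = new ArrayList<>();
        colors.add("red");
        original.put("colors", colors);

        // clone() copies the map structure, not the key/value objects themselves
        @SuppressWarnings("unchecked")
        HashMap<String, List<String>> copy = (HashMap<String, List<String>>) original.clone();

        // Mutating the shared value object is visible through both maps
        copy.get("colors").add("blue");
        System.out.println(original.get("colors")); // prints [red, blue]
    }
}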

3.2 Class attributes

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // Serial version UID
    private static final long serialVersionUID = 362498820763181265L;
    // Default initial capacity: 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // Maximum capacity: 2^30
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // A bucket is converted from a linked list to a red-black tree when it holds at least this many nodes
    static final int TREEIFY_THRESHOLD = 8;
    // A tree bucket is converted back to a linked list when it shrinks to this many nodes
    static final int UNTREEIFY_THRESHOLD = 6;
    // Minimum table capacity before any bucket may be treeified
    static final int MIN_TREEIFY_CAPACITY = 64;
    // Array that stores the elements; its length is always a power of 2
    transient Node<K,V>[] table;
    // Cached entry set view
    transient Set<Map.Entry<K,V>> entrySet;
    // The number of stored elements; note that this does not equal the length of the array
    transient int size;
    // Number of structural modifications, used for fail-fast iteration
    transient int modCount;
    // Threshold: when the actual size exceeds the threshold (capacity x load factor), the table is resized
    int threshold;
    // Load factor
    final float loadFactor;
}
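To see how these constants interact, here is a standalone sketch (not part of the HashMap source) of how the default threshold works out:

public class DefaultsDemo {
    public static void main(String[] args) {
        // With the default capacity (1 << 4 = 16) and load factor (0.75f),
        // the resize threshold is 16 * 0.75 = 12, so the 13th mapping triggers the first resize.
        int defaultCapacity = 1 << 4;      // DEFAULT_INITIAL_CAPACITY
        float defaultLoadFactor = 0.75f;   // DEFAULT_LOAD_FACTOR
        System.out.println((int) (defaultCapacity * defaultLoadFactor)); // 12
    }
}

Note also that TREEIFY_THRESHOLD only takes effect once the table capacity has reached MIN_TREEIFY_CAPACITY (64); below that, a crowded bucket is handled by resizing the table rather than by converting the bucket to a tree.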

3.3 Constructors of the class

1. HashMap(int, float) constructor

public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity cannot be less than 0, otherwise an exception is thrown
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    // The initial capacity cannot be greater than the maximum capacity
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor cannot be less than or equal to 0 and cannot be NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    // Initialize the load factor
    this.loadFactor = loadFactor;
    // Initialize the threshold (rounded up to a power of two)
    this.threshold = tableSizeFor(initialCapacity);
}

Note: tableSizeFor(initialCapacity) returns the smallest power of two greater than or equal to initialCapacity.

static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

Description: The >>> operator is an unsigned right shift; the vacated high-order bits are filled with 0.
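As a quick sanity check, here is a standalone re-implementation of the same rounding logic (the MAXIMUM_CAPACITY clamp is omitted for brevity) together with a few sample results:

public class TableSizeForDemo {
    // Same rounding logic as tableSizeFor, re-implemented here for illustration
    static int roundUpToPowerOfTwo(int cap) {
        int n = cap - 1;   // subtract 1 so exact powers of two are not doubled
        n |= n >>> 1;      // smear the highest set bit downwards
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // now every bit below the highest set bit is 1
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(13)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
    }
}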

2. HashMap(int) constructor

public HashMap(int initialCapacity) {
    // Delegate to the HashMap(int, float) constructor with the default load factor
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

3. HashMap() constructor

public HashMap() {
    // Only the load factor is initialized; the table is allocated lazily on the first insertion
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}

4. HashMap(Map) constructor

public HashMap(Map<? extends K, ? extends V> m) {
    // Initialize the load factor
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    // Copy all of m's entries into this map
    putMapEntries(m, false);
}

Description: The putMapEntries(Map m, boolean evict) function stores all elements of m into this HashMap instance.

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        // Check whether the table has been initialized
        if (table == null) { // pre-size
            // Not initialized yet; s is the actual number of elements in m
            float ft = ((float) s / loadFactor) + 1.0F;
            int t = ((ft < (float) MAXIMUM_CAPACITY) ?
                     (int) ft : MAXIMUM_CAPACITY);
            // If t is greater than the current threshold, recompute the threshold
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // Already initialized and m holds more elements than the threshold: resize first
        else if (s > threshold)
            resize();
        // Add all elements of m to this HashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
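A small usage sketch of the copy constructor that calls putMapEntries (the map contents here are just an example):

import java.util.HashMap;
import java.util.Map;

public class CopyConstructorDemo {
    public static void main(String[] args) {
        Map<String, Integer> source = new HashMap<>();
        source.put("a", 1);
        source.put("b", 2);

        // HashMap(Map) pre-sizes the new table from source.size()
        // and copies every entry via putMapEntries
        HashMap<String, Integer> copy = new HashMap<>(source);
        System.out.println(copy.size()); // 2
    }
}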

3.4 Analysis of important functions

1. The putVal function

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // The table is not initialized or has a length of 0: resize (allocate) it first
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash determines which bucket the element belongs to;
    // if that bucket is empty, place the new node directly into the array
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The bucket already contains at least one element
    else {
        Node<K,V> e; K k;
        // The first node in the bucket (the array element) has an equal hash and an equal key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Record that node in e
            e = p;
        // Keys are not equal; the first node is a red-black tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise it is a linked-list node
        else {
            // Walk the list and insert the new node at the tail
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list
                if ((e = p.next) == null) {
                    // Insert a new node at the tail
                    p.next = newNode(hash, key, value, null);
                    // The number of nodes reached the threshold: convert the list to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // An existing node's key equals the inserted key: stop searching
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // A node whose key equals the inserted key was found in the bucket
        if (e != null) { // existing mapping for key
            // Record the old value
            V oldValue = e.value;
            // Replace the old value if onlyIfAbsent is false or the old value is null
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Post-access callback (used by LinkedHashMap)
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    // Structural modification
    ++modCount;
    // If the new size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    // Post-insertion callback (used by LinkedHashMap)
    afterNodeInsertion(evict);
    return null;
}

HashMap does not expose putVal as a public interface; instead, it provides the put method, which inserts elements by delegating to putVal.
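For reference, put(key, value) first calls hash(key), which in JDK 1.8 XORs the upper 16 bits of the key's hashCode into the lower 16 bits, and putVal then picks the bucket with (n - 1) & hash. A standalone sketch of that index computation (re-implemented here for illustration):

public class IndexDemo {
    // Mirrors HashMap.hash(Object): spread the high bits into the low bits
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length, always a power of 2
        String key = "example";
        // Because n is a power of 2, (n - 1) & hash behaves like hash % n but is cheaper
        int index = (n - 1) & spread(key);
        System.out.println(index); // some value in [0, 15]
    }
}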

2. The getNode function

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table has been initialized, its length is greater than 0,
    // and the bucket found via (n - 1) & hash is not empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The first node in the bucket (the array element) matches
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // The bucket holds more than one node
        if ((e = first.next) != null) {
            // The first node is a red-black tree node: search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise, walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

Likewise, HashMap does not expose getNode directly to the user; it provides the public get method, which retrieves elements through getNode.
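One practical consequence of this lookup path: get returns null both when the key is absent and when the key is explicitly mapped to null, so containsKey is needed to tell the two cases apart. A small usage sketch:

import java.util.HashMap;

public class GetDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", null);

        System.out.println(map.get("a"));         // null (key present, value is null)
        System.out.println(map.get("b"));         // null (key absent)
        System.out.println(map.containsKey("a")); // true
        System.out.println(map.containsKey("b")); // false
    }
}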

3. The resize function

final Node<K,V>[] resize() {
    // Save the current table
    Node<K,V>[] oldTab = table;
    // Save the current capacity
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Save the current threshold
    int oldThr = threshold;
    int newCap, newThr = 0;
    // The old table has already been allocated
    if (oldCap > 0) {
        // The old capacity has already reached the maximum capacity
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Set the threshold to the maximum integer and stop resizing
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity using a left shift, which is more efficient
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // The previous threshold is greater than 0 (the initial capacity was stored in threshold)
    else if (oldThr > 0)
        newCap = oldThr;
    // oldCap == 0 and oldThr == 0: use the defaults
    // (e.g. the HashMap() constructor was used and resize() runs on the first insertion)
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // The new threshold is still 0
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Allocate the new table
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // The old table was initialized: rehash its elements into the new table
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // The bucket holds a single node: recompute its index directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Split the nodes in this bucket into two lists and rehash them,
                    // depending on whether (e.hash & oldCap) is 0
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The low list stays at index j in the new table
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The high list moves to index j + oldCap in the new table
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Note: Capacity expansion rehashes and traverses every element in the hash table, which is time-consuming, so avoid triggering resize where possible when writing programs.
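One common way to follow this advice, assuming the expected number of entries is known up front (a sketch, not part of the HashMap source):

import java.util.HashMap;

public class PreSizeDemo {
    public static void main(String[] args) {
        int expected = 1000;
        // Choose an initial capacity such that expected <= capacity * loadFactor,
        // so all 1000 insertions stay under the threshold and no resize is triggered
        HashMap<Integer, String> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size()); // 1000
    }
}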

The layout of elements before and after a resize follows from the split above: each node either keeps its old index j or moves to index j + oldCap in the new table, depending on whether (e.hash & oldCap) is 0.


Four, thoughts on HashMap

4.1 Thinking about capacity expansion

From the putVal source we know that when an element is inserted, size is incremented by one, and if size then exceeds threshold, the table is resized. Suppose our capacity is 32 and loadFactor is 0.75, so threshold is 32 * 0.75 = 24. Now insert 25 elements, all of which land in the same bucket, whose data structure has become a red-black tree. At this point there are still 31 empty buckets, so it seems there is no real need to expand the capacity; the problem is the distribution of the elements rather than the capacity itself.

As we know, the expansion process traverses all elements and is therefore very time-consuming; we also know that after an expansion, elements end up more evenly distributed across the buckets, which improves access efficiency. So avoiding capacity expansion is only the right call when the cost of traversing all the elements outweighs the benefit of distributing them more evenly across the buckets.


Five, summary

Now that you understand the core functions and data structures, it is not difficult to understand the source code of HashMap. Of course, there are some things that are not covered in this analysis, such as red-black trees, serialization, and copying, which will be covered in future posts.