HashMap

1. Overview

HashMap first appeared in JDK 1.2, and its underlying implementation is based on hashing. A HashMap allows null keys and null values; a null key always hashes to 0. A HashMap does not guarantee the order of its key-value pairs, which means the iteration order can change after certain operations (such as a resize). Also note that HashMap is not thread-safe, which can be problematic in a multithreaded environment.
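
As a quick illustration of these properties, here is a minimal example (the class name is my own):

import java.util.HashMap;
import java.util.Map;

public class HashMapBasics {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key"); // null key is allowed; it hashes to 0
        map.put("k", null);                  // null values are allowed too
        System.out.println(map.get(null));   // value for null key
        System.out.println(map.get("k"));    // null (same result as a missing key)
    }
}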

2. Principle

Hash tables generally resolve collisions in one of two ways: open addressing or separate chaining. HashMap uses separate chaining, and JDK 1.8 added a red-black tree optimization for long linked lists. The data structure is described below.

With separate chaining, the data structure is composed of an array and linked lists (or trees). When adding or deleting an element, first locate the bucket the element belongs to, then locate the element within that bucket's linked list. HashMap's basic operations are wrappers around the basic operations of a chained hash table; the difference is that JDK 1.8 introduced the red-black tree, changing the underlying data structure from array + linked list to array + linked list + red-black tree.
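
To make the structure concrete, here is a minimal sketch of a chained hash table (illustration only, not HashMap's code; the names and the fixed capacity are my own simplifications):

// A minimal separate-chaining hash table sketch.
class ChainedHashTable<K, V> {
    private static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16]; // fixed power-of-two capacity for simplicity

    private int indexFor(Object key) {
        // power-of-two length lets us mask instead of taking a modulo
        return (key == null ? 0 : key.hashCode()) & (table.length - 1);
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (java.util.Objects.equals(n.key, key)) { n.value = value; return; } // update existing key
        }
        table[i] = new Node<>(key, value, table[i]); // insert at head of the chain
    }

    public V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (java.util.Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }
}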

3. Source code analysis

3.1 Constructors

3.1.1 The four constructors

/** Constructor 1 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/** Constructor 2 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/** Constructor 3 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/** Constructor 4 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

3.1.2 Initial Capacity, Load Factor, and Threshold

In the HashMap constructors, we can adjust two parameters: initialCapacity and loadFactor. Setting these two parameters in turn determines the threshold. Note, however, that the initial threshold is computed from initialCapacity alone, via a shift-based method (tableSizeFor()).

| Name | Meaning |
| --- | --- |
| loadFactor | The load factor |
| initialCapacity | The initial capacity |
| threshold | The maximum number of key-value pairs the HashMap can hold; once exceeded, the table must be resized |
Related code:
/** The default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/** The load factor used when none specified in constructor. */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

final float loadFactor;

/** The next size value at which to resize (capacity * load factor). */
int threshold;

By default, the initial capacity of a HashMap is 16 and the load factor is 0.75. There is no default field for the threshold, because it can be computed as capacity times load factor (as the comment explains): threshold = capacity * loadFactor. However, when constructor 3 is used, the initial threshold is computed by a dedicated method. Next, let's look at that method.
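
As a worked example of these formulas (my own arithmetic; threshold is a package-private field, so the values below come from the formulas rather than from any public API):

// Worked example of the capacity/threshold relationship (arithmetic only).
//
// new HashMap<>()         -> capacity 16 (default), threshold = 16 * 0.75 = 12
// new HashMap<>(17)       -> threshold temporarily = tableSizeFor(17) = 32;
//                            on the first insert, resize() uses it as the capacity
//                            and recomputes threshold = 32 * 0.75 = 24
// new HashMap<>(16, 0.5f) -> capacity 16, threshold = 16 * 0.5 = 8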

/** Returns a power of two size for the given target capacity. */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

This method computes the smallest power of two greater than or equal to cap. The shifts smear the highest set bit of n into every lower bit position, and the final n + 1 rounds up to the next power of two. For example, cap = 2^29 + 1 returns 2^30.
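
To see the bit-smearing at work, here is a standalone copy of the method with a small trace (my own demo class; MAXIMUM_CAPACITY is inlined as 1 << 30, matching HashMap):

public class TableSizeDemo {
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;  // smear the highest set bit over the next 1 bit
        n |= n >>> 2;  // ... then 2 more bits
        n |= n >>> 4;  // ... then 4 more
        n |= n >>> 8;  // ... then 8 more
        n |= n >>> 16; // ... now every bit below the highest set bit is 1
        return (n < 0) ? 1 : (n >= (1 << 30)) ? (1 << 30) : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(16));            // 16 (already a power of two)
        System.out.println(tableSizeFor(17));            // 32
        System.out.println(tableSizeFor((1 << 29) + 1)); // 1073741824, i.e. 2^30
    }
}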

Let's look at the load factor. For a HashMap, the load factor is an important parameter that reflects how full the bucket array is allowed to get (assuming key-value pairs are evenly distributed), and adjusting it trades time against space. When we lower the load factor, the HashMap holds fewer key-value pairs at a given capacity; when the pairs are re-stored into a new, larger bucket array, collisions between keys decrease and the lists get shorter, so add, delete, update, and lookup operations all get faster. This is a classic space-for-time trade. Conversely, if we raise the load factor (it can even exceed 1), the number of key-value pairs the HashMap holds per unit of capacity increases, so space utilization is high, but so is the collision rate; lists get longer and operations become less efficient, trading time for space. How to tune the load factor depends on the use case; in general, the default value is fine.

3.2 Lookup

Looking up a key in a HashMap means locating the bucket the key-value pair resides in, then searching that bucket's linked list or red-black tree. Searching the linked list is simple traversal; red-black tree search is analyzed later.

Let's look at the algorithm that computes the hash. When the bucket array is small, only the low bits of the hash take part in computing the index, and the high bits are effectively ignored, so the result depends only on the low-order information. To deal with this flaw, we can XOR the high 16 bits of the hashCode with the low 16 bits, i.e. hashCode ^ (hashCode >>> 16). This mixes high and low bits, increasing the randomness of the low bits and letting the high bits participate in the computation indirectly. Another benefit of recomputing the hash is added complexity: if we override hashCode with a poorly distributed implementation, the collision rate will be high. Shifting and XORing make the hash more complex and improve its distribution. This is why a HashMap does not use the key object's original hash directly.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Next, the first step of the search is to locate the bucket:

// The size of the bucket array in a HashMap is always a power of two, in which case
// (n - 1) & hash is equivalent to hash mod n. But modulo is slower than a bit
// operation, so (n - 1) & hash is used as an optimization.
first = tab[(n - 1) & hash]
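
A quick check of this equivalence (my own demo code; Math.floorMod is used because hash values can be negative):

// Demonstrates that (n - 1) & hash equals hash mod n when n is a power of two.
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // power of two, like HashMap's bucket array length
        int[] hashes = { 5, 21, -7, 0x7fffffff };
        for (int h : hashes) {
            // Math.floorMod keeps the result non-negative, matching the mask
            System.out.println(((n - 1) & h) + " == " + Math.floorMod(h, n));
        }
    }
}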

The specific code is as follows:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab;
    Node<K,V> first, e;
    int n; K k;
    // 1. Locate the bucket where the key-value pair resides
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // 2. If first is of type TreeNode, call the red-black tree lookup method
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 3. Otherwise, traverse the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

3.3 Traversal

To iterate over all keys, we first obtain the KeySet object and then iterate using KeySet's iterator. The KeyIterator class inherits from HashIterator, where the core logic lives. That logic is not complicated: at initialization, the HashIterator scans the bucket array for the first bucket that contains a list node reference, then traverses the list that bucket points to. When that list is exhausted, it looks for the next bucket containing a list node reference and continues; when no more are found, traversal ends. For example, suppose we traverse the structure in the following figure:

When the HashIterator is initialized, it first iterates through the bucket array to find the bucket that contains the list node reference, which is bucket number 3. The nextNode method then iterates through the list to which the bucket points. After iterating through bucket 3, the nextNode method continues to find the next bucket that is not empty, which corresponds to bucket 7 in the figure. The process is similar until the last bucket is iterated. Corresponding to the following figure:

The core code is as follows:

public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

/** Key set */
final class KeySet extends AbstractSet<K> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
    // Omit some code
}

/** Key iterator */
final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    HashIterator() {
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        if (t != null && size > 0) { // advance to first entry
            // Find the first bucket that contains a list node reference
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            // Find the next bucket containing a list node reference
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }
    // Omit some code
}
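
A short usage example of key iteration, including the fail-fast behavior the expectedModCount check enforces (my own demo class):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IterationDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        for (String key : map.keySet()) {       // uses KeyIterator under the hood
            System.out.println(key);
        }

        // Structurally modifying the map during iteration trips the modCount check
        try {
            for (String key : map.keySet()) {
                map.put("c", 3);                // modCount changes -> fail fast
            }
        } catch (java.util.ConcurrentModificationException expected) {
            System.out.println("fail-fast iterator detected concurrent modification");
        }

        // The safe way to remove during iteration is via the iterator itself
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().equals("a")) it.remove();
        }
        System.out.println(map);
    }
}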

Finally, JDK 1.8 introduced the red-black tree optimization to avoid long linked lists hurting HashMap performance. Yet the source above contains no red-black-tree traversal logic. Why? Because when a list is converted to a red-black tree, the original linked-list order is still maintained through the next references, so the iterator can traverse tree bins exactly as if they were lists.

3.4 Insertion

Insertion process

The core code is as follows:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Initialization of the bucket array table is deferred until data is first inserted
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket contains no key-value node reference, store the new node in the bucket
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the key and hash equal those of the first node in the list, e points to that node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the reference type in the bucket is TreeNode, call the red-black tree insertion method
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Count the length of the linked list
            for (int binCount = 0; ; ++binCount) {
                // If the list does not contain the key to insert, append a new node at the tail
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches the treeification threshold, treeify the bin
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If this condition holds, the list already contains the key to insert; stop traversing
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // The key-value pair to insert already exists in the HashMap
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // onlyIfAbsent: if true, only update the value when the old value is null
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the number of key-value pairs exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

The insertion process can be summarized as:

  1. Call the hash function to get the key's hash value, then compute its array index.
  2. If there is no hash collision, the node is placed directly into the array; if a collision occurs, the node is appended to that bucket's linked list.
  3. If the list length exceeds TREEIFY_THRESHOLD (== 8), the list is converted into a red-black tree; if a tree bin shrinks to UNTREEIFY_THRESHOLD (== 6) nodes or fewer during a resize, it is converted back into a list.
  4. If a node with the same key already exists, its value is replaced and the old value is returned (see the demo after this list).
  5. If the number of key-value pairs in the map exceeds the threshold, the resize method is called to expand the array.
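
A small demo of the return-value and onlyIfAbsent behavior described in step 4 (putIfAbsent calls putVal with onlyIfAbsent = true):

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));          // null: no previous mapping
        System.out.println(map.put("a", 2));          // 1: old value returned, value replaced
        System.out.println(map.putIfAbsent("a", 3));  // 2: onlyIfAbsent path, value kept
        System.out.println(map.get("a"));             // 2
    }
}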

3.4.2 Capacity Expansion Mechanism

In a HashMap, the bucket array length is always a power of two, and the threshold is the product of the bucket array length and the load factor. When the number of key-value pairs exceeds the threshold, the map is expanded. HashMap expands by doubling the current bucket array length, and the threshold is doubled as well (if the threshold overflows to zero during the shift, it is recomputed from the threshold formula). After expansion, key-value pairs are recomputed and moved to their proper positions. Let's look at the implementation:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // If the table is not null, it has already been initialized
    if (oldCap > 0) {
        // If the table capacity has reached the maximum capacity, do not expand
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Compute the new capacity and threshold as twice the old capacity and threshold
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // The shift may overflow
            newThr = oldThr << 1; // double threshold
    } else if (oldThr > 0) // initial capacity was placed in threshold
        /*
         * At initialization, the value of threshold is assigned to newCap:
         * HashMap uses the threshold field to temporarily hold the initialCapacity parameter
         */
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        /*
         * When the no-argument constructor was used, the bucket array capacity is the default
         * capacity, and the threshold is the default capacity times the default load factor
         */
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }

    // If newThr is 0, compute it from the threshold formula
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Create the new bucket array; this is also where the bucket array gets initialized
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // The old bucket array is not null: iterate over it and remap the key-value pairs into the new bucket array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // When remapping, the red-black tree needs to be split
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Traverse the list and group the nodes, preserving their original order
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Map the grouped lists into the new buckets
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

This code does three things:

  1. Compute the new bucket array capacity, newCap, and the new threshold, newThr.
  2. Create a new bucket array of size newCap; this is also where the bucket array table gets initialized.
  3. Remap the key-value pair nodes into the new bucket array. If a node is of type TreeNode, the red-black tree needs to be split; ordinary nodes are grouped in their original order and then remapped.

In the nested branches of the first if branch, the shift can overflow. The nested branches are:

| Condition | Meaning | Note |
| --- | --- | --- |
| oldCap >= 2^30 | The bucket array capacity has reached the maximum capacity (2^30) | No expansion is performed |
| newCap < 2^30 && oldCap >= 16 | The new capacity is below the maximum and the old capacity is at least 16 | newThr = oldThr << 1; the shift may overflow |

When loadFactor has no fractional part and its integer part is even and greater than or equal to 8, newThr may overflow to zero during one of these computations.

There is fault-tolerance code for this possibility:

// If newThr is 0, compute it from the threshold formula
if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
}

Let's look at remapping the nodes. There are two cases. Tree nodes require splitting the red-black tree before mapping. Linked-list nodes need to be grouped before mapping; after grouping, the relative order of nodes within each group is unchanged. Here's an example:

Assume that the bucket array in the figure above is expanded, with a capacity of n = 16 after expansion. The remapping process is as follows:

For each node, compute e.hash & oldCap. If the result is 0, point loHead and loTail at the node; for each subsequent node whose hash & oldCap is also 0, append it to the list loHead points to and advance loTail. If the result is non-zero, do the same with hiHead and hiTail. When the traversal completes, we may end up with two linked lists, and the grouping is done:

Finally, the two lists are stored in their corresponding buckets (index j and index j + oldCap), completing the expansion.
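
A small demo of the grouping rule (my own code; it shows only the index arithmetic, not HashMap internals):

// Shows why a node either stays at index j or moves to index j + oldCap after doubling.
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 8, newCap = 16;
        int[] hashes = { 3, 11, 19, 27 }; // all land in bucket 3 when capacity is 8
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int newIndex = h & (newCap - 1);
            // (h & oldCap) == 0 -> index unchanged; otherwise index = oldIndex + oldCap
            System.out.printf("hash=%d old=%d new=%d (h & oldCap)=%d%n",
                              h, oldIndex, newIndex, h & oldCap);
        }
    }
}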

After remapping, the order of nodes within each of the two lists is unchanged; the pre-expansion order is preserved. Expansion in JDK 1.8 is also more efficient than in earlier versions. If you look at the JDK 1.7 source, you will see that JDK 1.7 introduced a random seed into the hash computation to mitigate denial-of-service attacks based on hash collisions, increasing hash randomness so that key-value pairs distribute evenly across the bucket array. During expansion, the code decides, based on capacity, whether a new random seed is needed and, if so, recomputes the hash of every node. In JDK 1.8 this mechanism was dropped in favor of red-black trees, avoiding the repeated hash computations and improving expansion efficiency.

3.4.3 Treeification, red-black tree splitting, and untreeification

JDK 1.8 introduced red-black trees to handle frequent hash collisions, at the cost of increased code complexity.

Linked list treeification

Core treeification code:

static final int TREEIFY_THRESHOLD = 8;

/** When the bucket array capacity is below this value, the bucket array is expanded rather than treeified */
static final int MIN_TREEIFY_CAPACITY = 64;

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
}

/** Convert a list of ordinary nodes into a list of tree nodes */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the bucket array capacity is below MIN_TREEIFY_CAPACITY, resize instead
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // hd for head, tl for tail
        TreeNode<K,V> hd = null, tl = null;
        do {
            // Replace each ordinary node with a tree node
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);  // Convert the ordinary list into a tree-node list
        if ((tab[index] = hd) != null)
            // Convert the tree-node list into a red-black tree
            hd.treeify(tab);
    }
}

TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}

Treification requires two conditions:

  1. The length of the linked list must be greater than or equal to TREEIFY_THRESHOLD
  2. The capacity of the bucket array must be greater than or equal to MIN_TREEIFY_CAPACITY

The second condition exists because when the bucket array is small, hash collisions are more likely, leading to long lists. In that situation, expansion should be prioritized over immediate treeification: the high collision rate is caused chiefly by the small bucket array, so expanding first avoids unnecessary treeification. Moreover, with a small capacity, expansion happens frequently, and every expansion would require splitting and remapping the red-black tree. So when the bucket capacity is small, converting lists to red-black trees is not worth it.

Process: let's move on to the treeifyBin method. Its main job is to convert an ordinary list into a list of TreeNode nodes, finally calling treeify to turn that list into a red-black tree. TreeNode inherits from the Node class, so a TreeNode still carries a next reference, and the order of the nodes in the original list is preserved through it. Assume the list before treeification looks like this:

HashMap was not originally designed with the red-black tree optimization in mind, so, unlike TreeMap, it does not require key classes to implement the Comparable interface or to supply a comparator. But the treeification process needs to order pairs of key objects, which becomes a tricky problem when keys do not implement Comparable. To solve it, HashMap takes three steps to ensure any two keys can be ordered:

  1. Compare the hashes of the two keys; if they are equal, continue.
  2. Check whether the key class implements the Comparable interface; if it does, call compareTo to compare.
  3. If the keys still cannot be ordered, arbitrate with tieBreakOrder (shown below).
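
For reference, the arbitration step in JDK 1.8 looks essentially like this: it compares class names first and falls back to identity hash codes, so it never returns 0 and insertion always picks a side:

static int tieBreakOrder(Object a, Object b) {
    int d;
    if (a == null || b == null ||
        (d = a.getClass().getName().
         compareTo(b.getClass().getName())) == 0)
        // Fall back to identity hash codes; ties resolve arbitrarily but consistently
        d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
             -1 : 1);
    return d;
}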

The code is as follows:

final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;

        // Set the root node
        if (root == null) {
            x.parent = null;
            x.red = false;
            root = x;
        }
        else {
            K k = x.key;
            int h = x.hash;
            Class<?> kc = null;
            // Walk down from the root to find the proper insertion point for the current node
            for (TreeNode<K,V> p = root;;) {
                int dir, ph;
                K pk = p.key;
                if ((ph = p.hash) > h)
                    dir = -1;
                else if (ph < h)
                    dir = 1;
                // Hashes are equal: check whether the key implements Comparable and,
                // if so, compare with compareTo(); otherwise arbitrate with tieBreakOrder
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);

                TreeNode<K,V> xp = p;
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp;
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    // Rebalance the tree after insertion
                    root = balanceInsertion(root, x);
                    break;
                }
            }
        }
    }
    moveRootToFront(tab, root);
}

After the linked list is converted into a binary search tree, it is balanced into a red-black tree by left rotations, right rotations, and recoloring nodes.

The core code is as follows (there are comments in the code)

static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root,
                                            TreeNode<K,V> x) {
    x.red = true;
    // x    - the new node ------------------------------- N
    // xp   - the new node's parent ---------------------- P
    // xpp  - the new node's grandparent ----------------- G
    // xppl - the grandparent's left child --------------- P or U
    // xppr - the grandparent's right child -------------- U or P
    for (TreeNode<K,V> xp, xpp, xppl, xppr;;) {
        // The inserted node is the root
        if ((xp = x.parent) == null) {
            // Property 1: the root node must be black
            x.red = false;
            return x;
        }
        // The parent is black, or the parent is the root
        else if (!xp.red || (xpp = xp.parent) == null)
            return root;

        // The parent is red, so a grandparent exists and is black;
        // the new node therefore also has an uncle node (possibly a nil node)

        // The parent is the grandparent's left child
        if (xp == (xppl = xpp.left)) {
            // The grandparent's right child (the uncle) is non-nil and red:
            // recolor the parent and uncle black and the grandparent red,
            // then continue from the grandparent
            if ((xppr = xpp.right) != null && xppr.red) {
                xppr.red = false;
                xp.red = false;
                xpp.red = true;
                x = xpp;
            }

            // Here the grandparent has no right child, or the right child is black
            else {
                // The new node is its parent's right child
                if (x == xp.right) {
                    // Rotate left to swap the new node with its parent
                    root = rotateLeft(root, x = xp);
                    xpp = (xp = x.parent) == null ? null : xp.parent;
                }

                // The new node is its parent's left child: recolor the parent black
                // and the grandparent red, then rotate the grandparent right
                if (xp != null) {
                    xp.red = false;
                    if (xpp != null) {
                        xpp.red = true;
                        // Rotate the grandparent right
                        root = rotateRight(root, xpp);
                    }
                }
            }
        }
        // The parent is the grandparent's right child; symmetric to the branch above
        else {
            if (xppl != null && xppl.red) {
                xppl.red = false;
                xp.red = false;
                xpp.red = true;
                x = xpp;
            }
            else {
                if (x == xp.left) {
                    root = rotateRight(root, x = xp);
                    xpp = (xp = x.parent) == null ? null : xp.parent;
                }
                if (xp != null) {
                    xp.red = false;
                    if (xpp != null) {
                        xpp.red = true;
                        root = rotateLeft(root, xpp);
                    }
                }
            }
        }
    }
}

Left rotation

// The right child becomes the parent of the current node,
// the current node becomes the left child of its right child,
// and the right child's old left child becomes the current node's right child
static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root,
                                      TreeNode<K,V> p) {
    // p - the current node
    // r - the current node's right child
    TreeNode<K,V> r, pp, rl;
    if (p != null && (r = p.right) != null) {
        // The right child's left child is re-parented to the current node
        if ((rl = p.right = r.left) != null)
            rl.parent = p;

        // The right child's parent becomes the current node's parent;
        // if the right child becomes the root, repaint it black
        if ((pp = r.parent = p.parent) == null)
            (root = r).red = false;

        // Otherwise, hook the right child into whichever slot of the parent
        // (left or right) the current node occupied
        else if (pp.left == p)
            pp.left = r;
        else
            pp.right = r;
        // The current node becomes the right child's left child
        r.left = p;
        // The right child becomes the current node's parent
        p.parent = r;
    }
    return root;
}

The logic of right rotation mirrors left rotation, so we skip its analysis. After the list becomes a red-black tree, the original list order is still preserved through the next and prev references (the red-black tree root is moved to the head of the list), and we can still traverse the red-black tree the same way we traverse a list. This structure sets the stage for the red-black tree splitting and untreeification below.

Red-black tree splitting

After expansion, ordinary nodes need to be remapped, and red-black tree nodes are no exception. The obvious approach would be to first turn the red-black tree back into a linked list and then remap the list; it is the easiest to think of, but it loses some efficiency. Instead, as mentioned in the previous section, when converting a regular list into a red-black tree, the HashMap preserves the original list's node order through two additional references, next and prev. So when remapping a red-black tree, it can be processed like a linked list, avoiding the tree-to-list conversion and quietly improving efficiency. The concrete implementation:

// Threshold for converting a red-black tree back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    /*
     * Red-black tree nodes still carry the next reference, so the tree can
     * still be traversed like a linked list. The loop below groups the tree
     * nodes, just like the linked-list grouping above.
     */
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    if (loHead != null) {
        // If loHead is not null and the list length is <= 6, untreeify the bin
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            /*
             * hiHead == null means that after expansion all nodes stayed at
             * their original position and the tree structure is unchanged,
             * so there is no need to re-treeify
             */
            if (hiHead != null)
                loHead.treeify(tab);
        }
    }
    // Symmetric to the above
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

As the source shows, the logic of remapping a red-black tree is basically the same as remapping a linked list. The difference is that after remapping, the red-black tree is split into two TreeNode lists. If a list's length is less than or equal to UNTREEIFY_THRESHOLD, it is converted into an ordinary list; otherwise the TreeNode list is re-treeified, subject to the conditions above. As an example, suppose the red-black tree in the figure above is remapped after expansion; the result is shown below.

Red-black tree untreeification

A red-black tree still preserves the original list node order. With that in mind, converting a red-black tree back to a linked list is easy: just convert the TreeNode list into a list of Node.

final Node<K,V> untreeify(HashMap<K,V> map) {
    Node<K,V> hd = null, tl = null;
    // Iterate over the TreeNode list, replacing tree nodes with ordinary nodes
    for (Node<K,V> q = this; q != null; q = q.next) {
        // Replace the node type
        Node<K,V> p = map.replacementNode(q, null);
        if (tl == null)
            hd = p;
        else
            tl.next = p;
        tl = p;
    }
    return hd;
}

Node<K,V> replacementNode(Node<K,V> p, Node<K,V> next) {
    return new Node<>(p.hash, p.key, p.value, next);
}

3.5 Deletion

Deletion is three steps: locate the bucket, traverse the linked list (or search the tree) to find the node, then delete the node and fix up the list or red-black tree.

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        // 1. Locate the bucket
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // If the key equals that of the first node in the list, node points to that node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            // If the node type is TreeNode, use the red-black tree lookup logic to locate the node to delete
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // 2. Traverse the linked list to find the node to delete
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // 3. Delete the node and fix up the linked list or red-black tree
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
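
A brief usage note (my own demo): remove(key) returns the removed value or null, and the two-argument Map.remove(key, value) takes the matchValue path, deleting only on an exact value match:

import java.util.HashMap;
import java.util.Map;

public class RemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        System.out.println(map.remove("a"));    // 1: the removed value
        System.out.println(map.remove("a"));    // null: key no longer present

        map.put("b", 2);
        System.out.println(map.remove("b", 9)); // false: value does not match, nothing removed
        System.out.println(map.remove("b", 2)); // true: exact match, entry removed
    }
}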

3.6 Other Details

If you read the HashMap source carefully, you will notice that the bucket array table is declared transient. In Java, fields marked with this keyword are skipped by the default serialization mechanism. Back to the source, consider a question: the bucket array is the core data structure underlying HashMap, so if it is not serialized, how can the map be restored? The short answer is that HashMap does not rely on default serialization; instead, it customizes serialization by implementing two methods, readObject and writeObject. There is a good reason for this. What does a HashMap store? Key-value pairs, of course. So serializing the key-value pairs is enough to reconstruct the HashMap from them. Some readers may wonder: wouldn't serializing the table directly be simpler? It sounds reasonable, but serializing the table has two problems:

  1. In most cases the table is not fully occupied, so serializing the unused portion wastes space.
  2. The same key-value pair may land in different buckets under different JVMs, so deserializing the table on a different JVM may cause errors. The first step of HashMap's get/put/remove is to find the key's bucket from its hash; but if the key class does not override hashCode, the hash computation ultimately calls Object's hashCode, which is a native method that different JVMs may implement differently, producing different hashes. That is, the same key may hash differently on different platforms, and operations against the same deserialized table would then misbehave. A minimal sketch of the custom-serialization pattern follows this list.
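
Here is a minimal sketch of the writeObject/readObject pattern (my own simplified class, not HashMap's actual implementation, though the idea is the same: persist the logical entries, rebuild the transient structure on read):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Persist the logical key-value pairs, not the physical storage (which is transient).
public class TinyMap<K, V> implements Serializable {
    private transient Object[] keys = new Object[16];   // skipped by default serialization
    private transient Object[] values = new Object[16];
    private int size;

    // No resizing or duplicate handling; illustration only
    public void put(K key, V value) { keys[size] = key; values[size] = value; size++; }

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();          // writes the non-transient fields (size)
        for (int i = 0; i < size; i++) { // write only the live entries
            s.writeObject(keys[i]);
            s.writeObject(values[i]);
        }
    }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        keys = new Object[Math.max(16, size)];  // rebuild the transient structure
        values = new Object[keys.length];
        for (int i = 0; i < size; i++) {        // re-insert each entry
            keys[i] = s.readObject();
            values[i] = s.readObject();
        }
    }
}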
