preface

HashMap is a commonly used data structure in Java. It is an important presence in collection classes, including hash tables, linked lists, and red-black trees. Hash table conflict resolution method is the chain address method, that is, the hash value of the same elements in a linked list. When there are too many conflicts, the search efficiency will be reduced if the linked list is too long. Therefore, In Java8, red-black tree is introduced into HashMap for optimization. When the length of the linked list under a hash value is too long (the length is greater than 8), it will be converted to red-black tree storage to optimize the query. Operations involving red-black trees will not be covered in detail here

The source code interpretation

The constructor

// The capacity is initialized by default
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
// Load factor
static final float DEFAULT_LOAD_FACTOR = 0.75 f;

final float loadFactor;
// Capacity threshold
int threshold;

public HashMap(a) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
    	throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
    	initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
    	throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

// Returns a value just greater than cap to the 2 n power
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code
  • InitialCapacity: indicates the initialCapacity
  • LoadFactor: indicates the loadFactor
  • Threshold: Capacity threshold. When the number of elements stored in the HashMap reaches this threshold, the expansion operation will be triggered

Capatcity does not define variable records in the source code because its size is the length of the table array. The size of threshold is determined by capatCity and loadFactor. Threshold = CapatCity *loadFactor

The constructor of a HashMap mainly sets the initialization capacity and load factor. If the no-argument constructor is used, the default initialization capacity of 16 and load factor of 0.75 are used. The tableSizeFor method is called when the constructor that sets the initialized capacity is used. The bitwise operation method obtains a value of 2 to the power of N, which is just greater than cap. That is, when the user specifies a value of 2 to the power of n for the initialized capacity, the threshold of the actual HashMap is assigned a value of 2 to the power of N greater than the value. During initialization, it first opens an array according to the size of threshold, and then changes the size of threshold by assigning threshold=threshold*loadFactor (as can be seen in the resize method later), which conforms to the above statement.

Basic storage unit

The hash table of a HashMap is an array table whose elements are the Node class, which inherits the Map.Entry interface and contains keys, values,hashCode, and Next (which stores the next hop in the list). Before Java8 there was the built-in Entry class, but in Java8 the Entry class becomes the Node class.

static class Node<K.V> implements Map.Entry<K.V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey(a)        { return key; }
    public final V getValue(a)      { return value; }
    public final String toString(a) { return key + "=" + value; }

    public final int hashCode(a) {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceofMap.Entry) { Map.Entry<? ,? > e = (Map.Entry<? ,? >)o;if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false; }}Copy the code

TreeNode node in red-black tree

static final class TreeNode<K.V> extends LinkedHashMap.Entry<K.V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    /** * Returns root of tree containing this node. */
    final TreeNode<K,V> root(a) {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                returnr; r = p; }}// omit the following red-black tree add/delete, adjust, etc. }static class Entry<K.V> extends HashMap.Node<K.V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next); }}Copy the code

TreeNode inherits LinkedhashMap. Entry, and linkedhashMap. Entry inherits HashMap.Node, so TheeNode is a subclass of HashMap. Common operations on HashMap elements (such as traversal) can be converted to Node operations without determining whether the table is stored in a linked list or a red-black tree.

To find the

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/* * Here the xOR operation is performed on the high and low 16 bits, so that the generation of the low 16 bits is affected by both the high 16 bits and the low 16 bits. In this case, when the capacity of the HashMap is too small, the generated hash value is only affected by the low 16 bits, which also increases the complexity of the hash. When a HashMap contains elements with poor hashCode distribution, this hash method can be used to reduce the collision rate of the * hash table. * /
static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}

/** * To determine whether elements are equal in a HashMap, both the hash value and the equals() method must be equal * because the hash value is not only used for look-up purposes, but also for determining elements' equality */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // (n-1) & hash is the remainder of hash%(n-1), which is more efficient for bit operation
    if((tab = table) ! =null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) ! =null) {
        if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
            return first;
        if((e = first.next) ! =null) {
            // If it is a red-black tree, use the red-black tree recursive method to find
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise use a linked list
            do {
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    return e;
            } while((e = e.next) ! =null); }}return null;
}
Copy the code

(n-1) &hash is a bitwise operation that computes the hash%(n-1) and determines the position of the key in the hash table. Instead of using HashCode directly, the hash value is recalculated by HashCode. Because when the length of the table is small, the generation of hash value is only related to the low value, and has nothing to do with the high value. If the HashCode low value generated by the added elements is relatively similar, but the high value is different, a lot of conflicts will occur. Therefore, xOR operations are performed on the high 16 bits and the low 16 bits, so that the low hash value is affected by HashCode and the hash values are evenly distributed.

traverse

Traversal of a HashMap is often done by traversing sets generated by keySet and entrySet methods.

public Set<K> keySet(a) {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

final class KeySet extends AbstractSet<K> {
    public final int size(a)                 { return size; }
    public final void clear(a)               { HashMap.this.clear(); }
    public final Iterator<K> iterator(a)     { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null.false.true) != null;
    }
    public final Spliterator<K> spliterator(a) {
        return new KeySpliterator<>(HashMap.this.0, -1.0.0);
    }
    public final void forEach(Consumer<? super K> action) {
        Node<K,V>[] tab;
        if (action == null)
            throw new NullPointerException();
        if (size > 0&& (tab = table) ! =null) {
            int mc = modCount;
            for (int i = 0; i < tab.length; ++i) {
                for(Node<K,V> e = tab[i]; e ! =null; e = e.next)
                    action.accept(e.key);
            }
            if(modCount ! = mc)throw newConcurrentModificationException(); }}}final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next(a) { returnnextNode().key; }}abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    HashIterator() {
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        // next refers to the first non-empty array element
        if(t ! =null && size > 0) { // advance to first entry
            do {} while (index < t.length && (next = t[index++]) == null); }}public final boolean hasNext(a) {
        returnnext ! =null;
    }

    final Node<K,V> nextNode(a) {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if(modCount ! = expectedModCount)throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null&& (t = table) ! =null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    public final void remove(a) {
        Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        if(modCount ! = expectedModCount)throw new ConcurrentModificationException();
        current = null;
        K key = p.key;
        removeNode(hash(key), key, null.false.false); expectedModCount = modCount; }}Copy the code

We can see that the iterator of the keySet inherits from HashIterator, where the next method of the KeyIterator directly calls the nextNode method of HashIterator, which iterates through the table to find non-null elements. Node next is then called to iterate through the list, but conflicts may be stored in a red-black tree, which is not described. This is because the TreeNode of the red-black tree inherits Node, and TreeNode’s next stores the elements of the red-black tree traversing the next hop, so just call next of the Node class to traverse the linked list and red-black tree, and see the establishment of ThreeNode’s Next below.

insert

In addition to the red-black tree operation, insert is a more complex method of HashMap, because it is designed to expand the table, linked list to red-black tree problems, etc., first look at the code.

static final int TREEIFY_THRESHOLD = 8;

public V put(K key, V value) {
    return putVal(hash(key), key, value, false.true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, the initialization of the table is deferred until the element is inserted
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If there is no conflict, just put the node into the array
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the header key is equal, assign the header to e
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)  // If it is a red-black tree, call the tree insert operation
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise, the list is iterated
            for (int binCount = 0; ; ++binCount) {
                // Iterate to the end and insert to the last element of the list
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // When the list length is greater than 8, it becomes a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If the same key exists, assign it to e
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break; p = e; }}// if e is not empty, the value is returned after replacing the value under the same key
        if(e ! =null) { // existing mapping for key
            V oldValue = e.value;
            if(! onlyIfAbsent || oldValue ==null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the capacity exceeds the threshold, expand the capacity
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Copy the code

Can see table is initialized at the time of insert to judge whether its initialization, initialization and expansion are call resize method, the whole insertion process is first check whether the initialization, no is initialized, then according to insert the hash value of the hash of the element found in the array position, if there is no conflict, direct deposit, If it’s a linked list, it traverses the list, if it’s a red-black tree, it traverses the tree, replaces the value with the existing key, and inserts it into the end of the list or into the red-black tree if it doesn’t exist. If the capacity is greater than 8, the linked list needs to be turned into a red-black tree.

static final int MIN_TREEIFY_CAPACITY = 64;

/* * Turn a list into a red-black tree, only if the list length is greater than 8&& the table capacity is greater than 64 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) ! =null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while((e = e.next) ! =null);
        if((tab[index] = hd) ! =null) hd.treeify(tab); }}Copy the code

In the process of converting to a red-black tree, the direction of next and prev is established, that is, after the red-black tree is established, the whole tree can be traversed between nodes by next and Prev. This is also the reason why the value of the traversal process does not need to consider the red-black tree. How to build a red-black tree is not discussed here, but will be discussed in the next analysis of red-black tree.

Next, see how the resiz method initializes and extends the table.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null)?0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Check whether the length is greater than 0. If the length is 0, it is initialized
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // Initializes the specified capacity
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if(oldTab ! =null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if((e = oldTab[j]) ! =null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                    if(loTail ! =null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if(hiTail ! =null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Copy the code

When the table is empty, the table needs to be initialized. If the initialization capacity is specified, the initialization length of the threshold is set to threshold*loadFactor, and then the value of threshold is set to threshold*loadFactor. Otherwise, the default table length is 16 and the loadFactor is 0.75. The threshold is 16 x 0.75. If it has already been initialized, expansion, to expand capacity for two times, after expansion to the elements in the hash table redistribution, in redistribution in Java 7 is directly traverse elements to calculate the hash value, but is optimized in the java8, will need to assign the list or the red-black tree is divided into two linked list, Put them in their respective new table locations. LoHead and loTail, hiHead and hiTail are the head and tail of two linked lists respectively, called LO list and HI list. The elements in the same linked list in the old hash table are and calculated according to the hash value and the capacity of the old table to determine whether they are 0, divided into two categories, and put into two linked lists. The last two linked lists are placed in the location of the new table.

The principle can be illustrated by an example. Suppose a HashMap with a table size of 8 and a chain with a remainder of 2 has hash elements of key 2, 10, 18, and 26. The initial position of the element in the table is determined by mod n-1 (7 in this case).


n 1 : 00000111 2 : 00000010 r e s u l t : 00000010 n 1 : 00000111 10 : 00001010 r e s u l t : 00000010 n-1:00000111\\ 2\qquad:0000 0010\\ result:00000010\\ \quad\\ n-1:00000111\\ 10\quad:0000 1010\\ result:00000010

If the capacity is extended to 16, its position in the hash table is determined by the capacity and operation (n-1 becomes 15), if the operation is reworked


n 1 : 00001111 2 : 00000010 r e s u l t : 00000010 n 1 : 00001111 10 : 00001010 r e s u l t : 00001010 n-1:00001111\\ 2\qquad:0000 0010\\ result:00000010\\ \quad\\ n-1:00001111\\ 10\quad:0000 1010\\ result:00001010

If you look closely, the last three bits are the same as the result, except for the fourth bit, depending on whether the fourth bit of the hash value is 0 or 1, so you can actually determine where to put the fourth bit of the hash value in the table. If it’s 0, it’s the old position, and if it’s 1, it’s the old position plus 8. To determine the value of the fourth digit, you only need to calculate the hash value and the old capacity value (8:00 00 1000)

Therefore, we only need to take and operation with oldCap to determine whether there are two lists divided by 0. The list of 0 maintains its original position, and the value of oldCap is added to the list of 1 as the new position

The same thing goes for splitting red-black trees

static final int UNTREEIFY_THRESHOLD = 6;s

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    for(TreeNode<K,V> e = b, next; e ! =null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            elsehiTail.next = e; hiTail = e; ++hc; }}if(loHead ! =null) {
        // If the length of the list is less than or equal to 6, it becomes a linked list
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            // When loHead is empty, it indicates that all elements are on the hiHead list, and it is already a adjusted red-black tree
            if(hiHead ! =null) // (else is already treeified)loHead.treeify(tab); }}if(hiHead ! =null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if(loHead ! =null) hiHead.treeify(tab); }}}Copy the code

delete

The delete operation is relatively simple, equivalent to deleting the element based on the search.

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null.false.true)) = =null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if((tab = table) ! =null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) ! =null) {
        Node<K,V> node = null, e; K k; V v;
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            node = p;
        else if((e = p.next) ! =null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while((e = e.next) ! =null); }}if(node ! =null&& (! matchValue || (v = node.value) == value || (value ! =null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            returnnode; }}return null;
}
Copy the code

The difference between JDK8 and JDK7

  • Red-black trees are introduced in JDK8 to convert lists longer than 8 to red-black trees, whereas in JDK7 only lists are used
  • In jdk8, the element is inserted into the end of the list. In JDk7, the element is inserted into the end of the list and added to the red-black tree. In JDk7, the element is inserted into the end of the list
  • In Jdk8, a new method is used for capacity reallocation, which is divided into two linked lists or trees for capacity reallocation, whereas in JDk7, it is the result of recalculating hash residuals

Non-default serialization

HashMap does not use the default serialization mechanism. The table variable is transient and cannot be serialized. Instead, it implements readObject/writeObject methods to customize the contents of the serialized talbe.

  • In most cases, tables cannot be fully stored, so serializing unused portions wastes space

  • The same key-value pair may have different bucket locations under different JVMS, and deserializing the table under different JVMS may cause errors.

There are different Hascodes for different JVMS, and serializing production directly causes errors for subsequent additions.

reference

HashMap source code detail analysis (JDK1.8) – Personal article – SegmentFault think no

Why insert HashMap from scratch to tail