preface

This article walks through the JDK 1.8 HashMap source code.

If you would rather not read the source code line by line, feel free to jump straight to the conclusion.

attributes

  • Fields
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
    // Serial version UID used for serialization.
    private static final long serialVersionUID = 362498820763181265L;
    /** Default capacity: 1 shifted left by 4 (00000001 becomes 00010000), i.e. 2^4 = 16.
        A shift is used because bit operations are what the hardware does natively,
        so they are faster than multiplication or division. */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // Maximum capacity, 2^30.
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default load factor used when deciding whether to expand.
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // When a bucket's list reaches 8 nodes, it may be converted to a red-black tree.
    static final int TREEIFY_THRESHOLD = 8;
    // When a tree bucket shrinks to 6 nodes, it is converted back to a linked list.
    static final int UNTREEIFY_THRESHOLD = 6;
    // A bucket is only treeified when the table capacity is at least 64;
    // below that, resize() is preferred over treeification.
    static final int MIN_TREEIFY_CAPACITY = 64;
    // The array that stores the elements. transient means the field is not serialized directly.
    transient Node<K,V>[] table;
    // A Set view of the entries, mainly used for iteration.
    transient Set<Map.Entry<K,V>> entrySet;
    // Number of elements.
    transient int size;
    // Counts structural modifications of the map (for fail-fast iteration).
    transient int modCount;
    // Threshold: when size exceeds it, the table is resized (capacity * load factor).
    int threshold;
    // The load factor actually used by this instance.
    final float loadFactor;
}
  • Inner classes

Two node types: Node for linked lists and TreeNode for red-black trees.

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
}

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

Constructors

	// 1. Use the default load factor
	public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; 
    }
 	
 	// 2. Set capacity
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
 
    // 3. Set the initial capacity and load factor
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    // 4. Pass in a map and putMapEntries to convert the map to hashMap
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }
 
 
    final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        // Get the actual length of the map
        int s = m.size();
        if (s > 0) {
            // The table has not been initialized yet: precompute the threshold
            if (table == null) { // pre-size
                /** The +1 compensates for truncation when casting to int: e.g. with
                    s = 22 and loadFactor = 0.75, 22 / 0.75 = 29.3, so a capacity of
                    at least 30 is needed. If the division happens to be exact, one
                    extra unit of capacity does no harm **/
                float ft = ((float)s / loadFactor) + 1.0F;
                // Determine whether the capacity exceeds the upper limit.
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                         (int)ft : MAXIMUM_CAPACITY);
                /** tableSizeFor(t) returns the smallest power of two greater than or
                    equal to t; for example, if t is 29, it returns 32 **/
                if (t > threshold)
                    threshold = tableSizeFor(t);
            }
            // If the table is already initialized, expand the table. Resize () is used to expand the table.
            else if (s > threshold)
                resize();
            // Copy every entry of the map into this HashMap
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict);
            }
        }
    }
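The tableSizeFor() helper used above rounds a requested capacity up to the nearest power of two. A minimal standalone sketch of the JDK 1.8 logic (the class name TableSize is mine, for illustration):

```java
// Sketch of tableSizeFor(): smear the highest set bit of (cap - 1) into all
// lower bits, then add 1 to get the next power of two.
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;   // -1 so an exact power of two is returned unchanged
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(29));  // 32
        System.out.println(tableSizeFor(16));  // 16: exact powers of two are kept
        System.out.println(tableSizeFor(30));  // 32, matching the 22 / 0.75 example above
    }
}
```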

put

  • hash
static final int hash(Object key) {
    int h;
    /** Take the key's hashCode, then XOR it with its own high 16 bits shifted
        down, so the high bits also influence the bucket index **/
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
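To see why the high bits are XORed in, here is a standalone sketch (the class name HashSpread is mine) showing two hashCodes that collide under a plain mask but are separated by the spread:

```java
// Standalone sketch of HashMap's hash() spreading.
public class HashSpread {
    // Same formula as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                                  // table capacity
        Integer a = 0x0001_0001, b = 0x0002_0001;    // differ only in the high bits
        // Without spreading, both keys land in bucket 1:
        System.out.println((n - 1) & a.hashCode());  // 1
        System.out.println((n - 1) & b.hashCode());  // 1
        // With spreading, the high bits change the bucket:
        System.out.println((n - 1) & hash(a));       // 0
        System.out.println((n - 1) & hash(b));       // 3
    }
}
```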
  • resize
final Node<K,V>[] resize() {
    // Keep a reference to the old table
    Node<K,V>[] oldTab = table;
    // Capacity of the old table
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Threshold of the old table
    int oldThr = threshold;
    // Capacity and threshold of the new table
    int newCap, newThr = 0;
    // oldCap > 0 means the table has already been allocated (HashMap allocates it lazily)
    if (oldCap > 0) {
        // Greater than the maximum value
        if (oldCap >= MAXIMUM_CAPACITY) {
            // The critical value is the maximum value of an integer
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Mark ##: in the normal case, double the capacity, provided the doubled
        // capacity stays below the maximum and the old capacity is at least 16
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // The threshold is also increased to double the old threshold
            newThr = oldThr << 1; 
    }
    /** oldCap == 0 but oldThr > 0: the table has not been allocated yet, but a
        threshold was precomputed (e.g. by a constructor via tableSizeFor()), so
        that threshold is used as the new capacity **/
    else if (oldThr > 0)
        newCap = oldThr;
    // For the first initialization, give the default value
    else {               
        newCap = DEFAULT_INITIAL_CAPACITY;
        // Critical value equals capacity * load factor
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Complements the ## mark above: newThr was not assigned there, e.g. when the
    // capacity came from a constructor or the old capacity was below the default 16
    if (newThr == 0) {
        // Critical value of new
        float ft = (float)newCap * loadFactor;
        // Check whether the new capacity is greater than the maximum value and whether the critical value is greater than the maximum value
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Commit the new threshold here (the new capacity is committed just below)
    threshold = newThr;
    // Suppress the unchecked-cast warning
    @SuppressWarnings({"rawtypes","unchecked"})
        // Allocate the new table
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // Make the new table current
    table = newTab;
    // Insert elements from old into new
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            // Temporary variables
            Node<K,V> e;
            // The bucket at this index is non-empty (one node, or a chain of conflicts)
            if ((e = oldTab[j]) != null) {
                // Clear the old slot so its contents can be garbage collected
                oldTab[j] = null;
                // Case 1: a single node with no successor
                if (e.next == null)
                    // Recompute its index in the new table;
                    // e.hash & (newCap - 1) is not necessarily equal to j
                    newTab[e.hash & (newCap - 1)] = e;
                // Case 2: the bucket holds a red-black tree (a hash conflict with many elements)
                else if (e instanceof TreeNode)
                    // Split the tree across the new table
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // Case 3: a linked list; split it into a "low" list (index j)
                       // and a "high" list (index j + oldCap)
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    // Return the expanded table
    return newTab;
}
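The lo/hi split above can be checked numerically. A standalone sketch (the class name ResizeSplit is mine) verifying that (e.hash & oldCap) decides whether a node keeps its old index or moves by oldCap:

```java
// When the capacity doubles, a node either stays at its old index or moves to
// oldIndex + oldCap, depending on the single extra bit tested by (hash & oldCap).
public class ResizeSplit {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[]{5, 21, 100, 117}) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            // Rule: same index when the tested bit is 0, oldIndex + oldCap when it is 1.
            int expected = ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(hash + ": " + oldIndex + " -> " + newIndex
                    + " (rule predicts " + expected + ")");
        }
    }
}
```

For hash 21, for example, the old index is 5 and the new index is 21 = 5 + 16, exactly as the rule predicts.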
  • put
public V put(K key, V value) {
    // Compute the hash of the key, then delegate to putVal;
    // onlyIfAbsent = false means an existing value for the key is overwritten
    return putVal(hash(key), key, value, false, true);
}
 
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // tab: the hash array; p: the bucket's head node; n: the table length; i: the computed index
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // The table is allocated lazily: the first put triggers resize() to create it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    /** If the computed bucket is empty, the new key-value pair goes straight in;
        otherwise (a hash conflict) the bucket's head node is assigned to p **/
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // Hash conflicts occur in several cases
    else {
        // e: a temporary node; k: the key of the node being examined
        Node<K,V> e; K k;
        // Case 1: the inserted key matches the bucket's head node; e = p marks that node
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Case 2: the key does not match the head node, and p is a red-black tree node
        else if (p instanceof TreeNode)
            /** Insert into the red-black tree. If the key already exists, the existing
                node is returned (non-null); this value decides whether an old value
                gets replaced **/
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Case 3: not the head node and not a tree node, so it is a linked list
        else {
            // Iterate through the list
            for (int binCount = 0; ; ++binCount) {
                // Reached the tail without finding the key: append the new node
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Determine whether to convert to a red-black tree structure
                    if (binCount >= TREEIFY_THRESHOLD - 1) 
                        treeifyBin(tab, hash);
                    break;
                }
                // A duplicate key was found in the list; e is that node, end the loop
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // A duplicate key exists: overwrite the old value with the one being inserted
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Reaching here means no duplicate key was found (e is null), so a new node was inserted
    // Number of changes +1
    ++modCount;
    // The actual length is +1. Check whether the actual length is greater than the critical value
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    // Added successfully
    return null;
}
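The putVal() return contract described above can be observed directly through the public API: put returns the previous value for the key, or null when the key was absent.

```java
import java.util.HashMap;

// A small usage check of the put contract.
public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));   // null: "a" was absent
        System.out.println(map.put("a", 2));   // 1: old value returned, now overwritten
        System.out.println(map.get("a"));      // 2
        System.out.println(map.size());        // 1: overwriting does not grow the map
    }
}
```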

remove

  • There is also a clear() method, which simply sets every array slot to null.
public V remove(Object key) {
    // Temporary variables
    Node<K,V> e;
    /** Delegates to removeNode(hash(key), key, null, false, true); the null value
        argument means the node is removed regardless of its value **/
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

/** Parameters: the hash of the key; the key; the value (only consulted when matchValue
    is true); matchValue - if true, remove the node only when its value also matches;
    movable - if false, do not move other nodes while removing **/
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    // tab: the hash array; p: the node at the array index; n: the length; index: the computed array index
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // The hash array is not null, and the length is greater than 0, and the index position of the node whose key is to be removed is obtained
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        // node: the node to be deleted; e: temporary; k and v: the current node's key and value
        Node<K,V> node = null, e; K k; V v;
        // If the array subscript happens to be the node to be deleted, assign the value to the temporary variable node
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // Otherwise the node to delete is further along a red-black tree or a linked list
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                // Walk through the red-black tree, find the node and return
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else { // represents a linked list node, the same traversal to find the node
                do {
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    /** Note: while traversing the list, p is no longer the bucket's
                        head node but the node just before the current one **/
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        /** Once the node is found, check !matchValue: for a plain remove(key),
            matchValue is false, so !matchValue is true and the value check is skipped **/
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // If the node to be deleted is a red-black tree, delete it from the red-black tree
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // If the list structure is linked and the node removed is the array subscript node (i.e. the header node), let the next node be the header
            else if (node == p)
                tab[index] = node.next;
            else /** unlink the node: point the previous node past it **/
                p.next = node.next;
            // Modify the counter
            ++modCount;
            // The length is reduced by 1
            --size;
            /** A hook for subclasses (e.g. LinkedHashMap) to fix up their links after a node is removed **/
            afterNodeRemoval(node);
            // Returns the deleted node
            return node;
        }
    }
    // Returning null means the node does not exist, so nothing was deleted
    return null;
}	

get

public V get(Object key) {
    Node<K,V> e;
    // This is done by calling the getNode method
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
 
final Node<K,V> getNode(int hash, Object key) {
    // first: the bucket's head node; e: temporary; n: table length; k: key
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The header is the node of the array index
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If it is a header, return the header directly
        if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Not a header
        if ((e = first.next) != null) {
            // Check if it is a red-black tree
            if (first instanceof TreeNode)
                // Go to the red black tree and return
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do { // A linked list: walk it until the node is found
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    // The node was not found: the key does not exist
    return null;
}

treeify

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    /** Note that a bucket reaching 8 elements is not enough on its own to trigger
        treeification: the table capacity must also be at least MIN_TREEIFY_CAPACITY (64);
        otherwise the table is resized instead. */
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

conclusion

  • About the implementation

JDK 1.8 implements HashMap with an array plus linked lists plus red-black trees.

  • About addressing

When put or get is called on a HashMap, hash(key) is first computed from the key's hashCode(), and the bucket index is then obtained by a bitwise AND with (capacity - 1): e.hash & (length - 1).

  • About the get

After addressing, the object value is returned.

  • About the put

There is a hash conflict and a capacity expansion problem.

  1. Hash conflict

HashMap resolves hash conflicts by chaining (the "zipper" method). Unlike JDK 1.7, JDK 1.8 appends with tail insertion, which avoids the list inversion and linked-list loop that head insertion could cause during concurrent resizing.

  2. resize()
  • About Capacity Expansion Mechanism

At bottom, resize() doubles the table capacity and redistributes all the nodes.

e.hash & (newCap - 1) -> e.hash & (oldCap * 2 - 1)

The conclusion is that a node's new index is either the same as its old index, or the old index plus the old capacity, depending on the extra hash bit exposed by doubling (tested with e.hash & oldCap).

  • On the transformation threshold of linked list and red black tree

Note that a bucket reaching eight elements is not enough on its own; the table capacity must also be at least 64 before the bucket is converted to a red-black tree.

Under random hash codes, the number of nodes falling into a single hash bucket approximately follows a Poisson distribution.

The source code comments note that, in this ideal case, the probability of a bucket holding a given number of elements can be computed from the Poisson formula; the accompanying table shows that the probability of a bucket reaching 8 elements is about 0.00000006, vanishingly small. The original authors therefore chose 8 as the treeify threshold on the basis of these probability statistics.
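The quoted probability can be reproduced with the Poisson formula P(k) = e^(-λ) · λ^k / k!, with λ = 0.5 as assumed in the JDK comments (the class name PoissonBins is mine, for illustration):

```java
// Reproduce the bucket-size probability table from the HashMap source comments.
public class PoissonBins {
    // P(k) = exp(-lambda) * lambda^k / k!
    static double poisson(int k, double lambda) {
        double fact = 1.0;
        for (int i = 2; i <= k; i++) fact *= i;
        return Math.exp(-lambda) * Math.pow(lambda, k) / fact;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++)
            System.out.printf("%d: %.8f%n", k, poisson(k, 0.5));
        // k = 8 gives roughly 0.00000006, matching the table in the JDK comments
    }
}
```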

  • About the load factor

0.75 is a compromise between improving space utilization and keeping query cost low; per the Poisson analysis above, it keeps bucket collisions rare without wasting too much space.

In other words:

  1. A load factor above 0.75 means a larger resize() threshold: space utilization improves, but chains grow longer and queries cost more.
  2. A load factor below 0.75 means a smaller resize() threshold: rehash operations happen more often and space utilization drops.
  • On the capacity being a power of two

This comes back to the concept of hashing.

Let’s take a bunch of numbers and map them to N buckets, so let’s say I put 1 to 100,000 into N buckets.

Normal thinking is to do the mod operation, but the efficiency is too low.

A better approach is to stay close to what the hardware does: treat the numbers as binary and use bitwise operations.

So HashMap computes the bucket index as e.hash & (length - 1), which equals e.hash % length when length is a power of two.

Set the capacity to a power of two to make the elements more evenly distributed.

resize() relies on the same bitwise trick as well, using e.hash & oldCap to split each bucket.
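A standalone sketch (the class name PowerOfTwoIndex is mine) showing that the & trick matches the modulo only because the length is a power of two:

```java
// Why the capacity must be a power of two: only then does (length - 1) form an
// all-ones bit mask, making hash & (length - 1) equivalent to hash % length.
public class PowerOfTwoIndex {
    public static void main(String[] args) {
        int length = 16;                       // power of two: mask is 0b1111
        for (int hash : new int[]{1, 17, 100_000, 123_456}) {
            System.out.println((hash & (length - 1)) == (hash % length)); // true every time
        }
        // With a non-power-of-two length such as 10, the mask 9 = 0b1001 has zero
        // bits, so some buckets could never be reached by the & trick.
        System.out.println(Integer.toBinaryString(10 - 1)); // 1001
    }
}
```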