preface

HashMap is a common interview question. In this article, we will read the source code to understand its design principles and the following questions

  • Implementation principle of HashMap
  • Why is the initial capacity a multiple of 2
  • How toresize
  • Thread safe or not

Commonly used parameters

    // Maximum capacity is 2 to the 30th
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // This parameter is triggered only when the initial capacity is 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75 f;

    // hash table, store linked list. The length is 2 to the N, or 0 when initialized.
    transient Node<K,V>[] table;
    // Load factor, used to calculate the threshold of the number of hash table elements. Threshold = hash table. Length * loadFactor;
    final float loadFactor;
    // Threshold for the number of elements in the hash table. When the number of elements in the hash table exceeds the threshold, resize() occurs.
    int threshold;
Copy the code

Table is called a hash table and is used to store linked list nodes

The Node list

    static class Node<K.V> implements Map.Entry<K.V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey(a)        { return key; }
        public final V getValue(a)      { return value; }
        public final String toString(a) { return key + "=" + value; }

        public final int hashCode(a) {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceofMap.Entry) { Map.Entry<? ,? > e = (Map.Entry<? ,? >)o;if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false; }}Copy the code

It can be seen that the data structure of Node is Node nested Node, so it is called linked list. A Node in the entire list is called a Node.

A constructor

    public HashMap(a) {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
Copy the code
    public HashMap(int initialCapacity) {
        // Specify a constructor that initializes the capacity
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
Copy the code
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        // Set the threshold to initialize the capacity to the 2 n power
        this.threshold = tableSizeFor(initialCapacity);
    }
Copy the code
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        // Add the elements in m to the new hash table
        putMapEntries(m, false);
    }
Copy the code

capacityresize

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null)?0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // The capacity cannot be expanded
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                  // If the old capacity is greater than or equal to the default initial capacity 16
                  // The new capacity will be twice the old capacity
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            // If the old hash table is 0, the new hash table is the default
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            // The new threshold is 0 recalculated
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes"."unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if(oldTab ! =null) {
            // Iterate over the old hash table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if((e = oldTab[j]) ! =null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        // Place just one element in the new hash table at e.hash & (newcap-1)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                       // Since capacity expansion doubles capacity, each linked list in the original hash table may now be stored in the original index, i.e., low bit,
                      // Or the index after expansion, that is, high bit. High bit = Low bit + Original hash bucket capacity
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                        if(loTail ! =null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if(hiTail ! =null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
Copy the code

Simply put, if the hash table is empty, it is allocated to the original capacity, otherwise it is doubled, and the elements of the original hash table may be in the same position as the new hash table, or they may be in index+oldCap position

Increase and changeput(K key, V value)

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false.true);
    }
Copy the code
  • 1. Calculate the hash value
    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Copy the code

Hash algorithms, which are not the focus of this article, will be skipped here

  • 2. Insert or overwrite values iftableThe first(n - 1) & hashIs empty, insert a new linked list at that position; If the value is not null, the value is overwritten directly if the key value is equal.If it is a red-black tree, insert; Otherwise, append a new node to the end of the list.
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the current table is null, the table is initialized
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the I ((n-1) &hash) is null, insert the new node directly
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // If the hash values are equal and the key is equal, the node to be overridden is returned
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                / / the red-black tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // traverse to the tail, append a new node to the tail
                        p.next = newNode(hash, key, value, null);
                        // If the number of linked lists is greater than or equal to 8 after the addition of nodes, it is converted to a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // If the node to overwrite is found
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}// There are nodes that need to be overwritten
            if(e ! =null) { // existing mapping for key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                / / empty
                afterNodeAccess(e);
                // Return the old value
                return oldValue;
            }
        }
        
        ++modCount;
        // Update the size and determine whether expansion is required. Size is the size of the map
        if (++size > threshold)
            resize();
        / / empty
        afterNodeInsertion(evict);
        return null;
    }
Copy the code

The flow chart is as follows

checkget(Object key)

   public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
Copy the code
  • 1. Calculate the hash value

  • 2. Check the (n-1) and hash positions of the table. For example, if keys are equal, return this object. If it is a red-black tree, the object is returned. Otherwise, loop back through the node until an equal key or empty node is found.

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if((tab = table) ! =null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) ! =null) {
            if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
                // If the key is equal, this object is returned
                return first;
            if((e = first.next) ! =null) {
              / / the red-black tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
              // Loop through this node until an equal key or empty node is found
                do {
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        return e;
                } while((e = e.next) ! =null); }}return null;
    }
Copy the code

deleteremove(Object key)

  • 1. Calculate the hash valuefinal Node<K,V> removeNode(int hash, Object key, Object value, boolean matchValue, boolean movable)
  • 2. The judgetableThe first(n - 1) & hashThe case of the location node

If matchValue is true, then the movable parameter must have both key and value equal. When deleting nodes, do not move any other nodes

    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        if((tab = table) ! =null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) ! =null) {
            Node<K,V> node = null, e; K k; V v;
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                // If the key is equal, this object is returned
                node = p;
            else if((e = p.next) ! =null) {
                if (p instanceof TreeNode)
                   / / the red-black tree
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else {
                   // Loop through this node until an equal key or empty node is found
                    do {
                        if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while((e = e.next) ! =null); }}// Nodes to be deleted and matchValue is false or equal
            if(node ! =null&& (! matchValue || (v = node.value) == value || (value ! =null && value.equals(v)))) {
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                else if (node == p)
                // The list header is the list to be deleted
                    tab[index] = node.next;
                else
              // The node to be deleted is in the middle of the table
                    p.next = node.next;
                ++modCount;
                / / correct size
                --size;
                afterNodeRemoval(node);
                returnnode; }}return null;
    }
Copy the code

If the key is equal, this object is returned. If it is a red-black tree, the linked list is returned; Otherwise, loop back through the node until the node is found, then delete it.

conclusion

Inside a HashMap is an array of nodes, namely, Node

[] table, which is called a hash table. The nodes stored in the table are linked lists, and each Node in the linked list is an element we put into the HashMap.
,v>

  • Thread safe or not

The thread is not safe and the access process does not have any locks.

  • Why expansion

Because its data structure is an array, it involves capacity expansion.

  • How to increase

When the capacity of HashMap exceeds threshold(Length * capacity expansion factor), capacity expansion is triggered. If the linked list is empty, it will be allocated to the initial capacity; otherwise, the capacity will be doubled. The nodes of the original linked list may be in the same position of the new linked list, or in the index+oldCap position. Before and after the expansion, the length of the hash table must be a multiple of 2.

  • Why hash tables (table) The capacity is a multiple of 2

When HashMap is accessed, index (length-1) & hash is computed, using the & operator (which is more efficient than %). If length is a multiple of 2, the index is evenly divided to the maximum extent possible

  • Why do you need it? Do you need it?&Bit operations (mod%)

HashCode () is an int with a range of over 4 billion values. As long as the hash function mapping is fairly uniform and loose, collisions are minimal. However, since the length of a HashMap hash table is much smaller than the hash value range (the default value is 16), when the hash value is mod to the length of the table to find the subscript that stores the key, the high hash value is ignored because the mod is done by the ampersand operation (&). Therefore, only the lower part of hashCode() will participate in the operation, and the probability of different hash values will occur, but the same index will be obtained is greatly increased, which is called hash collision.

  • How do I resolve hash collisions

The hash function (Object key) is designed to resolve hash collisions. It combines the features of the high and low hash values and puts them in the low one. Therefore, when it is used with the high and low bits, it is equivalent to participating in the operation together to reduce the probability of hash collisions.

    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Copy the code
  • HashMapTo optimize the

1. Use an appropriate initial capacity to reduce the loss caused by resize.

2. Use appropriate keys, such as String and Integer, to reduce hash collisions and maximize storage and retrieval efficiency.

Give it a thumbs up if you think it helps. Thank you

HashMap (JDK8)