Understand HashMap in depth

What is a HashMap

Map is a Map structure that allocates storage based on hash values. A Map is a collection class used to store key-value pairs

Why do we need to understand the underlying principles of HashMap

  1. HashMap is a must-have interview question
  2. Hashmaps are also one of the most commonly used collections in your work, so it’s important to understand the underlying principles so that you can quickly locate problems when they occur
  3. There are a lot of knowledge points involved in HashMap, which can be used to comprehensively investigate the basic skills of the interviewer. It is an unpassable hurdle to get a good offer. Next, I will use the most easy-to-understand language to reveal the mystery of HashMap

What are the commonly used Map sets

HashMap, TreeMap, LinkedMap, etc., are commonly used. Here we will focus on HashMap, because HashMap is the most commonly used, and interview is also the most frequently asked

HashMap underlying data structure

Jdk8 implements a HashMap optimization that converts the list to a red-black tree when the array length and list length reach thresholds. This further optimizes the query time

HashMap uses an array + linked list + red-black tree structure

HashMap source code analysis

Where should we start to understand HashMap?

Start with the class header of the HashMap

AbstractMap AbstractMap implements some basic methods of Map
// Implement Map interface, some methods need to subclass according to their own characteristics to implement
// Implement Cloneable, Serializable, can implement clone and serialization
public class HashMap<K.V> extends AbstractMap<K.V>
    implements Map<K.V>, Cloneable.Serializable {
    // Serialize the version number
    private static final long serialVersionUID = 362498820763181265L;

    // Initialize the size by default
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    // Maximum initial capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75 f;

    // Set the threshold for converting lists to red-black trees (list length reaches 8)
    static final int TREEIFY_THRESHOLD = 8;

    // Red black tree degenerates to the list threshold (number of nodes is 6)
    static final int UNTREEIFY_THRESHOLD = 6;

    // Minimizes the size of the array converted to a red-black tree
    static final int MIN_TREEIFY_CAPACITY = 64;
    
    // Store the list array
    transient Node<K,V>[] table;

    // Can be used to obtain key and value
    transient Set<Map.Entry<K,V>> entrySet;

    // The number of key-value mappings
    transient int size;

    // The number of times the hashMap has been modified. This parameter is used to determine quick failures
    transient int modCount;

    // Expand the key logarithm
    int threshold;

    // Load factor
    final float loadFactor;
}
Copy the code

In fact, observing the member attributes above, it answers several interview questions that are particularly frequently asked

  1. What are the conditions for a linked list in a HashMap to become a red-black tree?

    The length of the list is 8 and the length of the array in the HashMap is 64 before the red-black tree conversion takes place, otherwise only the array expansion takes place

  2. Member attribute table bytransientThe modifier does not participate in serialization but it can actually be serialized. Why do you do that?

    The designers of the HashMap did not want the entire array to be serialized, but instead serialized its contents as they were serialized

Next, let’s examine the constructor of the HashMap

    // There are many overloaded constructors for HashMap, but the most complete and basic is this one
    public HashMap(int initialCapacity, float loadFactor) {
      if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
      if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
      if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
      this.loadFactor = loadFactor;
      // If initialCapacity passed in is not an integer multiple of 2, convert to an integer multiple of 2, as explained later
      this.threshold = tableSizeFor(initialCapacity);
    }
Copy the code

The put() method is now unmasked!

    // The most commonly used put method, first we need to know that a hashMap uses the hash value of the key to find a certain position in the array
    public V put(K key, V value) {
      return putVal(hash(key), key, value, false.true);
    }

    // If the key is not empty, get the result by xor the key's hashcode() value 16 bits higher and 16 bits lower
    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // The implementation of the put method
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
      	// The table array is not initialized yet
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
      	// Use the hash value of the previous step and the array length -1 to get the location of the array
      	// If the array position is null, build it into a Node.
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
      	// If there is a value in the array location
        else {
            Node<K,V> e; K k;
            // If the first nodes in the array have the same key value, overwrite
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p;
            // If the node is not the same as the head node, then check whether the node is a red black tree, if it is a red black tree to insert
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // Otherwise, it is a list node
            else {
              	// Iterate over each node in the list in turn
                for (int binCount = 0; ; ++binCount) {
                    // If the current node's next null, the new data is inserted directly to the tail
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // If a node key is the same as the key to be inserted during traversal, break the node
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}// The array is overwritten
            if(e ! =null) { // existing mapping for key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                afterNodeAccess(e);
                returnoldValue; }}// Number of changes +1
        ++modCount;
      	// If the current key-value log is greater than the capacity expansion threshold, the capacity expansion is performed
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
Copy the code
Let’s look at the array expansion process
    // Capacity expansion method
    final Node<K,V>[] resize() {
      	/ / the old array
        Node<K,V>[] oldTab = table;
      	// The capacity of the old array
        int oldCap = (oldTab == null)?0 : oldTab.length;
      	// The expansion threshold of the old array
        int oldThr = threshold;
        // Used to store the capacity of the expanded array and its expansion threshold
        int newCap, newThr = 0;
      	// If the size of the old array is greater than 0, it is not an array initialization operation, but an array expansion triggered by the logarithm of the key value
        if (oldCap > 0) {
            // If the size of the old array is already larger than the maximum size, the expansion threshold is directly assigned to the maximum Integer value and then returned
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
          	// If the size of the old array is moved to the left by 1 bit, the size of the old array is moved to the left by 1 bit
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
              	// The expansion threshold is also moved 1 bit to the left
                newThr = oldThr << 1; // double threshold
        }
      	// If the capacity of the old array is 0, we are initializing the array. The new capacity is the value of the old expansion threshold, which is the value we passed in
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
      	// If no value is specified in the constructor, the default 16 is used
        else {               // zero initial threshold signifies using defaults
            // The new array size is 16 by default
            newCap = DEFAULT_INITIAL_CAPACITY;
            // The capacity expansion threshold is load factor * default capacity
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
      	// If the array size is customized, but the expansion threshold is not specified, the new expansion threshold is constructed according to the above
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
      	// Create an array based on the size of the new array
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
      	// If the array is not null
        if(oldTab ! =null) {
            // Loop through each position of the old array
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
              	// The head node is not null
                if((e = oldTab[j]) ! =null) {
                    oldTab[j] = null;
                    // If the next node of the head node is null
                    if (e.next == null)
                      	// Rehash the node to select the position and assign the value
                        newTab[e.hash & (newCap - 1)] = e;
                    // If the current node is a red-black tree node
                    else if (e instanceof TreeNode)
                      	// Split the red-black tree into upper and lower trees
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // Nodes are linked list nodes
                    else { // preserve order
                      	// Build a high and low chain
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                      	// Put the low chain in the current position
                        if(loTail ! =null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                      	// Put the high chain in the current position + expanded value
                        if(hiTail ! =null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
Copy the code

Here is an answer to the above analysis of the source code did not specifically expand the question

  1. The hash value is xOR obtained by the key’s hashcode() value 16 bits higher and 16 bits lower

    In order to reduce collisions and further reduce the probability of hash collisions, moving 16 bits to the right can retain the feature that the higher 16 bits are in the lower 16 bits. If it is ampersand or |, certain features will be lost, which can reduce collisions more

  2. Why does an integer multiple that is not 2 automatically become a multiple of 2 when passed in

    Because when the length of the array is a multiple of 2, when calculating the position of the key in the array (p = TAB [I = (n-1) & hash) where n-1 is represented in binary its mantissa is 1

    For example, if the array length is 16 and n-1 is 15, the binary representation is 0000 1111

    The hash value is determined by the hash value when ampersand is applied to the hash value

    For example, if there are two key-value pairs to be inserted, the hash value of A is 0000, 1001 the hash value of B is: 0000 1101, so that when the hash value is set to n-1&, their results are different, but if the mantissa of n-1 is not 1, such as 0000 1001, then the ampersand of A and B to n-1 will produce the same result, and the collision will occur

    Why is & adopted? If * is used to find the location in key and use % in capacity expansion, is it ok?

    Yes, of course, but in order to improve performance, so the use of bit operations, which can bring performance improvement

  3. Why did the expansion move to the left instead of *2

    Because bit operations are faster, it improves performance

disadvantages

The basic principles of HashMap are described above. Of course, we all know that HashMap is not thread safe, so what should we do when there is a concurrent scenario? Next time we will reveal the ConcurrentHashMap!

conclusion

So far we have understood the underlying structure of HashMap, the specific implementation of the PUT method and the author’s optimization points in some implementations. We can also learn the author’s code ideas in some places

I actually know the source will be very boring, very boring, but I feel or should stick to it, so can do much, know the why, the ability to see the source code will help improve your ability, when you do develop involuntary think of some ideas of the author, so as to enhance the capacity of your code I hope every developer can stick, together come on!

Finally, I would like to end this article with a sentence, work hard to refueling, after many years, you will thank yourself for having worked so hard!

I am he who loves writing code. See you next time!