This article has been authorized for exclusive release on the WeChat public account guolin_blog (Guo Lin)

HashMap is the most commonly used Map implementation and comes up constantly in interviews. Because its internal data structure is relatively complex, answers to related questions are often unsatisfactory, especially since JDK 1.8 introduced the red-black tree and made the structure more complex still. This article analyzes HashMap using the JDK 1.8 source code.

1. Source code analysis

1.1 As usual, start with the constructors

  
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
 
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

  
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
     
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

There are four constructor overloads, which between them initialize three parameters:

  • initialCapacity (default 16): HashMap is implemented with an array plus linked lists (or red-black trees). The array starts small, so as more and more data is stored, resize operations become necessary. If you know how much data will be stored, specifying an appropriate initial capacity avoids unnecessary resizing and improves efficiency.
  • threshold: the maximum number of key-value pairs the HashMap can hold before it must be resized. It is computed as threshold = capacity * loadFactor. (In the constructor above it is temporarily set to tableSizeFor(initialCapacity); the put/resize path later assigns it a new value, which is analyzed in the source code below.)
  • loadFactor (default 0.75): with a high load factor, the table array is resized less often, so less memory is used (space saving), but each bucket's chain holds more elements and lookups get slower (time cost). With a low load factor, the table is resized more often, using more memory, but chains are shorter and lookups are faster. The load factor is therefore a compromise between time and space; when setting it, consider which of the two you are optimizing for. (In general the default is a good fit and you do not need to change it.)
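As a quick sanity check of the argument validation shown in the constructor above, here is a minimal standalone sketch (the `CtorDemo` class and `rejects` helper are illustrative names, not part of the JDK):

```java
import java.util.HashMap;

public class CtorDemo {
    // Returns true if constructing a HashMap with these arguments throws
    // IllegalArgumentException, mirroring the checks in the constructor above.
    static boolean rejects(int capacity, float loadFactor) {
        try {
            new HashMap<String, String>(capacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));     // negative capacity is rejected
        System.out.println(rejects(16, 0f));        // non-positive load factor is rejected
        System.out.println(rejects(16, Float.NaN)); // NaN load factor is rejected
        System.out.println(rejects(16, 0.75f));     // valid arguments are accepted
    }
}
```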

In the no-argument constructor, only the load factor is set (to its default); the other two parameters are initialized later in the resize method. Keep that conclusion in mind for now. Next, let's look at the tableSizeFor(int cap) method used above:

    /**
     * Returns the smallest power of two greater than or equal to cap.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

When I first saw this method, my reaction was roughly:

Next, let's analyze this method. If you are not familiar with the unsigned right shift operator (>>>), it is worth reading up on it first. The diagram below (borrowed from a Google image search; I do not know the original author, so if they mind, message me and I will delete it) traces the calculation with 10 as an example:
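The bit trick can also be verified directly in code. Here is a standalone copy of the method with a few sample inputs (`TableSizeForDemo` is an illustrative wrapper class; `MAXIMUM_CAPACITY` is copied from HashMap's value of 1 << 30):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // A copy of JDK 1.8's tableSizeFor: smears the highest set bit of (cap - 1)
    // into all lower bits, then adds 1 to get the next power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        // For cap = 10: n = 9 (0b1001) -> 0b1111 = 15 -> 15 + 1 = 16
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(tableSizeFor(1));  // 1
    }
}
```

The initial `cap - 1` is what makes an input that is already a power of two map to itself rather than the next power up.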

1.2 put method

The put method contains the most complex logic in the HashMap source code. Let's take a look:

  public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table has not been initialized, initialize it here via resize(),
        // which also assigns the initial capacity and recomputes the threshold
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // Locate the bucket via the hash; if it is empty, store the new node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            // The bucket located via the hash is not empty: a hash collision occurred
            Node<K,V> e; K k;
            // If the key to insert equals the key of the first node, point e at that node
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If the bucket holds a red-black tree, insert via the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Otherwise the bucket holds a linked list: traverse it
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // Key not found in the list: append the new node at the tail
                        p.next = newNode(hash, key, value, null);
                        // If the list reaches the treeify threshold,
                        // convert it to a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Key found in the list: point e at the node and break the loop
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // If e is not null after the above, the key already exists in the
            // HashMap: update the key-value pair and return the old value
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // If the HashMap's size exceeds the threshold, expand the table
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }


From the code, the put method handles three cases:

  • The table is not initialized. Initialize the data

  • The table is initialized, and the hash algorithm is used to locate the subscript. The data is empty, and the data is directly stored in the specified location

  • The table has been initialized, and the location of the subscript found by the hash algorithm is not empty. A hash conflict (collision) occurs. After a collision, the following operations are performed:

    • If the inserted key is equal to the current key, point e to the key-value pair
    • If the data type in the bucket is treeNode, use a red-black tree to insert the bucket
    • If the bucket holds a linked list, the list is traversed: if it already contains the key, the loop breaks with e pointing at that node; otherwise the new node is appended at the tail. If the list length then exceeds TREEIFY_THRESHOLD (8) and the table capacity is at least MIN_TREEIFY_CAPACITY (64), the list is converted to a red-black tree. (The capacity check lives inside treeifyBin: the smaller the table, the more likely hash collisions are, so below 64 the table is resized instead of treeified.)
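The three cases above are internal, but the observable contract of put can be sketched with plain library calls (nothing here beyond the public java.util.HashMap API):

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // Inserting a new key: there is no previous mapping, so put returns null
        System.out.println(map.put("a", 1)); // null
        // Inserting the same key again updates the value and returns the old one
        System.out.println(map.put("a", 2)); // 1
        System.out.println(map.get("a"));    // 2
        System.out.println(map.size());      // 1 (still one entry, not two)
    }
}
```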

First examine the case where the table has not been initialized:

1.2.1 The table is not initialized
n = (tab = resize()).length;

When the table is not initialized, the resize() method is called:

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // 1. The table has already been initialized (its capacity is greater than 0)
        if (oldCap > 0) {
            // If the old capacity has reached the maximum, do not expand;
            // just raise the threshold to Integer.MAX_VALUE
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Otherwise double the capacity; if the old capacity is at least the
            // default initial capacity, double the threshold as well
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // 2. The threshold is greater than 0: the constructor temporarily stored
        // the (rounded) initialCapacity in the threshold field
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // 3. Neither capacity nor threshold was assigned: this happens when the
        // no-argument constructor was used, so fall back to the defaults
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            // threshold = capacity * load factor
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // If newThr is still 0, compute it from the formula capacity * load factor
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Update the threshold and allocate the new bucket array
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // If the old table held data, the capacity (and thus every index) has
        // changed, so each node's position must be recomputed
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // 1. The bucket at index j holds data
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // 2. Only a single node: store it at its newly computed index
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // 3. A red-black tree: split it
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // 4. A linked list: split it into a "low" list that stays at
                    // index j and a "high" list that moves to index j + oldCap
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

The logic of the resize method is complicated and takes some patience to work through step by step. In general, it can be divided into the following steps:

  • If the table has not been initialized, this is where threshold and capacity finally receive their values (solving the deferred initialization noted earlier); if the table was already initialized, the capacity is doubled
  • After computing the new capacity, a new table is created and all existing data is iterated over:
    • If the node at a position has no successors, it is stored directly at its newly computed index
    • If the position holds a linked list, the recomputed indices are used to split the list into two groups

1.3 The get method: lookup

After the put method is analyzed, the rest is simple. First look at the source code:

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // 1. Use the hash to locate the first node in the bucket;
            // if it holds the requested key, return it directly
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // If the bucket holds a red-black tree, search the tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // Otherwise traverse the linked list to find the data
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }


The logic of get is much simpler than put

  1. The data in the specified location is found based on the hash value
  2. Check whether the data of the first node at the specified location is the key passed in. If so, return the first node directly, otherwise continue to look for the second node
  3. If the data is a TreeNode (red-black tree structure), look up the node data directly through the red-black tree and return
  4. If it is a linked list structure, loop through all nodes and return the data
  5. If no matching node is found, null is returned
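The steps above also explain a common pitfall worth illustrating with the public API: get returns null both for a missing key and for a key mapped to null, so containsKey is needed to tell the two cases apart (a plain java.util.HashMap usage sketch):

```java
import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k", null); // HashMap permits null values (and one null key)
        // get cannot distinguish "mapped to null" from "absent"...
        System.out.println(map.get("k"));           // null
        System.out.println(map.get("missing"));     // null
        // ...but containsKey can
        System.out.println(map.containsKey("k"));       // true
        System.out.println(map.containsKey("missing")); // false
    }
}
```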

There are two things worth noting in this method: the hash(key) call and the index computation (n - 1) & hash

1.3.1 The hash(key) source code
 static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

This code is known as the perturbation function, and it is the hash operation used throughout HashMap. It works in two steps:

  • key.hashCode() obtains the key's hashCode, an int derived from the object's memory address if hashCode() is not overridden
  • That hashCode is shifted 16 bits to the right (unsigned) and XORed (^) with the original hashCode. This mixes the high bits into the low bits, letting both participate in the computation so that hash values are distributed more evenly
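A standalone copy of the perturbation function makes the mixing visible (the `HashDemo` wrapper class is an illustrative name; the hash method body is taken from the source above). Two hashCodes that differ only in their high 16 bits would land in the same bucket of a small table without perturbation:

```java
public class HashDemo {
    // A copy of JDK 1.8's perturbation function: XOR the high 16 bits of the
    // hashCode into the low 16 bits so both halves influence the bucket index.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // These two Integer hashCodes differ only in their high 16 bits,
        // so with a 16-slot table the raw hashCodes alone collide...
        int a = 0x10000, b = 0x20000;
        System.out.println((a & 15) == (b & 15));             // true: raw values collide
        // ...but after perturbation the high bits reach the index bits
        System.out.println((hash(a) & 15) == (hash(b) & 15)); // false: spread apart
        System.out.println(hash(null)); // the null key always hashes to 0
    }
}
```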
1.3.2 The modulo operation (n - 1) & hash

In the hashMap code, you’ll see similar code in many places:

first = tab[(n - 1) & hash]

To distribute elements uniformly, many hash tables use a modulo operation. HashMap does not compute hash % n; it uses (n - 1) & hash instead, because the bitwise & is much faster than % on a computer. Note that (n - 1) & hash is equivalent to hash % n only when the capacity n is a power of two. This is precisely why tableSizeFor(int cap) converts whatever initial capacity is passed in into a power of two; the ingenuity of the design is remarkable. As to why only powers of two allow the modulo to be replaced this way, I will not go into detail here; interested readers can see the linked article on the % remainder operation and its transformation in the HashMap algorithm.
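The equivalence, and its power-of-two precondition, can be checked directly (a minimal sketch using plain Java arithmetic on non-negative hashes):

```java
public class ModDemo {
    public static void main(String[] args) {
        int n = 16; // a power of two, as HashMap capacities always are
        for (int hash : new int[] {0, 1, 15, 16, 17, 12345}) {
            // For power-of-two n, masking with n - 1 equals the modulus
            System.out.println(((n - 1) & hash) == (hash % n));
        }
        // For a capacity that is not a power of two, the trick breaks down:
        int m = 10;
        System.out.println(((m - 1) & 12) == (12 % m)); // 9 & 12 = 8, but 12 % 10 = 2
    }
}
```

Intuitively, when n is a power of two, n - 1 is a mask of all ones below the capacity bit, so the & simply keeps the remainder bits.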

1.4 The remove method: deletion

After the get method, let’s take a final look at the remove method:

  public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
            null : e.value;
    }
    
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        // Use the key's hash to locate the bucket holding the element
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            // The first node in the bucket is the one to remove
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            else if ((e = p.next) != null) {
                // Search the red-black tree...
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                // ...or traverse the linked list
                else {
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            // If the node was found (and, when matchValue is set, its value
            // matches the given value), remove it
            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
                // A TreeNode is removed via the tree
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                // The first node of a list: point the bucket at the second node
                else if (node == p)
                    tab[index] = node.next;
                // A later node of the list: unlink it
                else
                    p.next = node.next;
                ++modCount;
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }

As the source code shows, locating the element to remove by key works almost exactly like the get method. Once the node corresponding to the key is found, the removal itself depends on the node's position and type. The process is simple.
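The observable behavior of removal can be sketched with the public API. Note that Map.remove(key, value) exercises the matchValue = true path of removeNode, while remove(key) uses matchValue = false (plain java.util.HashMap calls, nothing hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        // remove(key) returns the removed value, or null if the key was absent
        System.out.println(map.remove("a"));     // 1
        System.out.println(map.remove("a"));     // null (already gone)
        // remove(key, value) only removes when the current value matches
        System.out.println(map.remove("b", 99)); // false: value does not match
        System.out.println(map.remove("b", 2));  // true: removed
        System.out.println(map.size());          // 0
    }
}
```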

1.5 Other source code

Much of the remaining source code builds on methods already analyzed; for example, containsKey(key) simply uses the getNode() method from get. For reasons of space, I will not describe it in detail.

In addition, there are many parts of the code that do not affect the overall logic, such as the red-black tree conversion, search, and deletion operations; those interested can study them on their own. A few other points are worth highlighting:

To sum up:

  • Before JDK 1.8, the underlying data structure of HashMap was an array plus linked lists; 1.8 added red-black trees. While a list holds fewer than 8 nodes, a hash collision simply lengthens the list. When a list grows past 8 nodes, the array capacity is checked first: if it is below 64, the table is resized instead (a smaller array makes collisions more likely, so expansion is the first remedy); only when the capacity is at least 64 is the list converted to a red-black tree to improve efficiency
  • The capacity of a HashMap is always a power of two: whatever capacity is passed at initialization is rounded up to one. This allows the & operation to replace the % modulo when computing indices, which greatly improves efficiency and reduces hash collisions
  • HashMap is not thread-safe; using it from multiple threads can cause problems (such as the infinite loop in 1.7, which was fixed in 1.8, though HashMap is still unsafe). Use Hashtable or ConcurrentHashMap instead