Art is long, life is short

Preface:

HashMap has always been a hot interview topic: walk into almost any Java interview and you are bound to be asked about it!

This article takes an in-depth look at HashMap's data structure and core methods, focusing on the differences between JDK 1.7 and JDK 1.8.

Brief introduction:

In everyday programming, HashMap is a very frequently used container class. Both keys and values may be null, although at most one entry can have a null key.

Unlike TreeMap, this container does not guarantee the order of its elements. The container may rehash elements as needed, which reshuffles their order, so iterating over the same HashMap at different times may yield different orders.

A HashMap is essentially an array of hash buckets, so hash collisions are possible when elements are inserted.

There are two common ways to resolve such conflicts:

  • Open addressing (when a hash collision occurs, keep probing until a free slot is found);

  • Separate chaining (putting conflicting elements into a linked list). HashMap uses the second approach, separate chaining.

    In JDK 1.7, a HashMap consists of an array + linked lists; when a hash collision occurs, the conflicting elements are put into a linked list.

    Since JDK 1.8, HashMap has been implemented with an array + linked lists + red-black trees, one more structure (the red-black tree) than in JDK 1.7.

Note: before converting a linked list to a red-black tree, HashMap also checks the array length. Even if the list length exceeds the threshold of 8, the list is not treeified while the array length is below 64; the map chooses to expand the array instead. The reason is that when the array is small, converting to a red-black tree is inefficient: a red-black tree must perform left rotations, right rotations, and color flips to stay balanced, while with a short array a plain list lookup is already fast. So, to improve performance and reduce search time, a list is converted to a red-black tree only when its length exceeds 8 and the array length has reached 64. See the treeifyBin method for details.

final void treeifyBin(Node<K, V>[] tab, int hash) {
    int n, index;
    Node<K, V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        // If the number of buckets is smaller than 64, expand the array instead.
        // After expansion, each linked list splits into two shorter lists,
        // reducing the number of elements per bucket
        // (though if all hashes share the same low bits, a list may not shrink)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K, V> hd = null, tl = null;
        // Replace every Node in the bucket with a TreeNode, preserving order
        do {
            TreeNode<K, V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // If the bucket was non-empty, treeify starting from the head node
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
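The decision treeifyBin makes can be summarized in a tiny sketch (the class TreeifyDecision and the method willTreeify are ours for illustration; the two constants are the real JDK 1.8 values):

```java
public class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;      // real JDK 1.8 constant
    static final int MIN_TREEIFY_CAPACITY = 64;  // real JDK 1.8 constant

    // true when a bucket whose list has the given length, in a table of the
    // given length, would be converted to a red-black tree rather than
    // triggering a resize
    static boolean willTreeify(int listLength, int tableLength) {
        return listLength > TREEIFY_THRESHOLD && tableLength >= MIN_TREEIFY_CAPACITY;
    }

    public static void main(String[] args) {
        System.out.println(willTreeify(9, 16)); // false: table too small, resize instead
        System.out.println(willTreeify(9, 64)); // true: long list in a big enough table
    }
}
```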

HashMap features:

  • Storage is unordered
  • Both keys and values may be null, but only one key can be null
  • Keys are unique, which is enforced by the underlying data structure
  • Since JDK 1.8, HashMap has been implemented with an array + linked lists + red-black trees
  • A list is converted to a red-black tree only when its length exceeds 8 and the array length has reached 64
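The null-key and overwrite behavior in this list can be checked with a few lines (the class name NullKeyDemo and the sample keys are ours):

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 1);   // a null key is allowed...
        map.put(null, 2);   // ...but there is only one null-key slot, so this overwrites
        map.put("a", null); // null values are allowed
        map.put("b", null); // ...for any number of keys
        map.put("a", 3);    // duplicate key: the value is replaced, size is unchanged
        System.out.println(map.size());    // 3 entries: null, "a", "b"
        System.out.println(map.get(null)); // 2
    }
}
```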

Source code analysis:

Opening the HashMap source, there are five key fields:

  • table: the hash bucket array in which key-value pairs are stored.

  • size: the actual number of key-value pairs.

  • modCount: the number of structural modifications.

  • threshold: the maximum number of key-value pairs the container can hold before resizing.

  • loadFactor: the load factor.

    transient Node<K,V>[] table;

    transient int size;

    transient int modCount;

    int threshold;
 
    final float loadFactor;

As you know, a HashMap is a collection of key-value pairs, each of which is called a Node (Entry in Java 7, Node in Java 8).

These key-value pairs (Nodes) are distributed in an array, which is the backbone of the HashMap.

Each Node stores its own hash, key, value, and a reference to the next Node.

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        // ... equals, hashCode, getters/setters omitted
    }

In the data structure of a HashMap, two parameters affect its performance: the initial capacity and the load factor.

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
static final float DEFAULT_LOAD_FACTOR = 0.75f;

Initial capacity is the initial length of the table (the default is 16).

The load factor determines how full the table may get before it is automatically expanded (the default is 0.75).

Threshold is the maximum number of Nodes (key-value pairs) a HashMap can hold before resizing; the formula is threshold = capacity × load factor.

When the number of entries exceeds capacity × load factor, the container automatically expands and rehashes its elements. The expanded capacity is twice the previous one, so the array length is always a power of two.
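A quick sketch of that arithmetic (the class ThresholdDemo and its helper are ours, not JDK code; the formula is the one given above):

```java
public class ThresholdDemo {
    // threshold = capacity x load factor, truncated to an int as the JDK does
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults (16, 0.75f), the 13th put triggers a resize to 32
        System.out.println(threshold(16, 0.75f)); // 12
        // After doubling, the next resize happens past 24 entries
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```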

The internal implementation of HashMap centers on a few operations:

  • Getting the array index from the key;
  • The put method;
  • The resize expansion process;
  • The get method;
  • The remove method.

Getting the array index from the key:

Whether adding, deleting, or looking up a key-value pair, locating the array position is the first crucial step. Open any add, delete, or lookup method of HashMap and the source shows that obtaining the array index from a key involves three main operations, where length is the size of the bucket array.

Source code:

/** Steps 1 and 2: compute the perturbed hash value */
static final int hash(Object key) {
    int h;
    // Step 1: h = key.hashCode()
    // Step 2: h ^ (h >>> 16) spreads the high bits into the low bits
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); // JDK 1.8
}

/** Step 3: compute the array index */
static int indexFor(int h, int length) {
    // JDK 1.7 source; JDK 1.8 has no such method but inlines the same logic
    return h & (length - 1); // the modulo operation, as a bit mask
}
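The three steps can be exercised outside the JDK with a small sketch (the class name IndexDemo is ours; the two static methods mirror the source shown above):

```java
public class IndexDemo {
    // JDK 1.8 hash perturbation: XOR the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // length must be a power of two; then h & (length - 1) == h % length for h >= 0
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(77, 16));            // 13, same as 77 % 16
        System.out.println(indexFor(hash("apple"), 16)); // bucket index of "apple"
        System.out.println(hash(null));                  // 0: the null key maps to bucket 0
    }
}
```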

The put method:

The initial value of each element of the HashMap array is null.

What happens when you call the put method?

For example, calling hashMap.put("apple", 0) inserts an element with the key "apple". We need a hash function to determine the Entry's position (index):

index = hash("apple")

Assume the computed index is 2; the Entry is then stored in slot 2 of the array.

However, the length of a HashMap is limited. As more and more Entries are inserted, even a perfect hash function will inevitably produce index conflicts.

What to do at this point? We can solve this with linked lists.

Each element of the HashMap array is not only an Entry object but also the head node of a linked list, and each Entry points to its next Entry node through its next pointer.

When a new Entry maps to a position already in use, it is simply inserted into the corresponding linked list:

  • Before Java 8, head insertion was used: the new node is placed at the head of the bucket and the existing nodes are pushed down the list, as in the example above. The JDK authors reasoned that recently inserted entries are more likely to be looked up again, so putting them first should make lookups more efficient.

  • Since Java 8, tail insertion is used instead.

So why did Java 8 switch to tail insertion?

For example, suppose we put two values into a map of capacity 2 with a load factor of 0.75. Will the second put trigger a resize?

Since (int)(2 × 0.75) = 1, the threshold is 1, so inserting the second entry triggers a resize.

Now suppose several threads insert A, B, and C into that capacity-2 container. If one thread is suspended just before resize (say, at a breakpoint), all the data has been inserted but the expansion has not happened yet, so just before expansion the state might be:

We can see that the list runs A -> B -> C,

and A's next pointer points to B.

Because JDK 1.7's resize uses head insertion on the singly linked list when transferring nodes, an element moved to a slot is always placed at the head of that slot's list. After the indices of an old Entry chain are recalculated, its nodes may land in different positions of the new array, and those that stay together end up in reversed order.

The following can then happen; do you see the problem?

B's next pointer now points to A.

Once several threads have each done part of the migration, a circular linked list can occur.

If you then look up a key in that slot, you get an infinite loop.

Since Java 8 introduced the red-black tree, the code contains a lot more if/else logic, but the red-black tree cleverly reduces the worst-case bucket lookup from O(n) to O(log n).

Head insertion changes the order of the list during expansion, while tail insertion keeps the elements in their original order, so the list cannot form a cycle.

What used to be A -> B stays A -> B.

In Java 7, operating on a HashMap from multiple threads can cause an infinite loop, because expansion reverses the list order and changes the reference relationships between the nodes during migration.

Java 8 does not produce an infinite loop under the same conditions, because the list order is preserved during expansion and migration and the reference relationships between nodes are kept.

Does that mean Java 8's HashMap is safe to use from multiple threads?

No. Even without the infinite loop, you can see in the source that put/get carry no synchronization. The most common problem under concurrency is that a value one thread has just put is not guaranteed to be seen by another thread's subsequent get, so thread safety is still not guaranteed.
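When concurrent access is needed, the standard alternatives are ConcurrentHashMap or Collections.synchronizedMap. A minimal sketch (the class SafeMapsDemo and the counter key are ours; merge on ConcurrentHashMap is atomic, which is what makes the count reliable):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapsDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();
        // A coarser-grained option: every call locks the whole map
        Map<String, Integer> synced = Collections.synchronizedMap(new HashMap<>());
        synced.put("k", 1);

        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                concurrent.merge("count", 1, Integer::sum); // atomic increment
            }
        };
        Thread t1 = new Thread(writer), t2 = new Thread(writer);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(concurrent.get("count")); // 2000: no lost updates
    }
}
```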

The resize expansion process:

Before discussing HashMap expansion in JDK 1.8, let's first look at the expansion implementation in JDK 1.7, because the JDK 1.8 code is more than twice as complex as that of JDK 1.7.

JDK 1.7 expansion implementation

The source code:

/**
 * JDK 1.7 expansion method.
 * @param newCapacity the new capacity
 */
void resize(int newCapacity) {
    // Reference to the Entry array before expansion
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // The array has already reached the maximum size (2^30)
    if (oldCapacity == MAXIMUM_CAPACITY) {
        // Set the threshold to Integer.MAX_VALUE (2^31 - 1) so no further expansion happens
        threshold = Integer.MAX_VALUE;
        return;
    }
    // Initialize a new Entry array
    Entry[] newTable = new Entry[newCapacity];
    // Move the data to the new array; transfer() does the crucial repositioning
    transfer(newTable);
    // Point the HashMap's table at the new Entry array
    table = newTable;
    // Recompute the threshold
    threshold = (int) (newCapacity * loadFactor);
}

In transfer(), the reference newTable[i] is assigned to e.next, which is head insertion on a singly linked list: a new element for a slot is always placed at the head of that slot's list. Unlike JDK 1.8, an element that was first at an index therefore ends up at the tail of the Entry chain after a hash conflict. Elements from the same Entry chain in the old array may also land in different positions of the new array after their indices are recalculated.

JDK 1.8 expansion implementation

The source code is as follows:

final Node<K,V>[] resize() {
    // Reference to the node array before expansion
    Node<K,V>[] oldTab = table;
    // Old capacity
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Old threshold
    int oldThr = threshold;
    // New capacity and threshold, initialized to 0
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity has reached the maximum, set the threshold to
        // Integer.MAX_VALUE and stop expanding
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity; also double the threshold if the
        // old capacity is at least the default (16)
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        // Use the stored threshold as the initial capacity
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new threshold; if the new capacity or threshold reaches the
    // maximum capacity, use Integer.MAX_VALUE as the threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Install the new threshold
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // Create the new array and point table at it
    table = newTab;
    // If the old array has data (i.e. this is an expansion, not initialization),
    // migrate it; otherwise initialization ends here
    if (oldTab != null) {
        // Walk every slot of the old array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                // Release the old slot so the old array can be collected
                oldTab[j] = null;
                // If e has no next node, there was no hash conflict in this
                // slot: move e directly to its new position
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // A red-black tree bucket is split into two
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // Optimized list rehash: if (e.hash & oldCap) == 0 the node
                    // stays at index j (the "low" list); otherwise it moves to
                    // index j + oldCap (the "high" list)
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                // Nodes are appended at the tail, not reversed as in 1.7
                                loTail.next = e;
                            // The new element is always the last element
                            loTail = e;
                        }
                        else {
                            // The high list uses the same tail-append logic
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Place the low list at index j
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Place the high list at index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
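The low/high split above can be checked by hand (the class SplitDemo and the sample hashes are ours): with oldCap = 16, a node stays at index j when (hash & oldCap) == 0 and moves to j + oldCap otherwise, which agrees with recomputing hash & (newCap - 1) from scratch.

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, j = 5;
        int h1 = 5;  // 5  & 16 == 0 -> stays in bucket 5 (the "low" list)
        int h2 = 21; // 21 & 16 != 0 -> moves to bucket 5 + 16 = 21 (the "high" list)
        System.out.println((h1 & oldCap) == 0 ? j : j + oldCap); // 5
        System.out.println((h2 & oldCap) == 0 ? j : j + oldCap); // 21
        // Consistency check: the new index also equals hash & (newCap - 1)
        int newCap = oldCap << 1;
        System.out.println(h2 & (newCap - 1)); // 21
    }
}
```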

The get method:

The get(Object key) method returns the value corresponding to the specified key: getNode(hash(key), key) returns the matching Node object e, and get returns e.value. So getNode() is the heart of the algorithm.

Source code:

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

getNode looks up a Node by hash value and key:

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Check whether the first element matches the key
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // Check whether the bucket is a red-black tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // Otherwise walk the linked list
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

Finding the TreeNode with the specified key k in the red-black tree:

/** Find the node for the given hash and key in the red-black tree */
final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
    TreeNode<K,V> p = this;
    do {
        int ph, dir; K pk;
        TreeNode<K,V> pl = p.left, pr = p.right, q;
        // The hash we are looking for is smaller: go left
        if ((ph = p.hash) > h)
            p = pl;
        // The hash is larger: go right
        else if (ph < h)
            p = pr;
        // The hash matches the current tree node: compare the keys
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        // The three steps above are the normal binary-search-tree lookup.
        // From here on, the hashes are equal but the keys are not:
        else if (pl == null)
            p = pr;
        else if (pr == null)
            p = pl;
        // Use compareTo if the key class is Comparable
        else if ((kc != null || (kc = comparableClassFor(k)) != null) &&
                 (dir = compareComparables(kc, k, pk)) != 0)
            p = (dir < 0) ? pl : pr; // continue on the side compareTo indicates
        // Otherwise recursively search the right subtree
        else if ((q = pr.find(h, k, kc)) != null)
            return q;
        // And fall back to the left subtree
        else
            p = pl;
    } while (p != null);
    return null;
}

The get method first obtains the array index via the hash() function, then proceeds in order:

  • 1. Check whether the first element matches the key; if so, return its value;
  • 2. Check whether the bucket is a red-black tree; if so, look the key up in the tree;
  • 3. Otherwise, loop through the linked list until the node is found (or the list ends).

The remove method:

remove(Object key) removes the Node corresponding to the given key. The logic is implemented in removeNode(hash(key), key, null, false, true).

The delete logic in JDK 1.8 is more complex than in JDK 1.7, adding red-black tree node deletion and rebalancing:

  1. First check whether the first element in the bucket is the one to delete;
  2. If not, check whether the bucket is a red-black tree; if so, search the tree;
  3. Otherwise, loop through the linked list until the node is found;
  4. Delete the node found. If the bucket is a red-black tree, perform color flips, left rotations, and right rotations until the red-black properties are restored.

Conclusion:

(1) HashMap is a hash table that uses an array + linked list + red-black tree storage structure;

(2) The default initial capacity of HashMap is 16 (1 << 4), the default load factor is 0.75f, and the capacity is always a power of two;

(3) When a HashMap expands, its capacity doubles each time;

(4) When the array length is less than 64, buckets are not treeified; the array is expanded instead;

(5) When the array length is at least 64 and a single bucket holds more than 8 elements, the bucket is converted to a red-black tree;

(6) When the number of elements in a single bucket drops to 6 or fewer, the tree is converted back to a linked list;

(7) HashMap is not a thread-safe container;

(8) The average time complexity of looking up an element in a HashMap is O(1).