Contents of this article

  • The design idea of HashMap
    • The underlying structure of a HashMap
    • Why not use a red-black tree from the start
    • Why does a bucket become a red-black tree only at eight nodes
    • HashMap initialization
    • Why does a HashMap initialize capacity to an integer power of 2
  • Why isn’t HashMap thread safe
    • Concurrent put collisions cause data loss
    • Reads during resizing can return incorrect values
  • Differences between HashMap in Java 7 and Java 8
    • Comparison of underlying data structures
    • Comparison of insertion methods
    • Comparison of resizing methods
  • Differences between ConcurrentHashMap in Java 7 and Java 8
    • The data structure
    • Degree of concurrency
    • Handling hash collisions

The design idea of HashMap

The underlying structure of a HashMap

This article mainly explains the JDK 1.8 HashMap source code, with some simple comparisons between the JDK 1.7 and JDK 1.8 implementations.

To understand the underlying structure of a HashMap:

From this structure we can see the following characteristics: a HashMap starts out as an empty array of buckets. When an element is put in, its hash value is computed; the first entry that maps to a bucket (slot) occupies that slot, and later entries that map to the same slot are chained behind it in a linked list. This is the so-called chaining (zipper) method.

When the length of a list reaches the treeify threshold (8 by default) and the current HashMap capacity is at least MIN_TREEIFY_CAPACITY (64 by default), the list is converted into a red-black tree. If nodes are later removed or the table is resized so that a tree holds 6 or fewer nodes, it is converted back into a linked list.
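These thresholds correspond to constants in the JDK 1.8 HashMap source:

static final int TREEIFY_THRESHOLD = 8;     // list -> tree once a bin reaches this length
static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list once a bin shrinks to this size
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table capacity before any treeify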

Why not use a red-black tree from the start

The official explanation:

Because TreeNodes are about twice the size of regular nodes,
we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD).
And when they become too small (due to removal or resizing) they
are converted back to plain bins.

Translation: since TreeNodes are about twice the size of regular Nodes, nodes are converted to TreeNodes only when a bin contains enough of them, as determined by TREEIFY_THRESHOLD. When the number of nodes in a bin drops again due to removal or resizing, the TreeNodes are converted back into plain linked-list nodes to save space.

According to the source code, a linked list is converted into a red-black tree when its length reaches 8, and a tree is converted back when it shrinks to 6 nodes or fewer. This reflects a balance between time and space, and the gap between 8 and 6 avoids frequent back-and-forth conversion at a single boundary.

Why does a bucket become a red-black tree only at eight nodes

The official explanation for the problem is:

In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million

What this means: with well-distributed user hashCodes, the red-black tree is rarely used, because under ideal random hashing the number of nodes in a bin follows a Poisson distribution, and the probability drops sharply with each additional node; the probability of a bin reaching length 8 is only 0.00000006, less than one in ten million.
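As a sanity check, the quoted probabilities can be reproduced with a few lines of Java (parameter 0.5; the output matches the table above):

public class PoissonCheck {
    public static void main(String[] args) {
        double expNegHalf = Math.exp(-0.5);
        double factorial = 1;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;
            // P(list length == k) = exp(-0.5) * 0.5^k / k!
            System.out.printf("%d: %.8f%n", k, expNegHalf * Math.pow(0.5, k) / factorial);
        }
    }
}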

However, which bucket an element falls into in a HashMap depends on the hashCode of the key object. If we define a badly skewed hashCode ourselves, for example:

public int hashCode(){
    return 1;
}

Since the hashCode method above always returns 1, every entry hashes into the same bucket and the linked list there grows very long; if we add many nodes, that bin is eventually converted into a red-black tree. So the design of converting a list longer than 8 into a red-black tree is mainly a defense against poorly implemented hash functions: it prevents an overlong list from degrading query performance and guarantees acceptable lookups even in this extreme case.
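A hypothetical key class that triggers exactly this worst case (BadKey is made up for illustration):

import java.util.HashMap;
import java.util.Map;

class BadKey {
    private final int id;
    BadKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 1; // every instance hashes to the same bucket
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++)
            map.put(new BadKey(i), i); // all 100 entries share one bin,
                                       // which JDK 1.8 treeifies once the thresholds are met
    }
}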

HashMap initialization

The default initial capacity of a HashMap is 16, the load factor is 0.75, and the resize threshold is 12 (16 * 0.75). If a HashMap is initialized with a custom capacity argument size, the actual initial capacity is the smallest integer power of 2 that is greater than or equal to size: passing 10 yields a capacity of 16, and passing 30 yields 32. The source code is as follows:

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;  // shift n right by 1 bit and OR it back into n; the lines below work the same way
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

In the source above, each shift ORs the highest set bit of n into the positions below it, so that after the five shifts every bit below the highest one is also 1; returning n + 1 then produces an integer power of 2, consistent with HashMap's capacity invariant. For example, for cap = 10: n = 9 is 1001 in binary, the shifts turn it into 1111 (15), and n + 1 = 16.

Why does a HashMap initialize capacity to an integer power of 2

The previous section showed that tableSizeFor keeps the capacity at an integer power of 2. Why is it designed this way? First, look at the hash method for the key:

// Computes the hash of the key: a 32-bit int produced by XOR-ing
// the high 16 bits of hashCode() into its low 16 bits.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

After obtaining the hash value of the key, calculate the position index of the key in the table as follows:

if ((p = tab[i = (n - 1) & hash]) == null)

Precisely because n is an integer power of 2 (when n is 2 its binary form is 10; 4 is 100; 8 is 1000; 16 is 10000...), n - 1 in binary is all ones: (2-1) is 1, (4-1) is 11, (8-1) is 111, and (16-1) is 1111. Why use (n - 1) & hash to compute the array index? Because AND-ing with such a mask zeroes out the high bits of the hash and keeps only the low bits, so whatever the hash value, the result is guaranteed to fall between 0 and n - 1, a valid array subscript. For example, with an initial capacity of 16, n - 1 is 15, whose binary form is 00000000 00000000 00000000 00001111; AND-ing it with any hash value simply truncates the hash to its lowest four bits. A worked sketch follows.
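A small sketch of both steps: the masking that maps a hash to a slot, and why hash() also XORs in the high 16 bits (the hash values here are made up for illustration):

public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // table capacity, always a power of 2

        // (n - 1) & hash keeps only the low bits, so the index is always in 0..n-1
        System.out.println((n - 1) & 0x5A7F); // 15 (the low 4 bits, 1111)

        // Two hashCodes that differ only in their high 16 bits:
        int h1 = 0xABCD0000;
        int h2 = 0x12340000;
        System.out.println((n - 1) & h1); // 0 -- without spreading they always collide
        System.out.println((n - 1) & h2); // 0

        // The h ^ (h >>> 16) step in hash() lets the high bits affect the index:
        System.out.println((n - 1) & (h1 ^ (h1 >>> 16))); // 13
        System.out.println((n - 1) & (h2 ^ (h2 >>> 16))); // 4
    }
}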

Why isn’t HashMap thread safe

First, look at the source code of the put method (which delegates to putVal) in JDK 1.8:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, initialize it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the slot computed from the hash is empty, insert the new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Check whether the first node in the slot already holds the same key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the first node is a red-black tree node, insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise walk the list and insert the new node at its tail
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Convert the list to a red-black tree once it reaches the threshold
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // The key already exists: replace the old value and return it
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the new size exceeds the threshold (capacity * load factor,
    // 16 * 0.75 = 12 by default), resize the table
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Among the lines above, ++modCount alone is enough to show the problem: ++ is not an atomic operation but a read-modify-write sequence (read the value, add one, write it back), and another thread can interleave between any of these steps. This is one reason HashMap is thread-unsafe.
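Conceptually, ++modCount expands into three separate steps (an illustrative sketch, not the literal bytecode):

class Counter {
    static int modCount;

    static void increment() {
        // what ++modCount effectively does:
        int tmp = modCount; // 1. read the current value
        tmp = tmp + 1;      // 2. add one
        modCount = tmp;     // 3. write it back
        // a second thread interleaving between steps 1 and 3 loses an increment
    }
}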

Other sources of thread-unsafety include:

Concurrent put collisions cause data loss

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

As in the source above, suppose two threads A and B put keys whose hash values map to the same slot, and both reach this if check at the same time. Both find the slot empty. Thread A writes its element into the slot; thread B, which has already concluded the slot is empty, then writes its own element into the same slot, overwriting thread A's element. This is the scenario in which concurrent put collisions lose data, and it is another reason a HashMap is not thread-safe.
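A small sketch that can exhibit this kind of loss (the outcome is timing-dependent and not guaranteed on every run):

import java.util.HashMap;
import java.util.Map;

public class UnsafePutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();
        // Two threads insert disjoint key ranges concurrently.
        Thread a = new Thread(() -> { for (int i = 0; i < 10_000; i++) map.put(i, i); });
        Thread b = new Thread(() -> { for (int i = 10_000; i < 20_000; i++) map.put(i, i); });
        a.start(); b.start();
        a.join(); b.join();
        // A thread-safe map would always print 20000; with HashMap the
        // result is often smaller, and a run may even throw an exception.
        System.out.println(map.size());
    }
}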

Reads during resizing can return incorrect values

Next, look at the get method of HashMap:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // If the table is empty or the slot is empty, return null
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check whether the first node is the one we want; if so, return it directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If the first node is a red-black tree node, search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise traverse the list looking for the key
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

From the get source above, the lookup proceeds in the following steps:

  1. Compute the hash value and find the corresponding slot.
  2. If the table is empty or the slot is null, simply return null.
  3. If the first node in the slot is the one we want, return its value.
  4. If the first node is a red-black tree node, continue the search in the tree with getTreeNode.
  5. Otherwise it is a linked list, so do a linear list lookup.

In short, the get method checks whether the key exists in the table, returning its value if it does and null otherwise. So why can the result be inaccurate during resizing? Look at part of the resize() source:

if (oldTab != null) {
    for (int j = 0; j < oldCap; ++j) {
        Node<K,V> e;
        if ((e = oldTab[j]) != null) {
            // empty the old bucket once it starts migrating
            oldTab[j] = null;
            if (e.next == null)
                newTab[e.hash & (newCap - 1)] = e;
            ... // remainder omitted
        }
    }
}

The snippet above is a small part of HashMap's resize method. Resizing migrates the data from the old array to the new one, and each old bucket (or slot) is emptied as it is migrated (oldTab[j] = null). Until the whole migration finishes, however, the HashMap still serves reads from the old array. If a get arrives after the key's bucket oldTab[j] has been emptied but before the new table is installed, the lookup finds nothing at that index and returns null even though the entry exists. This is why data read during resizing can be inaccurate.

Differences between HashMap in Java 7 and Java 8

Comparison of underlying data structures

Java7’s HashMap looks like this:

In the figure above, we can see that the data structure of HashMap in JDK 1.7 is array + linked list.

The HashMap structure in Java8 is shown as follows:

As you can see from the figure, HashMap in Java8 has three data structures:

  1. The first structure is the array itself; an empty slot simply holds no data.
  2. The second is the same zipper structure as in the JDK 1.7 HashMap: the first node fills a slot, and further nodes whose hashes map to the same slot extend it as a linked list.
  3. The third is the red-black tree, which does not exist in the Java 7 HashMap. When a list in case 2 grows beyond a threshold (8 by default), HashMap converts it from a linked list into a red-black tree to keep lookups efficient.

Here is a brief introduction to red-black trees. A red-black tree is a non-strictly balanced binary search tree; it mainly solves the performance degradation that a plain binary search tree suffers under dynamic updates. The balance is approximate, which is enough to keep performance from degrading. Nodes are of two kinds, black and red, and a red-black tree must satisfy the following requirements:

  1. The root node is black.
  2. Each leaf node is a black null node.
  3. No two adjacent nodes can both be red; that is, red nodes are separated by black ones.
  4. For each node, every path from that node down to its reachable leaf nodes contains the same number of black nodes.

Thanks to its self-balancing property, searching a red-black tree behaves like binary search, with O(log(n)) time complexity. With a linked list, the worst case requires traversing the whole list to find the target element, which is O(n), far worse than the tree's O(log(n)); the gap grows as the number of nodes increases (for a million nodes, roughly 20 comparisons versus up to a million).

Comparison of insertion methods

In JDK 1.7, a new node is inserted with the head-insertion method: the incoming node is placed at the head of its bucket, and the node previously at the head becomes its successor. Whether to resize is decided before the insertion.

In JDK 1.8, a new node is appended to the tail of the list in the corresponding slot: the node is inserted first, and only afterwards is it decided whether the table needs resizing. The sketch below illustrates the two strategies.
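A minimal sketch of the two strategies on a bare singly linked list (Node here is a simplified stand-in for HashMap's internal node, invented for illustration):

class Node {
    int key;
    Node next;
    Node(int key, Node next) { this.key = key; this.next = next; }
}

class InsertDemo {
    // JDK 1.7 style: head insertion -- O(1), but reverses relative order on migration
    static Node insertHead(Node head, int key) {
        return new Node(key, head); // new node becomes the head; old head is its successor
    }

    // JDK 1.8 style: tail insertion -- walks to the end, preserves order
    static Node insertTail(Node head, int key) {
        Node n = new Node(key, null);
        if (head == null) return n;
        Node p = head;
        while (p.next != null) p = p.next;
        p.next = n;
        return head;
    }
}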

Comparison of resizing methods

In JDK 1.8, the migration inside resize() uses tail insertion. The relevant part of the resize source is as follows (the tail insertions are marked in the comments):

// During resizing, a node either stays at its original index or moves to
// index + oldCap, so each old bucket is split into two lists:
// loHead/loTail collect the nodes that keep their index,
// hiHead/hiTail collect the nodes that move to index + old capacity.
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
    next = e.next;
    // (e.hash & oldCap) == 0 means the index is unchanged
    if ((e.hash & oldCap) == 0) {
        if (loTail == null)
            loHead = e;
        else
            loTail.next = e; // tail insertion
        loTail = e;
    }
    else {
        if (hiTail == null)
            hiHead = e;
        else
            hiTail.next = e; // tail insertion
        hiTail = e;
    }
} while ((e = next) != null);
// nodes whose index is unchanged
if (loTail != null) {
    loTail.next = null;
    newTab[j] = loHead;
}
// nodes that move to old index + old capacity
if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}

In JDK 1.7, resizing the HashMap requires recomputing (and possibly rehashing) the position of every element of the old array in the new array. The source of the resize and transfer methods is as follows:

// JDK 1.7 HashMap resize method
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    // transfer() moves the entries to the new array; the boolean returned by
    // initHashSeedAsNeeded(newCapacity) decides whether keys must be rehashed
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    // point the map at the new array
    table = newTable;
    // set the new resize threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            // save e's successor
            Entry<K,V> next = e.next;
            // rehash if required
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // compute e's index in the new array
            int i = indexFor(e.hash, newCapacity);
            // head insertion: e points at the current head of the new bucket...
            e.next = newTable[i];
            // ...and becomes the new head
            newTable[i] = e;
            // move on to the saved successor
            e = next;
        }
    }
}
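For reference, the indexFor method used above is the same power-of-two masking seen earlier:

static int indexFor(int h, int length) {
    // length is always a power of 2, so this is equivalent to h % length
    return h & (length - 1);
}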

Because the transfer method called during resizing in the JDK 1.7 HashMap uses head insertion, the order of a migrated list is reversed, and in a multithreaded environment this can produce a circular linked list. Let's combine the code above with some figures to understand head insertion and why the cycle appears. First, assume the internal table of the original HashMap looks like the figure below:

The new array (newTable) for thread 1 and thread 2 is shown as follows:

Assume thread 2 enters the transfer method first and is suspended just after executing the line Entry<K,V> next = e.next; the CPU time slice then switches to thread 1, which runs the transfer method to completion, as the following figures show.

Thread 1 inserts node A into its new table

Thread 1 inserts node B into its new table

Thread 1 inserts node C into its new table

When thread 1 completes the transfer method, the line table = newTable has not yet been executed. At this point the CPU time slice switches back to thread 2, which resumes where it left off. Note that before the switch, thread 2 had already executed Entry<K,V> next = e.next; so its variable e holds node A and its variable next holds node B. Meanwhile, as the figures above show, after thread 1's transfer C points to B and B points to A. So when execution switches back to thread 2, its node references look like the following figure:

Thread 2 inserts node A into its new table

Thread 2 inserts node B into its new table

Since node B still points to node A, the next head insertion produces a cycle in the linked list, as shown below:

To sum up, this is how a JDK 1.7 HashMap using head insertion can produce a circular linked list under concurrency.

In JDK 1.8, head insertion was replaced by tail insertion (marked in the resize code above), so migration no longer reverses the list, and the circular-linked-list problem is avoided in multithreaded environments.

Differences between ConcurrentHashMap in Java 7 and Java 8

The data structure

Let's first look at the underlying data structure of ConcurrentHashMap in JDK 1.7:

A Segment extends ReentrantLock and can be understood as a lock; each Segment is locked independently, without affecting the others. This is much more efficient than Hashtable, which locks the whole object for every operation, precisely because ConcurrentHashMap's lock granularity is a single Segment rather than the entire map.

The underlying data structure of each Segment is similar to that of a HashMap: a zipper structure of array plus linked lists. By default there are 16 Segments, numbered 0 to 15, so up to 16 threads can operate concurrently (each on a different Segment). The default of 16 can be changed at initialization, but once set, the number of Segments cannot be expanded. The fragment below shows how a key is mapped to its Segment.
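In the JDK 1.7 put path, the Segment for a key is selected from the high bits of its hash, using the segmentShift and segmentMask fields set up in the constructor shown further below:

// fragment of the JDK 1.7 ConcurrentHashMap put path
int j = (hash >>> segmentShift) & segmentMask;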

The underlying data structure of ConcurrentHashMap in JDK 1.8 is as follows:

As the two figures show, compared with JDK 1.7, the JDK 1.8 ConcurrentHashMap adds the red-black tree to its data structures. As with HashMap, the red-black tree is introduced to keep query efficiency from degrading when chains grow long.

Degree of concurrency

The degree of concurrency differs between the two versions; let's compare the constructors. First, the Java 7 ConcurrentHashMap constructor:

// Creates a ConcurrentHashMap with the specified capacity, load factor and concurrency level
public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;
    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1; // the number of Segments
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1; // powers of 2: 2^1, 2^2, ...
    }
    this.segmentShift = 32 - sshift;
    this.segmentMask = ssize - 1;
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // the capacity of each Segment's table
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = MIN_SEGMENT_TABLE_CAPACITY;
    while (cap < c)
        cap <<= 1;
    // Create the first Segment and place it in the Segment[] array
    Segment<K,V> s0 =
        new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                         (HashEntry<K,V>[])new HashEntry[cap]);
    Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
    UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
    this.segments = ss;
}

In JDK 1.7, the degree of concurrency is determined by the number of Segments, which is the smallest power of 2 greater than or equal to concurrencyLevel. For example, for concurrencyLevel values of 12, 13, 14, 15 or 16, the number of Segments is 16.

The Java 8 source keeps the Segment class (for compatibility) but no longer uses it. The constructor is as follows:

public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    // If the concurrency level exceeds the initial capacity, raise the capacity
    // so that, ideally, each slot can serve one thread
    if (initialCapacity < concurrencyLevel)   // Use at least as many bins
        initialCapacity = concurrencyLevel;   // as estimated threads
    long size = (long)(1.0 + (long)initialCapacity / loadFactor);
    int cap = (size >= (long)MAXIMUM_CAPACITY) ?
        MAXIMUM_CAPACITY : tableSizeFor((int)size);
    this.sizeCtl = cap;
}

Comparing the two constructors leads to these conclusions:

  1. Java 7 uses Segment locking to ensure safety; each Segment is locked independently, so the maximum concurrency equals the number of Segments.

  2. Java 8 abandons the Segment design and uses Node + CAS + synchronized to ensure thread safety, shrinking the lock granularity to a single bucket. Ideally, the number of table slots (the array length) is the maximum supported concurrency. A simplified analogue of this locking scheme appears after this list.
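A minimal, self-contained analogue of that locking scheme (MiniConcurrentMap is invented for illustration, not the real source; it omits resizing, tree bins, size counting and get): an empty bin is filled with a lock-free CAS, and a non-empty bin is modified while synchronizing only on that bin's head node.

import java.util.concurrent.atomic.AtomicReferenceArray;

public class MiniConcurrentMap<K, V> {
    private static final int CAPACITY = 16; // fixed size; this sketch omits resizing

    private static final class Node<K, V> {
        final K key;
        volatile V value;
        volatile Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final AtomicReferenceArray<Node<K, V>> table =
            new AtomicReferenceArray<>(CAPACITY);

    public void put(K key, V value) {
        int i = (CAPACITY - 1) & spread(key.hashCode());
        for (;;) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // empty bin: install the first node with a lock-free CAS
                if (table.compareAndSet(i, null, new Node<>(key, value)))
                    return;
                // lost the race; loop and retry against the new head
            } else {
                // non-empty bin: lock only this bucket's head node
                synchronized (head) {
                    if (table.get(i) != head)
                        continue; // head changed under us; retry
                    for (Node<K, V> p = head; ; p = p.next) {
                        if (p.key.equals(key)) { p.value = value; return; }
                        if (p.next == null) { p.next = new Node<>(key, value); return; }
                    }
                }
            }
        }
    }

    private static int spread(int h) {
        return h ^ (h >>> 16); // same spreading idea as HashMap.hash()
    }
}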

Handling hash collisions

  1. In Java 7, hash collisions are handled with the zipper (chaining) method; lookup time complexity is O(n), where n is the length of the list.

  2. In Java 8, collisions are also handled with chaining at first (lookup O(n)); once a list grows past the threshold, it is converted into a red-black tree to preserve lookup efficiency, reducing lookup time to O(log(n)), where n is the number of nodes in the tree.

Author: Wulin Meng. WeChat official account: Programmer Internal Work Heart Method