As mentioned in the previous article, ThreadLocalMap uses open addressing to resolve hash conflicts, while HashMap uses the linked list (separate chaining) method. What is the linked list method?

In a hash table, each “bucket” or “slot” corresponds to a linked list, and all elements whose hashes map to the same slot are placed in that slot's linked list.

JDK7 differs from JDK8 in that JDK7 has no red-black trees; only linked lists are mounted in the array. In JDK8, if the table capacity is greater than or equal to 64 and the number of nodes in a list is greater than or equal to 8, the list is converted to a red-black tree. When the number of nodes in a red-black tree drops to 6 or fewer, it is converted back to a linked list.

Insert

When inserting, we just need to compute the corresponding slot through the hash function and insert the element into that slot's linked list or red-black tree. If the number of elements exceeds the threshold, the table is expanded and the data is rehashed.

Find or delete

Compute the corresponding slot through the hash function, then traverse the linked list or red-black tree to find or delete the element.

Why do linked lists become red-black trees?

As mentioned in the previous article, the load factor determines how full the table is allowed to get. If it is exceeded, the HashMap is dynamically expanded to double its original size (the initial capacity is 16, that is, the slot array size is 16). However, no matter how reasonably the load factor and hash function are chosen, they cannot prevent a linked list from becoming too long. Once a list is too long, searching for and deleting elements becomes time-consuming and hurts HashMap's performance. Therefore, JDK8 adds an optimization: when the length of a linked list reaches 8, it is converted into a red-black tree, and the characteristics of a red-black tree (lookup, insert, and delete are O(log n) in the worst case) improve HashMap's performance. When the number of nodes drops to 6 or fewer, the red-black tree is converted back into a linked list, because with a small amount of data the cost of keeping a red-black tree balanced outweighs its advantage over a linked list, and the tree code is considerably more complex.
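
To make the motivation concrete, here is a minimal, hypothetical sketch (not from the original article): the BadKey class below forces every key into the same slot, which is exactly the long-chain worst case that treeification mitigates. Note that the tree ordering works best when keys implement Comparable; otherwise HashMap falls back to tie-breaking rules when ordering the nodes.

import java.util.HashMap;
import java.util.Map;

public class LongChainDemo {
    // Hypothetical key type whose hashCode always collides
    static class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // every key lands in the same slot
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new BadKey(i), i); // all 1000 entries share one bucket
        }
        // Without the red-black tree conversion, this lookup would scan a 1000-node chain
        System.out.println(map.get(new BadKey(999))); // prints 999
    }
}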

Source code analysis

Constructor and important properties

public HashMap(int initialCapacity, float loadFactor);

public HashMap(int initialCapacity);

public HashMap();


In HashMap's constructors, you can specify the initial capacity (bucket count) and the load factor. If they are not specified, the defaults are 16 and 0.75, respectively. HashMap has several important properties:

// Initial capacity, must be a power of two
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// The minimum chain length at which a linked list is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;

// The minimum table (array) size required before a linked list is converted to a red-black tree
static final int MIN_TREEIFY_CAPACITY = 64;

// If the number of nodes in a red-black tree drops to 6 or fewer, the tree is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;

// HashMap threshold, used to determine whether to expand capacity (threshold = capacity * loadFactor)
int threshold;

// Load factor
final float loadFactor;

// List nodes
static class Node<K,V> implements Map.Entry<K,V> {
  final int hash;
  final K key;
  V value;
  Node<K,V> next;

}

// Hold an array of data
transient Node<K,V>[] table;

// Red-black tree node
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
  TreeNode<K,V> parent;  // red-black tree links
  TreeNode<K,V> left;
  TreeNode<K,V> right;
  TreeNode<K,V> prev;    // needed to unlink next upon deletion
  boolean red;
}

The table above is the array that stores the data (its elements can be called buckets or slots); each array element mounts a linked list or a red-black tree. It is worth noting that the array is not allocated when the HashMap is constructed, but lazily, when the first element is put.
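
For example (a simple usage sketch, not from the original article), the capacity and load factor can be chosen at construction time, while the backing table is still only allocated on the first put:

import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        // Defaults: capacity 16, load factor 0.75
        Map<String, Integer> a = new HashMap<>();

        // Custom initial capacity (rounded up internally to a power of two)
        Map<String, Integer> b = new HashMap<>(32);

        // Custom initial capacity and load factor
        Map<String, Integer> c = new HashMap<>(32, 0.5f);

        // The backing table is allocated lazily, on the first put
        c.put("key", 1);
        System.out.println(a.size() + ", " + b.size() + ", " + c.size()); // 0, 0, 1
    }
}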

Design of hash functions

int h;
int hash = (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
int index = hash & (tab.length - 1);

The most intuitive way to map a key to a slot is key.hashCode() % tab.length. Because the table length is always a power of two, the remainder can be computed with the cheaper bitwise operation A % B = A & (B - 1), so the index becomes index = hash & (tab.length - 1).

The design follows the division-remainder method, which helps reduce hash conflicts: divide the key K by some number p no larger than the hash table length m, and use the remainder as the table address. Bit operations make this cheap; for example, x / 8 = x >> 3, that is, shifting x right by 3 bits yields the quotient of x / 8.

The hash is computed as (h = key.hashCode()) ^ (h >>> 16). Why shift right by 16? Wouldn't key.hashCode() & (tab.length - 1) be good enough? If you did that, because tab.length is much smaller than the range of hash values, only the low bits would participate in the computation and the high bits would contribute nothing, increasing the risk of hash collisions.

However, hashCode() itself is a 32-bit integer. XORing it with itself shifted right by 16 bits mixes the high 16 bits into the low 16 bits, so the resulting hash carries information from both halves and is much better distributed. The index obtained through the remainder (masking) step is then less likely to collide.
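
The following is a small standalone sketch (illustrative; the helper names spread and indexFor are ours, not HashMap's internals) that mirrors the two steps just described:

public class HashSpreadDemo {
    // Mirrors the spreading step: XOR the high 16 bits into the low 16 bits
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Map the spread hash onto a power-of-two table: same result as the remainder, but cheaper
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        String key = "hello";
        int hash = spread(key);
        System.out.println("raw hashCode = " + Integer.toBinaryString(key.hashCode()));
        System.out.println("spread hash  = " + Integer.toBinaryString(hash));
        System.out.println("index in a 16-slot table = " + indexFor(hash, 16));
        // For a power-of-two length, masking equals the (floor) remainder
        System.out.println(indexFor(hash, 16) == Math.floorMod(hash, 16)); // true
    }
}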

Insert data

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                  boolean evict) {

 Node<K,V>[] tab; Node<K,V> p; int n, i;

 // 1. If the array is not initialized, initialize it
 if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

 // 2. If no data is inserted into the current node (no collision), insert a new node directly
 if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
 else {
    Node<K,V> e; K k;

    // 3. If the same key exists, overwrite it
   if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
       e = p;
   else if (p instanceof TreeNode)
        // 4. If it is found to be a tree structure after collision, mount it to a tree
       e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
   else {
       for (int binCount = 0; ; ++binCount) {
            // 5. Insert tail, if the number of linked list nodes reaches the upper limit, convert to red black tree
           if ((e = p.next) == null) {
               p.next = newNode(hash, key, value, null);
               if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                   treeifyBin(tab, hash);
               break;
           }
           // 6. There is a collision in the list
           if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
               break;
           p = e;
       }
    }
    // 7. Replace the old value with the new value
     if (e != null) { // existing mapping for key
       V oldValue = e.value;
       if (!onlyIfAbsent || oldValue == null)
           e.value = value;
       afterNodeAccess(e);
       return oldValue;
     }
 }
 ++modCount;

 // 8. If the size exceeds the threshold, expand the table
 if (++size > threshold)
     resize();

 // Callback hook used by LinkedHashMap
 afterNodeInsertion(evict);
 return null;
}

Briefly, the put logic consists of the following steps:

  1. If the table has not been initialized, initialize it with the default capacity of 16
  2. Compute the array index as hash & (n - 1). If this position is empty, there is no collision and a new node is inserted directly
  3. If a collision occurs and the same key already exists, it is overwritten in subsequent processing
  4. If the bucket hit by the collision is a tree structure, the node is mounted directly onto the red-black tree
  5. If the bucket is a linked list, tail insertion is performed. When the list length reaches 8 (and the table capacity is at least 64), the list is converted into a tree
  6. If the same key is found while walking the linked list, it is overwritten in subsequent processing
  7. If the key existed before, the old value is replaced with the new value and the old value is returned (see the example after this list)
  8. If the map size (number of stored elements) exceeds the threshold, the table is doubled in size
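
As a quick usage-level illustration of steps 3 and 7 (a simple sketch, not from the HashMap source), put returns the previous value when a key is overwritten:

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("a", 1)); // null: no previous mapping, a new node is inserted
        System.out.println(map.put("a", 2)); // 1: same key collides, the old value is returned and replaced
        System.out.println(map.get("a"));    // 2
        System.out.println(map.size());      // 1: overwriting does not increase the size
    }
}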

Expansion (resize)

The resize() method initializes the array if it is currently uninitialized. Otherwise it doubles the array size and rehashes the data from the old array into the new one.


// The code to double the size of the array is omitted.

if (oldTab != null) {
 // Iterate over the old array
 for (int j = 0; j < oldCap; ++j) {
   Node<K,V> e;
    if ((e = oldTab[j]) != null) {
     oldTab[j] = null;

     // 1. If there is no collision in the old array, move to the new array directly
     if (e.next == null)
        newTab[e.hash & (newCap - 1)] = e;
     else if (e instanceof TreeNode)
        // 2. If there is a collision and the node type is tree node, split the tree node (mount to the expanded array or convert to a linked list)
        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
     else { // preserve order

        // 3. If the conflict is a linked list, the order of the nodes will be preserved

       Node<K,V> loHead = null, loTail = null;
       Node<K,V> hiHead = null, hiTail = null;
       Node<K,V> next;
       do {
         next = e.next;
         // 4. Determine whether the element is in its original position after expansion (this is very smart, will be analyzed below)
         if ((e.hash & oldCap) == 0) {
           if (loTail == null)
               loHead = e;
           else
               loTail.next = e;
           loTail = e;
         }

         // 5. The element is not in its original position
          else {
            if (hiTail == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
          }
        } while ((e = next) != null);

       // 6. Copy the elements that have not changed index to the new array
        if (loTail != null) {
         loTail.next = null;
         newTab[j] = loHead;
       }

       // 7. Copy the element that changed the index position to the new array
        if (hiTail != null) {
         hiTail.next = null;
         // 8. Index = j+oldCap
         newTab[j + oldCap] = hiHead;
       }
     }
   }
 }
}

The entire rehash process is shown in the code above: iterate over the elements of the old array and do the following:

  1. If there is no data collision in the old array slot (no linked list or red-black tree is mounted), the element is assigned directly to the new array, where index = e.hash & (newCap - 1).
  2. If there is a collision and the node type is a tree node, split the tree node (mount the parts to the expanded array, converting back to a linked list if a part becomes small enough).
  3. If there is a collision and the node is a linked list, the rehash process preserves the original order of the nodes in the list.
  4. To determine whether an element stays at its original index after expansion, the check (e.hash & oldCap) == 0 is used, where oldCap is the array size before expansion (this is very clever and is analyzed below).
  5. When the element does not stay at its original position, the hiHead and hiTail pointers are updated.
  6. The elements whose index did not change are copied into the new array at the same index.
  7. The elements whose index did change are copied into the new array at subscript j + oldCap.

Points 4 and 5 divide the linked list elements into two parts (the do...while section): one part keeps its index after rehashing, and the other part gets a new index. For each part, two pointers track the head and tail nodes.

For example, when oldCap = 8 and the list 1 -> 9 -> 17 is mounted at tab[1], after expansion 1 -> 17 remains at tab[1] and 9 moves to tab[9].
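
Here is a small sketch (illustrative, not from the article) that verifies this classification with those numbers:

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 8;                 // old table size
        int newCap = oldCap << 1;       // 16 after doubling
        int[] hashes = {1, 9, 17};      // all three map to tab[1] when the table size is 8

        for (int hash : hashes) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            boolean stays = (hash & oldCap) == 0;   // the "lo" list keeps its index
            System.out.printf("hash=%d oldIndex=%d newIndex=%d stays=%b%n",
                    hash, oldIndex, newIndex, stays);
        }
        // Output: 1 and 17 stay at index 1; 9 moves to index 1 + 8 = 9
    }
}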

So how do you determine whether the index changes after rehashing? And what is the new index?

The design here is very clever. Remember that the array size of a HashMap is always a power of two? When we calculate the index, we use e.hash & (tab.length - 1).

Here we discuss the process of expanding the array size from 8 to 16.

tab.length -1 = 7   0 0 1 1 1
e.hashCode = x      x x x x x
==============================
                    0 0 y y y


You can see that before the expansion, the index position was determined by the lower three bits of the hashCode. What about after expansion?

tab.length -1 = 15   0 1 1 1 1
e.hashCode = x       x x x x x
==============================
                     0 z y y y


After expansion, the index is determined by the lower four bits, and the lower three bits are the same as before. In other words, whether the index changes after expansion is determined solely by the newly included high bit, so we only need to AND the hashCode with that bit to know whether the index has changed.

That newly included bit is exactly the bit set in oldCap. As shown above, 15 in binary is 1111 and 8 in binary is 1000, so the extra bit of the new mask is the oldCap bit. Therefore, we can judge whether the index changes from the result of e.hash & oldCap.

Similarly, when the index does change, the new index differs from the old one by exactly that bit: newIndex = oldIndex + oldCap. That is why, when the index changes, the element is placed at j + oldCap.

At this point, the resize method ends and every element has been placed in its proper slot.

get()

The get() method is relatively simple; the most important part is locating where the key is stored:

final Node<K,V> getNode(int hash, Object key) {
  Node<K,V>[] tab; Node<K,V> first, e; int n; K k;

  // 1. Locate the element position with (n - 1) & hash
  if ((tab = table) != null && (n = tab.length) > 0 &&
      (first = tab[(n - 1) & hash]) != null) {

      // 2. Determine if the first element is the element we are looking for
      if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
          return first;
      if ((e = first.next) != null) {
        // 3. If the node is a tree node, look for the element in the red-black tree
        if (first instanceof TreeNode)
            return ((TreeNode<K,V>)first).getTreeNode(hash, key);
        // 4. Otherwise find the corresponding node in the linked list
        do {
            if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        } while ((e = e.next) != null);
      }
  }
  return null;
}


remove

The process of locating the node in the remove method is the same as in the get() method, so here we mainly analyze how the node is handled once it has been found:

if (node != null && (!matchValue || (v = node.value) == value || (value != null && value.equals(v)))) {
    // 1. Delete the tree node. If the deletion is unbalanced, the node will be moved again
    if (node instanceof TreeNode)
        ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
    // The deleted node is the first node in the list, so the second node is directly assigned to the first node
    else if (node == p)
        tab[index] = node.next;
    // The deleted node is the middle node of the list, where p is the prev node of node
    else
        p.next = node.next;
    ++modCount;
    --size;
    afterNodeRemoval(node);
    return node;
}


In the remove method, the most complicated part is removeTreeNode: after a red-black tree node is deleted, the tree may need to degenerate back into a linked list, or nodes may need to be moved around to restore the red-black tree properties. The code is lengthy, so it is not posted here. This complexity is also part of the reason why red-black trees are not used everywhere in place of linked lists.

The infinite loop problem during JDK7 expansion

/**
* Transfers all entries from current table to newTable.
*/
void transfer(Entry[] newTable) {
 Entry[] src = table;
 int newCapacity = newTable.length;
 for (int j = 0; j < src.length; j++) {
   Entry<K,V> e = src[j];
    if (e != null) {
       src[j] = null;
       do {
           // The B thread pauses at this point
           Entry<K,V> next = e.next;
           int i = indexFor(e.hash, newCapacity);
           e.next = newTable[i];
           // The element will be placed at the head of the linked list, so the data will be inverted after expansion
           newTable[i] = e;
           e = next;
           } while (e != null);
    }
  }
}

The code above can easily cause an infinite loop during expansion. How does that happen? Suppose two threads A and B execute this code concurrently, and the array size grows from 2 to 4. Before the expansion, tab[1] = 1 -> 5 -> 9.

Suppose thread B gives up its timeslice right after executing next = e.next (the line marked in the code), while thread A goes on to execute the entire transfer to completion.

After thread A completes, the elements mounted at tab[1] are 9 -> 5 -> 1; note that the order is reversed. From thread B's point of view, e = 1 and next = 5.

In thread A's loop, newTable[i] changes as follows: 1) newTable[i] = 1, 2) newTable[i] = 5 -> 1, 3) newTable[i] = 9 -> 5 -> 1.

Similarly, thread B can be analyzed iteration by iteration:

  1. After the first iteration, newTable[i] = 1 and e = 5.
  2. After the second iteration, newTable[i] = 5 -> 1 and e = 1.
  3. In the third iteration, e (node 1) now has no next, so next is null. Executing e.next = newTable[i] makes 1 point to 5, forming the ring 1 -> 5 -> 1; after newTable[i] = e, newTable[i] = 1 -> 5 -> 1, i.e., a cycle.

Once such a ring exists, a get() that traverses this bucket looking for a key loops forever, driving CPU usage to 100%.

JDK8 does not have this particular problem: during resize it uses head and tail pointers for each of the two sub-lists and preserves the original order of the elements. Of course, HashMap is still not thread-safe, so ConcurrentHashMap is recommended for multithreaded use.
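
If a map really does need to be shared across threads, a minimal sketch with ConcurrentHashMap might look like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();

        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                // merge() is atomic on ConcurrentHashMap, so concurrent increments are safe
                counts.merge("hits", 1, Integer::sum);
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(counts.get("hits")); // 20000
    }
}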


Your “like” is the biggest support for me; following me is even better.