Implementation Principle of HashMap (Part 1)

HashMap structure

  • In JDK 1.8, the underlying data structure of HashMap is an array + linked list + red-black tree

Important constants

  • DEFAULT_INITIAL_CAPACITY: the initial length of the table array
// The default initial capacity - MUST be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
  • MAXIMUM_CAPACITY: the maximum length of the table array
// The maximum capacity, used if a higher value is implicitly specified by either 
// of the constructors with arguments. MUST be a power of two <= 1<<30.
static final int MAXIMUM_CAPACITY = 1 << 30;
  • DEFAULT_LOAD_FACTOR: the load factor, 0.75 by default. When the total number of elements exceeds (current array length × load factor), the array is resized to double its size (a small sketch illustrating these constants follows this list)
// The load factor used when none specified in constructor.
static final float DEFAULT_LOAD_FACTOR = 0.75f;
  • TREEIFY_THRESHOLD: the treeification threshold for linked lists, 8 by default. When the number of nodes in a single bucket (table slot) exceeds this value, the linked list is converted to a red-black tree (provided the table itself is large enough; see MIN_TREEIFY_CAPACITY)
//The bin count threshold for using a tree rather than list for a bin. 
//Bins are converted to trees when adding an element to a bin with at least
//this many nodes. The value must be greater than 2 and should be at 
//least 8 to mesh with assumptions in tree removal about conversion back 
//to plain bins upon shrinkage.
static final int TREEIFY_THRESHOLD = 8;
  • UNTREEIFY_THRESHOLD: the untreeify threshold, 6 by default. During resizing, if the number of nodes in a split tree bin drops below this value, the red-black tree is converted back to a linked list
//The bin count threshold for untreeifying a (split) bin during a resize
//operation. Should be less than TREEIFY_THRESHOLD, and at most 6 
//to mesh with shrinkage detection under removal.
static final int UNTREEIFY_THRESHOLD = 6;
  • MIN_TREEIFY_CAPACITY = 64: the minimum table capacity for treeification. A bucket is converted to a tree only when the table length has reached this value; otherwise the table is resized instead (this prevents conflicts between early resizing and treeification)
//The smallest table capacity for which bins may be treeified. 
//(Otherwise the table is resized if too many nodes in a bin.)
//Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between 
//resizing and treeification thresholds.
static final int MIN_TREEIFY_CAPACITY = 64;
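Taken together, the defaults interact as follows: the resize threshold is capacity * load factor, and treeification additionally requires the table length to have reached MIN_TREEIFY_CAPACITY. The sketch below (illustrative plain Java written for this article, not JDK source) prints these derived values:

public class HashMapThresholdSketch {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;   // 16
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    public static void main(String[] args) {
        // Resize threshold = capacity * load factor; the capacity doubles on each resize
        int capacity = DEFAULT_INITIAL_CAPACITY;
        for (int i = 0; i < 4; i++) {
            int threshold = (int) (capacity * DEFAULT_LOAD_FACTOR);
            System.out.println("capacity=" + capacity + " -> resize when size > " + threshold);
            capacity <<= 1; // doubled on resize
        }

        // Treeification (roughly) needs BOTH a long bucket AND a large enough table
        int tableLength = 16;
        int binCount = 9;
        boolean treeify = binCount > TREEIFY_THRESHOLD && tableLength >= MIN_TREEIFY_CAPACITY;
        System.out.println("binCount=" + binCount + ", tableLength=" + tableLength
                + " -> treeify? " + treeify + " (the table is resized instead)");
    }
}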

HashMap.put(K, V) process and analysis

The flow of HashMap.put(K, V) is as follows:

First of all, array lookup is efficient because an element can be located directly by its index, while linked lists are used to resolve hash conflicts. As the flow chart above shows, when an element is put, the hash value of the key is obtained through the hash algorithm, and hash(key) & (n - 1) then gives the key's position in the Node[] array. Two different keys can therefore land on the same index; entries that share an index are stored as a linked list (a small collision example follows the Node class below). The structure of a list node is as follows:

// The Node class holds the element structure. JDK 1.7 uses the Entry class
static class Node<K,V> implements Map.Entry<K,V> {
    // hash value
    final int hash;
    final K key;
    V value;
    // Next element pointer
    Node<K,V> next;
    // constructor
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }   // Returns the key of this entry
    public final V getValue()      { return value; } // Returns the value of this entry
    public final String toString() { return key + "=" + value; }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    /** hashCode() */
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    /** equals(): two entries are equal only if both key and value are equal */
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
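To make the collision case concrete, here is a small standalone example (added for illustration, not from the original article). The strings "Aa" and "BB" have the same hashCode (2112), so they necessarily land in the same bucket, yet both values remain retrievable because the bucket stores them as a linked list:

import java.util.HashMap;

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" have identical hash codes, so they always share a bucket
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        HashMap<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);

        // Both values are still found: the bucket holds a short linked list
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}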

Analysis 1: hash(key)

The source code is as follows:

// Convert the key to a hash: hashCode() + 1 shift + 1 xor (a two-step perturbation)
// 1. h = key.hashCode()
// 2. The high 16 bits participate in the low 16 bits: h ^ (h >>> 16)
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    // a. If key == null, the hash value is 0, which is why a HashMap key may be null
    // b. If key != null, first compute key.hashCode() (denoted h),
    //    then perturb it: xor (^) the hash code with itself shifted right by 16 bits
}
  • Calculation diagram

hashCode() returns an int, which is 32 bits (4 bytes). XORing the high 16 bits into the low 16 bits mixes the high bits into the subsequent index calculation, increasing the randomness of the hash value so that indexes are distributed more evenly and the table space is used more fully.
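As a quick self-contained illustration (added here, not part of the original source), the same perturbation can be reproduced and printed in binary for an arbitrary key:

public class HashPerturbationDemo {
    // Same formula as the hash(Object) method quoted above
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        String key = "hello";
        int h = key.hashCode();
        System.out.println("hashCode       : " + toBits(h));
        System.out.println("h >>> 16       : " + toBits(h >>> 16));
        System.out.println("perturbed hash : " + toBits(hash(key)));
        // Index in a table of length 16: only the low 4 bits matter,
        // which is exactly why the high bits are mixed in first
        System.out.println("index (n = 16) : " + (hash(key) & (16 - 1)));
    }

    // Render an int as a full 32-bit binary string
    static String toBits(int v) {
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }
}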

Analysis 2: computing the storage index: (length - 1) & hash

To map a hash value to a bucket, you could take the remainder hash % length, but HashMap uses & instead: when length is a power of two, (length - 1) & hash produces the same result and is faster to compute. For example:

The array length is always a power of two, e.g. 16 in binary is 0000 0000 0000 0000 0000 0000 0001 0000, so length - 1 is 0000 0000 0000 0000 0000 0000 0000 1111. ANDing the hash with this mask guarantees that the resulting index never exceeds length - 1, and every index below that value can be produced. Combined with the perturbed hash described above, this reduces collisions, distributes the data as evenly as possible, and keeps the bucket lists at roughly the same length.
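A small standalone check (illustrative only) that the mask and the modulo give the same bucket when the length is a power of two. Math.floorMod is used rather than % because % would return a negative remainder for negative hashes, while the mask never does:

public class IndexMaskDemo {
    public static void main(String[] args) {
        int n = 16; // table length, a power of two
        int[] hashes = { 1, 15, 16, 17, 2112, -5, Integer.MIN_VALUE };
        for (int h : hashes) {
            int byMask = h & (n - 1);
            int byMod = Math.floorMod(h, n);
            System.out.printf("hash=%11d  (n-1)&hash=%2d  floorMod=%2d%n", h, byMask, byMod);
        }
    }
}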

Analysis 3: putVal(hash(key), key, value, false, true)

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // n is the array length, i is the array index obtained by the & operation on the hash
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 1. If the hash table array tab is empty, resize() is used to create the table
    // So the hash table is initialized by resize() on the first call to put
    // The resize() source is analyzed in detail in a later section and skipped here
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 2. If slot i of the table is empty, create a new node at that slot
    // p points to tab[i]
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // Otherwise a hash conflict exists, i.e. a node already occupies this slot
    else {
        Node<K,V> e; K k;
        // a. Check whether the key of the head node is the same as the key being inserted
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // b. Check whether the slot already holds a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // c. Otherwise the slot holds a linked list
        else {
            // Add the new node to the list using tail insertion
            // Why not head insertion? Presumably because JDK 1.8 adds the red-black tree structure:
            // if the list reaches the treeify threshold it must be traversed anyway,
            // so tail insertion is the natural choice.
            for (int binCount = 0; ; ++binCount) {
                // If the current position is the end of the list, e (the next node) is null
                if ((e = p.next) == null) {
                    // At the tail: create a new node after p to complete the tail insertion
                    p.next = newNode(hash, key, value, null);
                    // If the treeify threshold is reached, convert the list to a tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If a node in the list has the same key as the one being inserted,
                // stop here; its value is replaced below
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Move p forward and continue the loop
                p = e;
            }
        }
        // A Node with the same key already exists in the hash table: replace its value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Callback used by LinkedHashMap
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Modification count + 1
    ++modCount;
    // If size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
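As a small usage-level illustration of onlyIfAbsent (this example is added here, not from the original article): put always replaces an existing value and returns the old one, while putIfAbsent, which calls putVal with onlyIfAbsent = true, leaves an existing non-null value untouched:

import java.util.HashMap;

public class PutValBehaviorDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();

        System.out.println(map.put("a", 1));          // null  (no previous mapping)
        System.out.println(map.put("a", 2));          // 1     (old value returned, value replaced)
        System.out.println(map.putIfAbsent("a", 3));  // 2     (existing value kept)
        System.out.println(map.get("a"));             // 2

        // null keys are allowed: hash(null) is defined as 0
        map.put(null, 99);
        System.out.println(map.get(null));            // 99
    }
}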

Analysis 4: resize()

    /**
     * Array expansion
     * @return the expanded array
     */
    final Node<K,V>[] resize() {
        // The old array
        Node<K,V>[] oldTab = table;
        // The old array length
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // Old threshold
        int oldThr = threshold;
        // New array length and new threshold
        int newCap, newThr = 0;
        // If the old array is not empty
        if (oldCap > 0) {
            // If the old capacity has already reached the maximum, stop expanding and return the old array
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Otherwise double the old capacity; if the old capacity is at least the default (16),
            // also double the threshold
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY) {
                // Double the threshold: 16*0.75=12 -> 32*0.75=24
                newThr = oldThr << 1;
            }
        } else if (oldThr > 0) {
            // The initial capacity was stored in threshold (by the capacity constructor); use it as the new capacity
            newCap = oldThr;
        } else {
            // No capacity or threshold was specified: use the default capacity and threshold
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // If the new threshold is still 0 (e.g. the old threshold was > 0 but the old array length was 0),
        // recompute the threshold from the new capacity and load factor
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        Node<K,V>[] newTab = new Node[newCap];
        // Point table at the new array
        table = newTab;
        if (oldTab != null) {
            // The old array is not empty: traverse its nodes and move them to the new array
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // e.next is null: this slot holds a single element, not a linked list or red-black tree
                    if (e.next == null) {
                        // e.hash & (newCap - 1)
                        // recomputes the index of e in the new table
                        // and places e at that index of the new array
                        newTab[e.hash & (newCap - 1)] = e;
                    } else if (e instanceof TreeNode) {
                        // If it is a red-black tree, split the tree across the new table
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    } else {
                        // Low-order list (stays at index j)
                        Node<K,V> loHead = null, loTail = null;
                        // High-order list (moves to index j + oldCap)
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // In the new table each original entry can land at only two indexes:
                            // its old index j, or j + oldCap
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null) {
                                    loHead = e;
                                } else {
                                    loTail.next = e;
                                }
                                loTail = e;
                            }
                            else {
                                if (hiTail == null) {
                                    hiHead = e;
                                } else {
                                    hiTail.next = e;
                                }
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

During capacity expansion, the low/high list handling works as follows: each entry in old bucket j either keeps index j (the low list) or moves to index j + oldCap (the high list), depending on whether (hash & oldCap) == 0. A small sketch of this split follows:
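The sketch below (illustrative, independent of the JDK code) shows the split for oldCap = 16: three hashes that all map to old index 5 end up at either 5 or 5 + 16 = 21 in the 32-slot table:

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1; // 32
        // Three hashes that all share old index 5 (low 4 bits = 0101)
        int[] hashes = { 5, 21, 37 };
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int newIndex = h & (newCap - 1);
            boolean lowList = (h & oldCap) == 0;
            System.out.printf("hash=%2d oldIndex=%d newIndex=%2d -> %s list%n",
                    h, oldIndex, newIndex, lowList ? "low" : "high");
        }
        // Output: 5 stays at 5, 21 moves to 21 (= 5 + 16), 37 stays at 5
    }
}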

Analysis 5: Tail insertion plus the low/high lists solve the circular linked list problem of JDK 1.7 resizing

JDK1.7 expansion source:

/**
 * Analysis 1: resize(2 * table.length)
 * Function: when the capacity is insufficient (size > threshold), double the capacity
 */
void resize(int newCapacity) {

    // 1. Save the old table
    Entry[] oldTable = table;

    // 2. Save the old capacity, i.e. the length of the array
    int oldCapacity = oldTable.length;

    // 3. If the old capacity is already the maximum, set the threshold to Integer.MAX_VALUE and return
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    // 4. Create a new array (the new table) with the new, doubled capacity
    Entry[] newTable = new Entry[newCapacity];

    // 5. Move the old array's data (key-value pairs) to the new table, completing the expansion ->> Analysis 1.1
    transfer(newTable);

    // 6. Point the HashMap's table field at the new array
    table = newTable;

    // 7. Recompute the threshold
    threshold = (int)(newCapacity * loadFactor);
}

/**
 * Analysis 1.1: transfer(newTable)
 * Moves the data (key-value pairs) from the old array to the new table, completing the expansion
 * Process: traverses each list in the old order and inserts nodes at the head of the new list
 */
void transfer(Entry[] newTable) {
      // 1. src references the old array
      Entry[] src = table;

      // 2. Get the new array length = the new capacity
      int newCapacity = newTable.length;

      // 3. Transfer the data (key-value pairs) from the old array to the new array by iterating over the old array
      for (int j = 0; j < src.length; j++) {
          // 3.1 Get each element of the old array
          Entry<K,V> e = src[j];
          if (e != null) {
              // 3.2 Release the old array's reference (after the loop the old array no longer references any entries)
              src[j] = null;
              do {
                  // 3.3 Iterate through the list starting from this array element
                  // Note: save the next node first, because this is a singly linked list;
                  // otherwise the rest of the list would be lost after the move
                  Entry<K,V> next = e.next;
                  // 3.4 Recompute where each element is stored
                  int i = indexFor(e.hash, newCapacity);
                  // 3.5 Place the element in the array using head insertion:
                  // the node currently at the array slot becomes this node's next,
                  // and this node takes its place at the slot
                  // After expansion the list may therefore be reversed:
                  // the old list is traversed in order, but nodes are inserted at the head of the new list
                  e.next = newTable[i];
                  newTable[i] = e;
                  // 3.6 Move on to the next node, and so on until every node in the list has been visited
                  e = next;
              } while (e != null);
              // The outer loop continues until every slot of the old array has been processed
          }
      }
}
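To see the effect of head insertion in isolation, here is a minimal single-threaded sketch (written for this article, not JDK code) that transfers a three-node list into a one-slot "table" using the same e.next = newTable[i]; newTable[i] = e pattern. The order comes out reversed, which is the precondition for the circular list described next:

public class HeadInsertReversalDemo {
    static class Entry {
        final String key;
        Entry next;
        Entry(String key, Entry next) { this.key = key; this.next = next; }
    }

    public static void main(String[] args) {
        // Old bucket: A -> B -> C
        Entry c = new Entry("C", null);
        Entry b = new Entry("B", c);
        Entry a = new Entry("A", b);
        print("old bucket", a);

        // "New table" with a single slot; assume all three keys map to index 0
        Entry[] newTable = new Entry[1];
        Entry e = a;
        while (e != null) {
            Entry next = e.next;      // save the rest of the list
            e.next = newTable[0];     // head insertion, as in JDK 1.7 transfer()
            newTable[0] = e;
            e = next;
        }
        print("new bucket", newTable[0]); // C -> B -> A (reversed)
    }

    static void print(String label, Entry head) {
        StringBuilder sb = new StringBuilder(label + ": ");
        for (Entry e = head; e != null; e = e.next) {
            sb.append(e.key).append(e.next != null ? " -> " : "");
        }
        System.out.println(sb);
    }
}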

In JDK 1.7, nodes are inserted at the head of the list during transfer, and no low/high lists are used. Under high concurrency, two threads may resize the same HashMap at the same time and operate on the same list at the same array index; as the CPU alternates between the two threads, the head-insertion reordering can easily produce a circular linked list, as shown in the figure below: