HashMap implementation principle and detailed source code analysis


PS: This blog is based on JDK 1.8

Learning Points:

  • 1. Know the data structure of HashMap
  • 2. Know the hash algorithm in HashMap
  • 3. Know the code implementation of put, remove and get in HashMap
  • 4. What is a hash conflict in HashMap, and how is it handled?
  • 5. Know the expansion mechanism of HashMap

1. What is HashMap?

HashMap is a hash-table-based implementation of the Map interface that stores data as key-value pairs. The implementation of HashMap is not synchronized, which means it is not thread-safe. Both keys and values can be null, and the mappings in a HashMap are not ordered.
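
A minimal hedged usage sketch of these properties (the class name, keys and values are arbitrary illustrations, not part of the JDK source):

import java.util.HashMap;
import java.util.Map;

public class HashMapBasics {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("one", "1");
        map.put(null, "null key is allowed");   // at most one null key
        map.put("two", null);                   // null values are allowed
        System.out.println(map.get(null));      // null key is allowed
        System.out.println(map.get("one"));     // 1
        System.out.println(map);                // iteration order is not guaranteed
    }
}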

2. Features of HashMap

  • Storage is unordered
  • Both keys and values can be null, but there can be at most one null key
  • Before JDK8 the data structure is array + linked list; since JDK8 it is array + linked list + red-black tree
  • If the length of a linked list reaches 8 and the array length is at least 64, that list is converted to a red-black tree

3. Data structure of HashMap

Before JDK8 (for example in JDK7), the structure is array + linked list: when a hash conflict occurs, the conflicting entries are chained into a linked list at that array slot.


HashMap node information:

static class Node<K,V> implements Map.Entry<K,V> {
        // hashCode
        final int hash;
        final K key;
        V value;
        // Pointer to the next node in the linked list (null if there is none)
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        // ...
}        

4. Initialize the HashMap

4.1. Member Variables

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
	/** * Serial number version */
    private static final long serialVersionUID = 362498820763181265L;

   /** * Initializes the capacity, 16=2 to the fourth power */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

   /** * Maximum capacity, 2 to the 30th power */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** * Default load factor. The default value is 0.75 */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** * A bucket's linked list is converted to a red-black tree when its length reaches 8 */
    static final int TREEIFY_THRESHOLD = 8;
    
    /** * A red-black tree is converted back to a linked list when its node count drops to 6 */
    static final int UNTREEIFY_THRESHOLD = 6;
    
    /** * The minimum table (array) length required before any bucket is converted to a red-black tree */
    static final int MIN_TREEIFY_CAPACITY = 64;
    
    // ...
	
	 /** * HashMap stores an array of elements */
	transient Node<K,V>[] table;

    /** * Caches the entrySet() view of the map */
    transient Set<Map.Entry<K,V>> entrySet;

    /** * The number of elements in the HashMap */
    transient int size;

    /** * Records the number of structural modifications to the HashMap */
    transient int modCount;

    /** * The next size value at which to resize: (capacity * load factor) */
    int threshold;

    /** * Load factor of the Hash table */
    final float loadFactor;

}    

4.2. Construction method


public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity cannot be less than 0; otherwise an IllegalArgumentException is thrown
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
     // If the initial capacity is greater than the maximum capacity, the maximum capacity is used as the initial capacity
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
     // The load factor must be greater than 0 and must be a valid number (Float.isNaN returns true for a non-numeric float)
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // Assigns the specified load factor to the global variable
    this.loadFactor = loadFactor;
    // Here threshold temporarily holds the power-of-two table size computed by tableSizeFor; it is recalculated as capacity * load factor when the table is first created
    this.threshold = tableSizeFor(initialCapacity);
}


public HashMap(int initialCapacity) {
     // Initialize the capacity and default load factor
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
public HashMap() {
    // The default load factor is 0.75
	this.loadFactor = DEFAULT_LOAD_FACTOR;
}


We know that the default capacity of a HashMap is 16. When an initial capacity is specified, where is the actual table size computed? In this.threshold = tableSizeFor(initialCapacity):

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
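
A hedged standalone sketch of what this rounding produces; the method body is copied from the JDK8 source above and simply wrapped in a throwaway class (the class name and sample inputs are arbitrary):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same rounding logic as JDK8's HashMap.tableSizeFor, copied for illustration
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16 -- rounded up to the next power of two
        System.out.println(tableSizeFor(16)); // 16 -- already a power of two, unchanged
        System.out.println(tableSizeFor(17)); // 32
    }
}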

This relies on basic bit manipulation: the unsigned right shifts combined with OR operations copy the highest set bit of n into every lower position, so n + 1 becomes the smallest power of two that is greater than or equal to the requested capacity (for an initial capacity of 16 the result is 16). Scrolling further through the code, we find the constructor public HashMap(Map<? extends K, ? extends V> m). This constructor builds a new HashMap that has the same mappings as the specified Map:

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
     putMapEntries(m, false);
 }

Look at the putMapEntries method:

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    // The length of the collection passed in
    int s = m.size();
    if (s > 0) {
     	// Check whether the table has been initialized
        if (table == null) { // pre-size is not initialized
            // Adding 1.0F rounds the result up, so the computed capacity is large enough and resize is called less often
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            // If t is greater than the current threshold, recompute the threshold with tableSizeFor(t)
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold) // If it is already initialized, expand resize
            resize();
         // Iterate to add all elements of the map to the hashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

5. The hash algorithm of HashMap in JDK8

5.1. Hash algorithm in HashMap

In java.util.HashMap there is a dedicated method, hash(), used to calculate the hash value of a key. What does this method do? The value it returns is later combined with the table length as (n - 1) & hash to compute the array index of the bucket:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

It looks like the code is only two lines long, but it actually contains the idea of a hash algorithm. Here is a brief analysis:

static final int hash(Object key) {
    // The key passed in is null, which returns the default value 0
    if (key == null) return 0;
    // Compute the hash code
    int h = key.hashCode();
    // Unsigned-shift the hashCode 16 bits to the right, moving the high 16 bits into the low 16 bits
    int t = h >>> 16;
    // Perform xor on both values and return
    return h ^ t;
}

In short, the method computes the hashCode, shifts it 16 bits to the right, and XORs the two values. What was the author's intention? Since this is a hash algorithm, its purpose is to distribute data evenly across the table.

With XOR, the probabilities of producing a 0 and a 1 are equal, unlike AND (biased toward 0) or OR (biased toward 1). Since the essence of a hash algorithm is to distribute data uniformly, XOR-ing the hash value keeps the distribution relatively uniform, so the probability of a hash conflict is much smaller.

Supplement:

  • And operation: a result bit is 1 only if both corresponding bits are 1; in all other cases it is 0;
  • Or operation: a result bit is 1 if either of the corresponding bits is 1; otherwise it is 0;
  • Xor operation: a result bit is 1 if the corresponding bits differ; if they are the same, it is 0;

Why shift right by 16? An int has 32 bits, which can be split into a high 16 bits and a low 16 bits. Because the index is computed as (n - 1) & hash and n is usually small, only the low bits of the hash would normally take part; shifting right by 16 and XOR-ing mixes the high 16 bits into the low 16 bits, so the whole hashCode influences the index and conflicts become less likely.
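
To make the mixing visible, here is a minimal hedged sketch; the hash method is copied from the JDK8 source shown above, while the class name, the key "example" and the table length 16 are arbitrary assumptions:

public class HashSpreadDemo {
    // Same perturbation as JDK8's HashMap.hash(), copied here for illustration
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = "example".hashCode();
        System.out.println("hashCode : " + Integer.toBinaryString(h));
        System.out.println("h >>> 16 : " + Integer.toBinaryString(h >>> 16));
        System.out.println("mixed    : " + Integer.toBinaryString(hash("example")));
        // With a table of length 16, only the low 4 bits select the bucket,
        // but after the XOR they also depend on the high 16 bits.
        System.out.println("index    : " + (hash("example") & (16 - 1)));
    }
}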

5.2. What is hash conflict in HashMap?

A hash conflict is also known as a hash collision. In theory it means two keys produce the same hash value; in a HashMap it specifically means two keys produce the same value of (n - 1) & hash, i.e. the same array index. Before JDK8, HashMap used only a linked list: whenever a hash conflict occurred, a new node was appended to the end of the list at that index. Since JDK8, the approach is linked list + red-black tree: conflicts first go into a linked list, and once the list reaches 8 nodes and the array length is at least 64, the list is converted into a red-black tree. This can all be found in the source code.

Inside the source of HashMap#putVal, the logic first checks whether the computed array index, tab[i = (n - 1) & hash], already holds a node. If it does not, there is no conflict and a new node is created there; if it does, there is a conflict and the entry goes into the linked list or the red-black tree:

if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
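
As a small hedged illustration of a conflict, consider Integer keys 1 and 17 (arbitrary choices; for small non-negative Integers the perturbed hash equals the value itself, so both land in bucket 1 of a length-16 table):

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        int n = 16;                        // default table length
        System.out.println((n - 1) & 1);   // 1
        System.out.println((n - 1) & 17);  // 1 -> same index, so the entries are chained
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(17, "b");                  // collides with key 1; appended to that bucket's list
        System.out.println(map.get(17));   // b -- lookup still works by comparing keys
    }
}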

6. Put operation of HashMap in JDK8

  • The core flow of the put method
  1. Calculate the array index from the key's hash
  2. If that array slot is empty, add a new node directly
  3. Otherwise there is a hash conflict: if the bucket holds a linked list, append to the end of the list; if it holds a red-black tree, insert into the tree

That is the core flow. In addition, if a duplicate key is encountered, the new value replaces the old value for that key and the old value is returned; and if size exceeds threshold after the insertion, the table is expanded.
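
A minimal hedged usage sketch of that replace-and-return behavior (the class name, keys and values are arbitrary):

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1)); // null -- there was no previous mapping
        System.out.println(map.put("a", 2)); // 1    -- old value is returned and replaced
        System.out.println(map.get("a"));    // 2
        System.out.println(map.size());      // 1    -- a duplicate key does not grow the map
    }
}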

OK, now for the put source code:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table has not been initialized yet, resize() creates it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Determine whether a hash conflict has occurred
    if ((p = tab[i = (n - 1) & hash]) == null)
        // Hash does not conflict, new node is added
        tab[i] = newNode(hash, key, value, null);
    else { // Hash conflicts are handled using linked lists or red-black trees
        Node<K,V> e; K k;
        // If there are duplicate keys, both key and hash are equal
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            // Assign the old node object to the new e
            e = p;
        else if (p instanceof TreeNode) // Use a red-black tree node
              // Put the node in the red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else { // List
             // Infinite loop
            for (int binCount = 0; ; ++binCount) {
                // Go all the way to the last node
                if ((e = p.next) == null) {
                    // Add the new node to the tail
                    p.next = newNode(hash, key, value, null);
                    // If the number of nodes is greater than 8, it becomes a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // Break the loop
                    break;
                }
                // If a node with the same hash and an equal key already exists, stop traversing; it will be handled below
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Reassign p to continue traversing the list
                p = e;
            }
        }
        // If a duplicate key is found, replace the old value with the new value
        if (e != null) { // existing mapping for key
            // Record the old value
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // Replace the old value with the new value
                e.value = value;
            // Callback after access
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    // Record the number of changes
    ++modCount;
    // If size exceeds the threshold, expand the table
    if (++size > threshold)
        resize();
     // Callback method
    afterNodeInsertion(evict);
    return null;
}

So how does a bucket get converted to a red-black tree? Red-black trees are relatively complex, so for more detail you can refer to my earlier blog post on hand-writing a red-black tree in Java:

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // The MIN_TREEIFY_CAPACITY value is 64, which means that an array size smaller than 64 will not be converted to a red-black tree
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
    	// Capacity expansion method
        resize();
     // Go to red black tree
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // hd and tl are the head and tail of the TreeNode chain being built
        TreeNode<K,V> hd = null, tl = null;
        do {
            // Build the tree node
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                // Assign the new node p to the head node of the red-black tree
                hd = p;
            else {
                // The previous pointer of the new node is the old tail node tl
                p.prev = tl;
                // The next pointer of tl is the new node
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // Point the bucket at the head node; treeify() then turns the TreeNode chain into a real red-black tree
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

7. Capacity expansion mechanism of HashMap

This is one of the key points in HashMap, and one of the more difficult questions

7.1. When is capacity expansion required?

When the number of elements in the HashMap exceeds the threshold; the threshold is the array length multiplied by loadFactor, which defaults to 0.75. With the default capacity of 16 that is 16 * 0.75 = 12, so the put that pushes size past 12 triggers the first expansion.
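
A hedged back-of-the-envelope sketch with the default settings (the class name is arbitrary; capacity 16 and load factor 0.75 come from the constants above):

public class ResizeThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold);                             // 12
        // putVal() calls resize() once ++size > threshold, so inserting the
        // 13th entry doubles the table to 32 and the threshold to 24.
        System.out.println(capacity << 1);                         // 32
        System.out.println((int) ((capacity << 1) * loadFactor));  // 24
    }
}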

7.2. What is HashMap expansion?

The resize method is HashMap's expansion method, and it is time-consuming. When a HashMap expands, the capacity doubles, for example from 16 to 32. Because the index is computed as (n - 1) & hash, doubling n changes the index calculation: after the expansion each node either stays at its original index or moves to "original index + old capacity". This may sound confusing, so take a closer look at the following analysis:

For example, when the capacity expands from 16 to 32, n - 1 changes from 01111 to 11111, so the index (n - 1) & hash uses one more bit of the hash. A hash whose index was 00101 (5) before the expansion becomes 10101 (21) afterwards if that extra bit is 1; 10101 is 21 in decimal and 21 = 5 + 16, i.e. "original location + old capacity". If the extra bit is 0, the index stays the same and the node does not move.

In the source code, each old bucket's list is split using two pairs of pointers: a low pair, loHead and loTail, for nodes that keep their original index, and a high pair, hiHead and hiTail, for nodes that move to "original location + old capacity". After expanding from 16 to 32, the two resulting lists are placed at those two positions. Therefore, during expansion, every node either keeps its index or moves by exactly the old capacity.
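
A hedged sketch of that split decision; the hashes 5 and 21 are arbitrary examples (the same pair as above), and the test (e.hash & oldCap) == 0 is the one used in the resize() source below:

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21};              // both sit in bucket 5 while the table length is 16
        for (int h : hashes) {
            int oldIndex = (oldCap - 1) & h;
            // The "new" bit decides whether the node goes to the lo list or the hi list
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(h + ": index " + oldIndex + " -> " + newIndex);
        }
        // Output: 5: index 5 -> 5    (extra bit is 0, stays put)
        //         21: index 5 -> 21  (extra bit is 1, moves to 5 + 16)
    }
}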

7.3. resize source code implementation

After the detailed analysis above, the same logic can be found in the corresponding source code:

final Node<K,V>[] resize() {
   // Get the current node array
    Node<K,V>[] oldTab = table;
    // The length of the array
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Calculate the size after expansion
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) { // The maximum capacity is 1 << 30
             // Cannot expand further; set the threshold to Integer.MAX_VALUE
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // If it is not exceeded, it is doubled
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
         // Assign the old threshold to the new array length
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // Use the default value 16
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Recalculate the threshold and assign it to threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // The new threshold, which was 12 by default, is now 24
    threshold = newThr;
    // Create the new array; newCap is the new length (for example 32)
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                     // is a red-black tree node, call split method
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // Preserve Order is the case with linked lists
                	// Define the relevant pointer
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // No need to move position
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else { // Need to move the position to "old position + old capacity" position
                            if (hiTail == null)
                                hiHead = e;
                            else
                                // hiTail points to the node e being appended to the high-index list
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        // The position is unchanged
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        // hiTail points to null
                        hiTail.next = null;
                        // oldCap is old capacity, move to "old + old capacity" position
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

8. Remove operation of HashMap in JDK8

The idea of the remove method is to first locate the element. If the bucket holds a linked list, simply traverse the list and unlink the element; if it holds a red-black tree, search the tree for the node and remove it there, and if that leaves the tree with too few nodes (around the UNTREEIFY_THRESHOLD of 6), the tree is converted back into a linked list.
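
A minimal hedged usage sketch of the two remove variants (the class name, keys and values are arbitrary):

import java.util.HashMap;

public class RemoveDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(map.remove("a"));      // 1     -- removes by key, returns the old value
        System.out.println(map.remove("b", 99));  // false -- value does not match, nothing removed
        System.out.println(map.remove("b", 2));   // true  -- key and value both match, entry removed
        System.out.println(map.size());           // 0
    }
}

The corresponding source code: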

@Override
public boolean remove(Object key, Object value) {
    return removeNode(hash(key), key, value, true, true) != null;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // Array subscripts are (n-1)&hash, where the elements can be found
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // The node on the bucket is the key
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            // Point Node to the Node
            node = p;
        else if ((e = p.next) != null) { // Lists or red-black tree nodes
            if (p instanceof TreeNode)
                // Find the red-black tree node
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else { // List
                // Walk through the list to find the desired node
                do {
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // After the node is found, remove it
        if (node != null && (!matchValue || (v = node.value) == value || (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
            	// Red black tree remove node
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                // list remove by changing the pointer
                tab[index] = node.next;
            else
                p.next = node.next;
            // Record the number of changes
            ++modCount;
            // Decrease the element count
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}


9. Get operation of HashMap in JDK8

The get method finds a value by its key. This method is easier to understand:

 public V get(Object key) {
   Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
   Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
   // If the hash table is not empty and the bucket corresponding to the key is not empty
   if ((tab = table) != null && (n = tab.length) > 0 &&
       (first = tab[(n - 1) & hash]) != null) {
       // Check the first node against the position of the index
       if (first.hash == hash && // always check first node
           ((k = first.key) == key || (key != null && key.equals(k))))
           return first;
       if ((e = first.next) != null) { // If it is not the first node, it could be a linked list or a red-black tree node
           if (first instanceof TreeNode)
               // Obtain the red-black tree node according to getTreeNode
               return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // In the case of linked lists, only the list can be traversed
           do {
               if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                   return e;
           } while ((e = e.next) != null);
       }
   }
   return null;
}


10. HashMap related interview questions

  • What is the data structure of a HashMap?

Before JDK8 it is array + linked list; since JDK8 it is array + linked list + red-black tree

  • How is the hash function implemented in HashMap?

Get the hashCode via the key's hashCode() method, unsigned-shift it 16 bits to the right, and then XOR the two values

  • What are hash collisions in a HashMap?

A hash collision is also called a hash conflict. In general it means two keys produce the same hash value; in a HashMap it specifically means two keys end up with the same array index (n - 1) & hash

  • How does HashMap handle hash collisions?

Before JDK8, only the linked list (chaining) approach was used; since JDK8, a linked list plus red-black tree is used

  • Is HashMap thread safe?

HashMap is not thread-safe because there are no synchronization locks or other thread-safe operations in the source code

  • Since HashMap is not thread-safe, what are the alternatives?

ConcurrentHashMap can be used (Collections.synchronizedMap and Hashtable are also thread-safe, but ConcurrentHashMap generally scales better)
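
A minimal hedged sketch of the thread-safe alternative (the class name, key name, counts and thread setup are arbitrary):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                // merge() is atomic on ConcurrentHashMap, so concurrent increments are not lost
                map.merge("counter", 1, Integer::sum);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(map.get("counter")); // 2000 -- a plain HashMap could lose updates here
    }
}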

  • How does ConcurrentHashMap keep threads safe?

ConcurrentHashMap in JDK8 uses CAS plus synchronized locks to ensure thread safety