How does HashMap work in Java?

It is based on the principle of hashing.

2 What is a hash?

Hashing, in its simplest form, is a way of assigning a code to a variable or object by applying a formula or algorithm to its attributes.

A correct hash method must obey the following principle:

A hash function must return the same hash code every time it is applied to the same or an equal object. In other words, two equal objects must consistently produce the same hash code.

All Java objects inherit a default implementation of the hashCode() method from the Object class. That implementation typically produces distinct hash codes for distinct objects, often by converting an internal address of the object to an integer.
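This contract is easy to observe with String, whose hashCode() honors it. A minimal sketch (the class name HashContractDemo is ours):

```java
// Demonstrates the hashCode() contract: equal objects must return equal hash codes.
public class HashContractDemo {
    public static void main(String[] args) {
        String a = new String("key");
        String b = new String("key");
        // Two distinct instances, but equals() is true, so the hash codes must match.
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```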


3 Node class in HashMap

A Map is defined as an object that maps keys to values.

Therefore, HashMap needs some mechanism to store key-value pairs. It has one: an inner class, Node, defined as follows:

```java
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;    // cache the hash value so it need not be recomputed on rehash
    final K key;
    V value;
    Node<K,V> next;
    // ... rest of the code
}
```

As expected, the Node class stores the key and the value of a mapping as fields. The key is marked final, and there are two additional fields: next and hash.

We will understand the necessity of these attributes in the following sections.

4 How are key-value pairs stored in a HashMap

Key-value pairs are stored in a HashMap in an array of the inner Node class:

```java
transient Node<K,V>[] table;
```

After the hash code is calculated, it is converted into an index into this array, and the key-value pair is stored at that index. We won't go into hash collisions in detail here.

The length of the array is always a power of 2, which is guaranteed by the following function:

```java
static final int tableSizeFor(int cap) {
    // Subtract 1 first: otherwise passing an exact power of two
    // would return twice the expected value.
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```

The idea is to set every bit below the highest set bit of (cap - 1) to 1, then add 1, yielding the smallest power of 2 that is at least cap; this becomes the array length.
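The function can be exercised outside the JDK; this self-contained sketch reproduces tableSizeFor and its constant for illustration (the class name TableSizeDemo is ours):

```java
// A sketch of JDK 8's tableSizeFor: smear the highest set bit of (cap - 1)
// into every lower bit, then add 1 to get the next power of two >= cap.
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1; // so an exact power of two maps to itself
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (an exact power maps to itself)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```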

Why use a power of two as the capacity of the array?

The hash code computed from a key may be larger than the capacity of the array. An index could be obtained with a modulo operation (%), but that is comparatively slow. Instead, HashMap uses the following expression to ensure the computed index is less than the size of the array:

```
(n - 1) & hash
```

This is why the array size must be a power of two: when n is a power of 2, n - 1 acts as a low-order bit mask.

The AND operation zeroes all the high bits of the hash and keeps only the low bits, so the result is always less than n. It also makes it easy to recompute each Node's index during the next resize(), as explained later.

Take the default initial capacity of 16 as an example:

```
  01010011 00100101 01010100 00100101   // hash
& 00000000 00000000 00000000 00001111   // n - 1 (n = 16)
------------------------------------
  00000000 00000000 00000000 00000101   // all high bits zeroed, only the low
                                        // bits remain, so the result is < n
```

However, because only the low bits are used, hash collisions can become correspondingly severe. This is where the perturbation function comes in:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

This function XORs the hash code with itself shifted right by 16 bits. The high 16 bits stay the same, while the low 16 bits absorb information from the high bits. Applied to the example above, the computed array index changes from 5 to 0.

Using the "perturbation function" reduces the probability of hash collisions. Tests have shown that although this single shift-and-XOR does not minimize collisions as much as heavier mixing would, the JDK uses it as a trade-off between computational cost and collision probability.
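The effect on the worked example above can be checked directly; a minimal sketch (the class name PerturbDemo is ours, and the sample bit pattern is the one from the text):

```java
// A sketch of JDK 8's perturbation: XOR the high 16 bits of hashCode()
// into the low 16 bits so the high bits also influence the bucket index.
public class PerturbDemo {
    // Same shape as HashMap.hash(Object), reproduced for illustration.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int raw = 0b01010011_00100101_01010100_00100101; // bit pattern from the text
        int n = 16;                                      // default capacity
        System.out.println((n - 1) & raw);                      // raw index: 5
        System.out.println((n - 1) & (raw ^ (raw >>> 16)));     // perturbed index: 0
    }
}
```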

5 Hash collision and its processing

Ideally, the hash function would map each key to a unique bucket; in practice this is impossible, and even well-designed hash functions produce collisions.

There are many ways to resolve hash collisions; Wikipedia groups the solutions into four categories.

Java's HashMap uses the first of these, separate chaining (often translated as the "zipper method"), combined with linked lists and red-black trees, to resolve collisions.

(Figure: HashMap structure in JDK 8)

When a collision occurs in a HashMap, the next field of the Node class links the colliding nodes into a linked list, or, once a bucket holds 8 or more nodes, into a red-black tree (which reverts to a linked list when it shrinks to 6 or fewer). This is how collisions are resolved.

```java
static final int TREEIFY_THRESHOLD = 8;

static final int UNTREEIFY_THRESHOLD = 6;
```

6 Initialization of the HashMap

```java
public HashMap();
public HashMap(int initialCapacity);
public HashMap(Map<? extends K, ? extends V> m);
public HashMap(int initialCapacity, float loadFactor);
```

There are four constructors in HashMap; they mainly initialize the capacity and the load factor. Take public HashMap(int initialCapacity, float loadFactor) as an example:

```java
public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity cannot be negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // Cap the capacity at MAXIMUM_CAPACITY
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be positive and a valid number
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
```

The capacity and load factor are initialized in this constructor; the other constructors delegate to it with the default values (load factor 0.75, capacity 16).
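The observable behavior of these constructors can be checked from user code; a short usage sketch (nothing here relies on HashMap internals):

```java
import java.util.HashMap;
import java.util.Map;

public class CtorDemo {
    public static void main(String[] args) {
        // Capacity hint and load factor; the real table is sized to a power of 2.
        Map<String, Integer> m = new HashMap<>(32, 0.75f);
        m.put("a", 1);
        System.out.println(m.get("a")); // 1

        try {
            // A negative initial capacity is rejected by the argument check.
            new HashMap<String, Integer>(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```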

At this point the backing table has not been allocated; its creation is deferred until first use.

7 Initialize or dynamically expand the hash table in the HashMap

The hash table refers to the following table variable, an array of the inner Node type:

```java
transient Node<K,V>[] table;
```

As an array, the table's length must be specified when it is initialized. Since the amount of data stored may exceed that length, HashMap defines a threshold parameter: when the number of stored entries reaches the threshold, the table must be expanded.

Personally, I consider initialization a special case of dynamic expansion: it merely grows the capacity from 0 to the value fixed in the constructor (16 by default), and no elements need to be rehashed.

7.1 Conditions for Capacity Expansion

Initialization happens whenever the table is null or its length is zero. Expansion is triggered when the number of elements exceeds the threshold:

```
threshold = loadFactor * capacity
```

For example, with the HashMap defaults loadFactor = 0.75 and capacity = 16:

```
threshold = loadFactor * capacity = 0.75 * 16 = 12
```

When the number of elements exceeds 12, the map expands; both the capacity and the threshold change after expansion.
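The arithmetic can be checked directly; a minimal sketch using the default values from the text:

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        float loadFactor = 0.75f;
        int capacity = 16;

        int threshold = (int) (loadFactor * capacity);
        System.out.println(threshold); // 12: the 13th mapping triggers a resize

        // After one doubling the capacity is 32, so the threshold doubles too.
        System.out.println((int) (loadFactor * (capacity << 1))); // 24
    }
}
```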

The load factor controls when expansion is triggered. A small value means fewer hash collisions and therefore faster access, at the cost of more memory; a large value means more collisions and slower access, but less memory. Changing the default is not recommended; if you do, benchmark the change first.

7.2 Powers of two and array index calculation, revisited

We said that the capacity of the array is a power of 2, and the index of the array is computed by the following code

```
index = (table.length - 1) & hash
```

Besides computing the index quickly, this method also makes it cheap to compute each node's new index during the rehash after expansion. Since the array doubles in size, n - 1 gains exactly one more significant bit, while the lower bits stay the same. For example:

(Figure: index computation before and after expansion; the new index can be computed quickly from the one extra bit.)
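The split can be sketched without touching HashMap internals; newIndex below reproduces the (hash & oldCap) test used by resize() (the helper name and class are ours):

```java
// Sketch of the resize() index trick: after doubling from oldCap to 2*oldCap,
// a node either stays at its old index j or moves to j + oldCap, depending on
// the one extra bit that the new mask exposes, tested via (hash & oldCap).
public class RehashDemo {
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);               // index in the old table
        return ((hash & oldCap) == 0) ? j          // extra bit 0: index unchanged
                                      : j + oldCap; // extra bit 1: shifted by oldCap
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int h1 = 0b0_0101; // bit 4 clear -> stays at index 5
        int h2 = 0b1_0101; // bit 4 set   -> moves to 5 + 16 = 21
        System.out.println(newIndex(h1, oldCap));  // 5
        System.out.println(newIndex(h2, oldCap));  // 21
        // Cross-check against a direct computation with the doubled mask:
        System.out.println(h2 & (2 * oldCap - 1)); // 21
    }
}
```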

7.3 Resize steps

  1. Determine whether this is an initialization or an expansion; newCap and newThr are computed differently in each case.

  2. Compute the new threshold for the expanded capacity.

  3. Set the HashMap's threshold to the new threshold.

  4. Create a new array of the expanded capacity and point the HashMap's table at it.

  5. Copy the elements of the old array into the new table, handling three cases separately: a bucket with a single element, an ordinary linked list, and a red-black tree.

The code:

```java
final Node<K,V>[] resize() {
    // Save the pre-expansion table
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Already at the maximum: give up expanding and raise
        // the threshold to "infinity"
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity; also double the threshold if the old
        // capacity was at least DEFAULT_INITIAL_CAPACITY
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1;
    }
    else if (oldThr > 0)
        // Old capacity == 0 but a threshold was set (constructor with an
        // initial capacity): the new capacity is the stored threshold
        newCap = oldThr;
    else {
        // Old capacity == 0 and old threshold == 0: use the defaults,
        // threshold = DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // Threshold not set above: compute it, guarding against overflow
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Set the post-expansion threshold
    threshold = newThr;
    // Create the new table with capacity newCap and install it
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // If the old table is not empty, copy its elements into the new one
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null; // help GC
                if (e.next == null)
                    // A single node: recompute its index directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // A red-black tree: split it between the two new buckets
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // A linked list: partition into a "low" list that stays
                    // at index j and a "high" list that moves to j + oldCap
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null) loHead = e;
                            else loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null) hiHead = e;
                            else hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

7.4 Precautions

While the HashMap design is excellent, resize() should still be triggered as rarely as possible, since the process is time-consuming.

Also, HashMap never shrinks its capacity automatically: a map that has grown large does not release memory when you remove many entries. If you need to reclaim that space, create a new HashMap from the old one.
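One way to "shrink" is to copy the entries into a fresh map; a sketch of this usage pattern (it is a workaround, not a HashMap API):

```java
import java.util.HashMap;
import java.util.Map;

// HashMap never shrinks its table; copying into a fresh map is the usual
// way to reclaim memory after bulk removals.
public class ShrinkDemo {
    public static void main(String[] args) {
        // Deliberately over-sized table via the capacity hint
        Map<Integer, Integer> big = new HashMap<>(1 << 16);
        for (int i = 0; i < 10; i++) big.put(i, i * i);

        // The copy constructor sizes the new table for the current entries only.
        Map<Integer, Integer> compact = new HashMap<>(big);
        System.out.println(compact.size()); // 10
        System.out.println(compact.get(3)); // 9
    }
}
```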

8 How does HashMap.put() work internally?

After using HashMap a few times, you can roughly describe how it adds an element: compute the key's hash, derive its position in the hash table from that hash, put the key-value pair there, and resolve any hash collision that occurs.

(Figure: flowchart of the put() process; saved long ago, original source unknown.)

The source code is as follows:

```java
/**
 * @param hash         the hash of the key
 * @param key          the key
 * @param value        the value to put
 * @param onlyIfAbsent if true, do not replace an existing value even if
 *                     the key is already present in the map
 * @param evict        if false, the table is in creation mode
 * @return the previous value if one was replaced, otherwise null
 *         (a null return may also mean the key was mapped to null)
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the hash table is empty, create it via resize(); n records its length
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash computes the bucket index (effectively hash % n).
    // If the bucket is empty there is no collision: place a new node directly.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node in the bucket matches both hash and key:
        // record it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is a red-black tree: insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // The bucket is a linked list: walk it
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Reached the tail: append a new node
                    p.next = newNode(hash, key, value, null);
                    // Treeify the bucket once it holds TREEIFY_THRESHOLD nodes
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with the same key: stop, it is replaced below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // Replace unless onlyIfAbsent forbids it (a null old value is
            // still replaced)
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e); // hook for LinkedHashMap
            return oldValue;
        }
    }
    ++modCount;             // record the structural change
    if (++size > threshold) // expand if the threshold is exceeded
        resize();
    afterNodeInsertion(evict); // hook for LinkedHashMap
    return null;
}
```

This process also involves resolving hash collisions when they occur.
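The observable contract of put() described above can be checked from user code:

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        System.out.println(m.put("a", 1)); // null: no previous mapping
        System.out.println(m.put("a", 2)); // 1: the old value is returned on replace
        m.put(null, 0);                    // a null key hashes to 0 (bucket 0)
        System.out.println(m.get("a"));    // 2
        System.out.println(m.get(null));   // 0
    }
}
```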

9 How does HashMap.get() work internally?

```java
/**
 * Returns the value mapped to the specified key, or null if there is none.
 * get() takes three steps:
 * 1. Compute the key's hash with hash(Object key).
 * 2. Look up the node with getNode(int hash, Object key).
 * 3. Return null if the node is null, otherwise node.value.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
```

This boils down to a call to the getNode function. Its logic is as follows:

(Figure: getNode() working logic)

The source code is as follows:

```java
/**
 * @param hash the hash of the key
 * @param key  the key
 * @return the matching node, or null if none is found
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The first node in the bucket matches both hash and key
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // The bucket is a red-black tree: search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the linked list until the key matches
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    // Empty table, or no matching node
    return null;
}
```

Author: A Jin’s writing desk

Reference: www.cnblogs.com/homejim/