preface

A Hash table is an efficient data structure with a lookup efficiency of O(1). It is essentially an array that is stored by mapping elements to a location in the array based on the Hash function.

It is possible for different elements to map to the same location, called hash conflicts. There are many methods to solve hash conflict, such as open address method and linked list method.

In addition, in JDK8 and later versions, HashMap has been further optimized by introducing a red-black tree structure to replace the structure of the linked list when appropriate.

This article is based on JDK11.

The principle of analytic

Properties and Frameworks

There are attributes defined in HashMap that have the following meanings:

  • int DEFAULT_INITIAL_CAPACITY = 1 << 4: The default initial capacity is 16
  • int MAXIMUM_CAPACITY = 1 << 30: Maximum capacity
  • Float DEFAULT_LOAD_FACTOR = 0.75 f: Default load factor
  • int TREEIFY_THRESHOLD = 8: When the list length exceeds 8, use red-black tree storage
  • int UNTREEIFY_THRESHOLD = 6: When the number of red-black tree elements is less than 6, go back to the linked list storage
  • int MIN_TREEIFY_CAPACITY = 64: The HashMap is allowed to be converted to red-black tree storage only when its capacity exceeds 64
  • Node<K,V>[] table: Hash table array
  • int size: Indicates the number of key-value pairs
  • int modCount: Indicates the number of structural changes

Structural changes are changes in the number of elements or internal structure, such as rehash operations.

  • int threshold: If the threshold is exceeded, capacity expansion is performed.threshold = capacity * loadFactor

The internal structure

Hash table is an array of Node types. The Node type stored in the linked list is Hashmap. Node, and the Node type stored in the red-black tree is Hashmap. TreeNode, which is a subclass of Node.

Hash functions and index calculations

The hash value in the HashMap is calculated as follows:

// HashMap.class
static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

The xOR operation ensures that if a 32-bit hash value is changed by one bit, the entire return value will change, reducing the number of hash collisions to some extent.


In the HashMap, we also need to calculate the index I of the key, which is computed by its hash value and array capacity N. This is an optimized index calculation, which is equivalent to mod n, as follows:

i = (n-1) & hash
Copy the code

After expansion, there are only two cases of elements in the same index:

  • Keep original index
  • Array length + index

For example, the original length of the array is 16. In binary:

16-1:00 1111 32-1:00 01 1111Copy the code

Assume that hash(key) = 0000 1010 (bit log216LOG_216Log216 is 0). In this case, the indexes are the same before and after expansion. Assume that hash(key) = 0001 1010 (the log216LOG_216LOG216 bit is 1). In this case, the index is 01010 before expansion and 11010 after expansion. The index has been increased by 16.


So how to determine the index of a key after expansion?

We assume that the capacity of the array before expansion is oldCap, and it must be the power of 2, so the binary log2oldCaplog_2oldCaplog2oldCap bit must be 1, so if hash(key) & oldCap result is 0, If the index is the same, add oldCap to the index.

This feature is important and is used in the resize() function to rebuild the expanded array.

Override hashCode

HashMap uses hashCode to calculate the index, and if it is a reference type, uses equals() to determine whether the key is equal. To ensure that objects with the same equals() return the same hashCode(), We want to override hashCode() when overriding equals().

capacity

The resize() operation is performed when the following two situations occur:

  • The hash table is empty, which is equivalent to an initialization operation. The default capacity is 16
  • If the number of key-value pairs exceeds the threshold, the hash table array is doubled

During capacity expansion, the threshold is updated and the index is recalculated to build a new hash table.

In hashing and index calculation, we know that an element of an index can be expanded by 2 times and the corresponding new index can either keep the original index or increase the size of the original array. Take the structure of the linked list as an example, iterate through the list, add the linked list nodes to two different lists according to hash(key) & oldCap, and finally assign the head nodes of the two lists to two positions in the new array.

The nodes with the same index form a linked list with loHead and loTail. Index increases the length of the node to form a linked list, the first and last nodes are hiHead,hiTail respectively.

The source code for this section is as follows:

/** * recalculate the new index of the linked list node at the original index j * e originally was the first node at the index * oldCap old array size * newTab expanded array */
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
    next = e.next;
    if ((e.hash & oldCap) == 0) {
        if (loTail == null) { loHead = e; }
        else { loTail.next = e; }
        loTail = e;
    } else {
        if (hiTail == null) { hiHead = e; }
        else{ hiTail.next = e; } hiTail = e; }}while((e = next) ! =null);
// Keep all nodes of the original index
if(loTail ! =null) {
    loTail.next = null;
    newTab[j] = loHead;
}
// Add all nodes of oldCap
if(hiTail ! =null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}
Copy the code

Initialize the

When initializing using constructors, the capacity and load factors default to 16 and 0.75, which can also be specified by the user. But you don’t create an array object.

The hash table’s array size must be a power of 2. If the specified capacity is not a power of 2, the tableSizeFor(initialCapacity) method is called and replaced with the nearest power of 2, and the threshold is equal to the array capacity (the default constructor does not initialize threshold).

The actual process of creating an array object and updating the threshold is in the resize() method.

A red-black tree

Why red black trees?

The red-black tree RBT is a weakly balanced binary search tree that does not require strict balance as AVL trees do (strict balance requires more rotations). As a result, red-black tree lookup, deletion, and addition are more stable and perform better overall.

Red-black tree is introduced to solve the problem of low query efficiency when the linked list is too long.

Red black tree and linked list transformation

There are two conditions for a linked list to be stored in a red-black tree:

  • Array capacity
    p \ge
    MIN_TREEIFY_CAPACITY
  • The length of the linked list
    p \ge
    TREEIFY_THRESHOLD=8

If the array capacity is insufficient, but the linked list length ≥\ GE ≥ 8, an expansion operation is performed, and the linked list storage is still used.


Why is the threshold 8 by default?

In general, if hashCode is evenly distributed, the length of the linked list is not too long, and red-black tree storage is rarely used. Under ideal conditions, with random hash code, the frequency of nodes appearing in the array follows poisson distribution. When the loading factor is 0.75 and the length of the list is equal to 8, the probability of nodes appearing is only 0.00000006, that is, the length of the list is almost never greater than 8.


Red-black trees can be converted to linked list storage if:

  • Number of nodes in a red-black tree
    Or less \le
    UNTREEIFY_THRESHOLD=6

Deletion of elements or resize operations may result in the reduction of red-black tree nodes and conversion to linked list storage.

Add (PUT) procedure

The put method actually calls putVal(),

/ * * *@paramHash Hash value of the key *@paramOnlyIfAbsent If true, the value of repeated keys * is not changed@returnThe value */ before the new key is added
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict){}
Copy the code

How to add a key/value pair:

  • If the hash table is empty, proceed firstresize()Operation.
  • Evaluates the index of the new elementiAnd recordtable[i]The nodes inp
  • ifpIs empty (indicating that no elements have been stored), and creates one directlyNodeThe new node. Otherwise, the element has been stored and is presentHash conflict, perform the following operations:
    • ifpTreeNodeInstance, callputTreeVal()Method is inserted into the red-black tree
    • ifpNodeIn this example, the traversal list is inserted to the end of the list and determines whether it needs to be converted to a red-black tree based on the length of the list
    • If a duplicate is found (during insertion)key, completes the loop, overwrites the old value with the new value, and returns the old value directly
  • The number of changes increased by 1.modCount++), the number of key-value pairs is increased by 1, if the new capacity of the hash table reaches the threshold, to proceedresize()operation

The putVal() method is a bit too much source code, so I won’t post 😬 here.

The get process

The get(key) method calls getNode() to get the corresponding Node and return value.

// HashMap.class
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
Copy the code

The procedure for getNode(hash,key) is as follows:

  • Calculating the indexiAnd getiThe first element of the positionfirst. If the hash table is empty orfirstIf null, returnsnull.
  • iffirstIs the element node to be found, and returnsfirst.
  • iffirstTreeNodeType, callgetTreeNode(hash, key)Look it up in the red-black tree structure
  • iffirstNodeType, order to traverse the list to find
  • (No return foundnull)

If the value of the key is equal, return the value of the key directly. If it is not first, determine whether it is a red-black tree structure. If so, call getTreeNode() of the red-black tree to query if it is not a red-black tree structure. The list is iterated from first until the element corresponding to the key is found, or null is returned.

The change of the HashMap

Changes to the storage structure

JDK8 starts using array + list + red-black tree structure instead of array + list.

Changes in the way linked lists are inserted

When JDK8 starts to collide, linked lists use tail inserts instead of head inserts.

Header insertion in a multithreaded environment can cause linked list loops and data loss problems.

Hash function changes

Hash function before JDK8:

final int hash(Object k) {
    int h = hashSeed;
    if (0! = h && kinstanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Copy the code

Hash function to start using in JDK8:

static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

Changes in index calculations

The indexFor() method is called in JDK8 to calculate an indexFor a key:

// <= JDK7 index calculation
static int indexFor(int h, int length) {
    return h & (length-1);
}
Copy the code

This is essentially a mod to length.

In JDK8, hash(key) and oldCap are used to determine expansion directly.

Traverse elements

Use EntrySet to get a collection of key-value pairs. Entry is an interface used to describe key-value pairs in a Map, and Node and TreeNode are its implementation classes.

(if you use a for loop) you must use an enhanced for loop to get the key-value pairs correctly:

// Iterate over the HashMap with the increment for loop
for(Map.Entry<Integer,String> entry : map.entrySet()){
    System.out.println(entry.getKey());
    System.out.println(entry.getValue());
}
Copy the code

An enhancement loop is just a syntactic sugar that is essentially implemented using iterators, such as the following iterators in a HashMap:

  • HashIteratorIterator parent class, which provides a default implementation and has the following three subclasses
  • EntryIterator: Iterator to traverse Entry, corresponding toEntrySet
  • KeyIterator: iterator to traverse key
  • ValueIterator: iterator over value

Iterator can also be used to iterate over all elements, which is obtained through the entryset.iterator () method:

// use iterators to traverse
Iterator<Map.Entry<Integer, String>> iterator = map.entrySet().iterator();
while (iterator.hasNext()){
    Map.Entry<Integer, String> next = iterator.next();
    System.out.println(next.getKey());
    System.out.println(next.getValue());
}
Copy the code

To remove elements during iterator traversal, use the iterator’s own remove() method instead of the HashMap’s remove() method.

Similarly, KeySet and KeyIterator and Values and ValueIterator can be used to iterate over keys and Values, respectively.