HashMap is a common data interface used in Java development to store key:value structures. Meanwhile, HashMap is the basis of three data structures: HashSet, HashTable, and ConcurrentHashMap.

In this article, we will analyze the source code of HashMap in detail. Later, we will introduce the source code of HashSet, HashTable, and ConcurrentHashMap and compare them with HashMap.

1 Basic Structure

A HashMap is a very common data structure used to store key-value data. It is also the simplest. Here we use Java 1.8 as an example.

The structure of a HashMap looks like this:

1.1 Dynamic Array

Let’s look at the top layer first.

From the structure diagram, we can see that a HashMap is a dynamic array that can be expanded. For expansion purposes, the dynamic array has the following properties:

  • Capacity: indicates the current array length. For efficient expansion, the value is always of the form 2^n. After each expansion, n increases by 1, that is, the entire array becomes twice as large. The initial default value is 16.
  • LoadFactor: loadFactor. The default value is 0.75. This value is used in conjunction with threshold.
  • Threshold: capacity expansion threshold, equal to capacity x loadFactor. When there are so many elements in the array, it triggers an array expansion.

1.2 Array Elements

Each element in the array is a Node, which has the following attributes:

  • Hash: Hash value of the current position value
  • Key: indicates the key of the current position
  • Value: indicates the value stored at the current location
  • next; The next Node

1.3 List or tree

For each Node element, we find that it has a next attribute. Using it, multiple nodes mounted to the same location in the array form a list or tree.

In Java1.7, there are no trees, meaning that multiple nodes mounted to the same location in the array form a one-way linked list using the next attribute.

In Java1.8, the itemized list becomes a tree when the element in a single-necklace list is 8 or greater. The tree is a red-black tree. This conversion is implemented by the final void treeifyBin(Node<K,V>[] TAB, int hash) function.

Because the search time for linked lists is O(N), and the search time for red-black trees is O(logN). As a result, the design of Java1.8 reduces time complexity when more nodes are mounted at the same location in the array.

2 Initialization operations

A HashMap initialization operation is very simple, is to determine initialCapacity, loadFactor the initial value of the process.

Normally, when we call the no-argument constructor public HashMap(), all the values take their default values. Namely loadFactor = 0.75.

If we introduced into initialCapacity loadFactor, can call the following method.

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
Copy the code

It is important to note, however, that when we initialize, we only initialize the parameters that create the array. We do not actually create the dynamic array. The creation of a truly dynamic array is triggered when the data is first written.

This is essentially a lazy loading operation, preventing memory waste after initialization.

3 Data writing operations

The process of writing data to a HashMap can be summarized as follows:

  • Calculates the Hash value of the data to insert and determines the insertion position of the element (that is, in the dynamic array) based on that value.
  • Puts the element at the specified location in the array
    • If there is no element before the array location, it is placed directly
      • If the number of array elements exceeds the capacity expansion threshold, the array is expanded
      • If the array element does not exceed the expansion threshold, the write is complete
    • If the array location is preceded by an element, mount it to the back end of the existing element
      • If the previous element formed a tree, the tree is mounted at the specified location
      • If the previous elements form a linked list
        • If the length of the list to which the element is added exceeds 8, the list is converted to a red-black tree before insertion
        • If the length of the linked list does not exceed 8, it is directly inserted

The actual operation is more complex than these below, we directly combined with the source code for analysis. Related key steps I added a note.

Public V put(K key, V value) {return putVal(hash(key), key, value, false, true); } final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; // If the dynamic array is null, the array is initialized. So here for the first time into the value will initialize array if ((TAB = table) = = null | | (n = TAB. Length) = = 0) n = (TAB = the resize ()). The length; // Hash to find that the array position of the element to be put is null, If ((p = TAB [I = (n-1) &hash]) == null) TAB [I] = newNode(hash, key, value, null); Node<K,V> e; K k; / / determine whether the original position is the first element and the new element key exactly the if (p.h ash = = hash && ((k = p.k ey) = = key | | (key! = null && key.equals(k)))) e = p; Else if (p instanceof TreeNode) // Add new node e = ((TreeNode<K,V>)p).puttreeval (this, TAB, hash, key, value); For (int binCount = 0; ; ++binCount) { if ((e = p.next) == null) { p.next = newNode(hash, key, value, null); If (binCount >= treeify_threshold-1) // -1 for 1st // If the list is too long, change it to red-black treeifyBin(TAB, hash); break; } if (e.hash == hash && ((k = e.key) == key || (key ! = null && key.equals(k)))) break; p = e; } // If (e! = null) { // existing mapping for key V oldValue = e.value; if (! onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e); return oldValue; } } ++modCount; if (++size > threshold) resize(); afterNodeInsertion(evict); return null; }Copy the code

4 Data reading operations

Data reads are simpler than data writes. The overall process is summarized as follows:

  • Hash out the specified position in the array based on the value of the key to be retrieved
  • Fetches the element at the specified position (in this case, the hash value of the key is the same)
    • If the key is exactly the same, that value is returned, and the search ends.
    • If the key is different, determine whether it is followed by a tree or a list
      • If it is a tree, follow the tree method
      • If it is a list, look it up as a list

If no result is found in any step, the key does not exist in the map. You can return null.

The code and associated comments are as follows:

Public V get(Object key) {Node<K,V> e; return (e = getNode(hash(key), key)) == null ? null : e.value; } final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; Return null if ((TAB = table)! = null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) ! = null) {if (first hash = = hash && / / always check first node key / / the first node and to find the same ((k = first. Key) = = key | | (key! = null && key.equals(k)))) return first; if ((e = first.next) ! If (first instanceof TreeNode) return ((TreeNode<K,V>)first).getTreenode (hash,  key); Do {/ / in accordance with the list method to find the the if (e.h ash = = hash && ((k = e.k ey) = = key | | (key! = null && key.equals(k)))) return e; } while ((e = e.next) ! = null); } } return null; }Copy the code

5 Capacity Expansion

Before we talk about expanding arrays, let’s start with a question: Why do arrays have to be 2^n?

We know that expansion is a two-step process:

  • One is to double the size of the array.
  • The second is to recalculate the hash value of all elements that have been hash distributed into the array and allocate them to the new array.

The first step requires only an additional storage area, while the second step requires a huge amount of computing resources. If 50,000 elements exist before expansion, recalculate the hash value of the 50,000 elements and move the hash value based on the new result. This operation is called a rehash operation and is an expensive one.

Therefore, it is very important to be able to improve the performance of the rehash.

The array size must be 2 to the n to improve the performance of the heavy hash. It’s a math problem. Let’s do it step by step.

5.1 Hashing and rehashing

We use hash every time we read or write, but what about the hash function used in a HashMap?

It is:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

Finally, we get an int number. And that number ends up being mapped to some location I in the array.

If the array length is N, then I can be calculated as:

tab[i = (n - 1) & hash]
Copy the code

This calculation appears multiple times in the HashMap source code.

(n-1) is the maximum position in the array, and hash is the result of the hash, both of which perform logic and operations. If we look at the binary, the logic and the operation can be understood as the intersection of the hash binary on the (n-1) binary, which we call P. This is going to be less than or equal to n minus 1.

So when you expand, n becomes twice as big, so let’s call that m. So in binary, m is the same thing as adding a 1 to the binary of n. So the result of the intersection of the binary number of hash and the binary number of M is denoted as Q. Then, except for the highest bit, the result p of the intersection of q and the hash binary on the binary of (n-1) is consistent. So q is either p or p plus n.

Let the length of the table be a, then b, and b=2a. The elements in the original table[I] are rehashed into newTable[I] and newTable[I +a]. So, this reduces the amount of computation and movement.

5.2 capacity

With that out of the way, we can view the expansion operations directly from the HashMap source code.

But because the relevant source code is quite long, I made relevant cuts.

final Node<K,V>[] resize() { Node<K,V>[] oldTab = table; int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0; If (oldCap > 0) {if (oldCap >= MAXIMUM_CAPACITY) {// If the length is too long, no extension is performed. Threshold = integer.max_value; return oldTab; } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) NewThr = oldThr << 1; newThr = oldThr << 1; // double threshold } else if (oldThr > 0) // initial capacity was placed in threshold newCap = oldThr; else { // zero initial threshold signifies using defaults newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } if (newThr == 0) { float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr; // Create a new array @SuppressWarnings({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; if (oldTab ! For (int j = 0; j < oldCap; ++j) { Node<K,V> e; if ((e = oldTab[j]) ! = null) { oldTab[j] = null; if (e.next == null) newTab[e.hash & (newCap - 1)] = e; Else if (e instanceof TreeNode) // Tree element hash ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); Else {// preserve order // omitted: list element rehash}}}} return newTab; }Copy the code

6 summarizes

HashMap is a commonly used class in Java. In this paper, the structure and source code of HashMap are combed and analyzed in detail. Including HashMap data structure, data access, data write, capacity expansion and other methods.

In the following articles, we will introduce the source code for HashSet, HashTable, and ConcurrentHashMap and compare them with HashMap.

Finally, I am Yi Ge, senior software architect.

Welcome to follow me, I will occasionally share the knowledge of software architecture and programming.