preface

HashMap is one of the most popular data structures used by Java programmers. With the evolution and development of JDK, JDK 1.8 has also been optimized for the underlying implementation, such as the introduction of red-black tree data structure and expansion optimization. This article will combine JDK 1.7 and 1.8 source code, in-depth discussion of the structure implementation and functional principle of HashMap, please be patient to read it for a long time.

Introduction to the

Like ArrayList, HashMap inherits from List and implements it one after another. The inheritance relationship is almost identical, except that List is replaced by Map.

Java defines an interface java.util.map for mapping in data structures. This interface mainly has four commonly used implementation classes, namely HashMap, HashTable, LinkedHashMap and TreeMap. The inheritance relationship of HashMap class is shown in the following figure:

The basic principle of

Basic data structure

The basic data structure of HashMap is array + linked list/red-black tree. In small scale, it is mainly array + linked list (similar to bucket storage). When a large amount of data is converted to red-black tree, we need to mention when it will be converted to red-black tree, which is a problem we must pay attention to and will be discussed later.

Put the principle of

Before we can do put source analysis we need to understand something called a perturbation function, which is used to compute the stored hash value.

// Perturbation function
static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

From the above functions, we can see several key information:

  • HashMap supports null storage, and returns 0 if the key is null.
  • If you access the main HashMap array directly with the hash value as the subscript, it’s hard to get collisions considering that the binary 32-bit signed int table values range from -2147483648 to 2147483648, but the hash value cannot be used directly because the HashMap starts at 16.
  • How to handle hash values: There is a function called indexFor to handle this before, which does modular arithmetic with lengths. In JDK8 this operation is put into putVal and handled directly.

So having said perturbed functions we’re going to go into the put principle.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false.true);
}
Copy the code

PutVal, the outermost method of the put function, is the actual method used in HashMap. The source code for HashMap is very complex, and each of the other parts could have written an article explaining how such an elegant data structure was designed.

PutVal source code comment parsing

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
		// If table is empty or 0 resize, assign the length to n, and do the necessary initialization
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null) // Where & is substituted for % (division hash)
				// If the hash value is null, create a new node and insert it at the specified position.
        tab[i] = newNode(hash, key, value, null);
      // The p node is the element in the array calculated from the hash value. If it is empty, it means it has not been inserted yet.
    else {
        Node<K,V> e; K k;
				// If the inserted keys are the same, overwrite them
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            e = p;
				// If the node is a red-black tree, the red-black tree operation is performed
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
						// List operations
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >=TREEIFY_THRESHOLD- 1)// -1 for 1st
// Convert the list to a red-black tree.
treeifyBin(tab, hash);
                    break;
                }
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break;
								// Update pp = e; }}// If e exists, overwrite it
        if(e ! =null) {// existing mapping for key
V oldValue = e.value;
						// Determine whether overwrite is allowed, and oldValue is the current state
            if(! onlyIfAbsent || oldValue ==null)
                e.value = value;
            afterNodeAccess(e); // callback to allow the LinkedHashMap post-operation
            return oldValue;
        }
    }
    ++modCount;   // Change the number of operations
		// If the capacity exceeds the threshold, run resize to expand the capacity
    if (++size > threshold)
        resize();
		// End the operation after Node insertion
    afterNodeInsertion(evict);
		// callback to allow the LinkedHashMap post-operation
    return null;
}
Copy the code

The HashMap putVal operation is divided into the following steps:

1) Initialize and prepare the nodes.

2) Determine the operation to be performed and create and assign values (same key, red-black tree node, linked list operation).

3) Apply the created node to the Map.

4) Capacity detection.

Here we have several concerns:

1) How to store the corresponding node data in the array: calculate the position by (n-1) &hash.

2) When a list is converted to a red-black tree: binCount greater than 7 is converted to a red-black tree, and less than 6 is converted to a linked list.

Get the principle

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
Copy the code

Use the same hash method for key and PUT to find the corresponding value.

GetNode returns a number of data nodes, or null if null;

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if((tab = table) ! =null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) ! =null) {
        if (first.hash == hash &&// always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
            return first;
        if((e = first.next) ! =null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
						// List iteration
            do {
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    return e;
            } while((e = e.next) ! =null); }}return null;
}

Copy the code

1) Use hash & (table.length-1) to obtain the hash slot of the data node corresponding to the key.

2) Check whether the first node is null, if so, return NULL.

3) Determine whether the hash value of the first node is the same as the hash value to be searched. If so, directly return and fetch.

4) Enter the cycle of linked list iteration, as shown in the flow chart below.

If Node is a red-black tree, getNode returns the red-black tree Node for the key.

final TreeNode<K,V> find(inth, Object k, Class<? > kc) {
    TreeNode<K,V> p = this;
    do {
        int ph, dir; K pk;
        TreeNode<K,V> pl = p.left, pr = p.right, q;
        if ((ph = p.hash) > h)
            p = pl;
        else if (ph < h)
            p = pr;
        else if((pk = p.key) == k || (k ! =null && k.equals(pk)))
            return p;
        else if (pl == null)
            p = pr;
        else if (pr == null)
            p = pl;
        else if((kc ! =null|| (kc =comparableClassFor(k)) ! =null) && (dir =compareComparables(kc, k, pk)) ! =0)
            p = (dir < 0)? pl : pr;else if((q = pr.find(h, k, kc)) ! =null)
            return q;
        else
            p = pl;
    } while(p ! =null);
    return null;
}
Copy the code

Hash collision

When a hashCode is computed in the same space as a hashCode value, it conflicts with a put value.

In Java HashMap, the “zip method” is used to deal with the collision problem of HashCode, as shown below.

A HashMap uses a linked list to solve collisions, and when a collision occurs, the object is stored in the next node of the list. A hashMap stores key-value pair objects on each linked list node. When two different keys have the same hashCode, they are stored in a linked list at the same bucket location

Red and black tree

R-b Tree, full name is red-black Tree, also known as “Red Black Tree”, it is a special binary search Tree. Red-black trees have memory bits on each node to indicate the color of the node, which can be Red or Black.

Red-black tree features:

(1) Each node is either black or red.

(2) The root node is black.

(3) Each leaf node (NIL) is black. [Note: leaf nodes are only empty leaf nodes!]

(4) If a node is red, its children must be black.

(5) All paths from a node to its descendants contain the same number of black nodes.

Here is a simple red-black tree.

Linked list to red black tree conversion

// List to bidirectional list operation
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
		// If the total number of elements is less than 64, continue to expand, node pointing adjustment
    if (tab == null || (n = tab.length) <MIN_TREEIFY_CAPACITY)
        resize();
	 // Find the head of the list first
    else if ((e = tab[index = (n - 1) & hash]) ! =null) {
        TreeNode<K,V> hd = null, tl = null; // hd: tree head
        do {
						// Create a red and black root
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while((e = e.next) ! =null);
        if((tab[index] = hd) ! =null)
           // This is the real transition to a red black tree.hd.treeify(tab); }}Copy the code

This is a red-black tree entry method that you enter when a HashMap needs to be converted to a red-black tree.

1) Resize () is automatically triggered when the number of nodes is empty or less than 64.

2) Recursively build TreeNode, and call Treeify to complete treification after construction.

Expansion mechanism

To understand a data structure, we must know its capacity expansion algorithm. Capacity expansion is achieved through resize in HashMap. Next, we will analyze the capacity expansion mechanism of REsize.

final Node<K,V>[] resize() {
		// Get various information about the old element array
    Node<K,V>[] oldTab = table;
   // Get the old capacity length
    int oldCap = (oldTab == null)?0 : oldTab.length;
   // Critical point of expansion
    int oldThr = threshold;
	 // Define the length of the new array and the threshold for expansion
    int newCap, newThr = 0;
   // If the old table capacity > 0, the old table is not empty.
    if (oldCap > 0) {
        // If the array length reaches the maximum value, change the threshold to integer.max_value
        if (oldCap >=MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Select * from * where *
        else if ((newCap = oldCap << 1) <MAXIMUM_CAPACITY&&
                 oldCap >=DEFAULT_INITIAL_CAPACITY)
            // The threshold is also doubled
            newThr = oldThr << 1;
}
    else if (oldThr > 0)// Set the old threshold multiple to the new capacity
newCap = oldThr;
    else { // If it is new initialization, cap is set to the default capacity
newCap =DEFAULT_INITIAL_CAPACITY;
				// If the threshold is set to the default threshold
        newThr = (int)(DEFAULT_LOAD_FACTOR*DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) { // If the threshold is 0, set the threshold
        float ft = (float)newCap * loadFactor;
        newThr = (newCap <MAXIMUM_CAPACITY&& ft < (float)MAXIMUM_CAPACITY?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr; // Update the fill factor
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
		// After the expansion, we need to adjust the red black tree or linked list.
    if(oldTab ! =null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if((e = oldTab[j]) ! =null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // Red black tree adjustment
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
// List adjustment
Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                    if(loTail ! =null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if(hiTail ! =null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab; // Return a new table data;
}

Copy the code

Above is a complete expansion of the source code, the overall is divided into two parts: expansion of new space, adjust the location of the original data. I’ve made a few comments where necessary, but there are some problems we don’t know from looking at the code, so I’ll address some of them.


When will resize be triggered?

HashMap triggers resize for the following reasons:

Capacity: indicates the length of the current HashMap

LoadFactor: LoadFactor. The default value is 0.75f.

When currentCapacity is greater than currentCapacity * LoadFactor, resize is triggered for capacity expansion.

How to expand capacity?

There are two steps:

  • Expand and create a new array twice as long (this is mentioned in the source code comment).
  • Take the old array, rehash it and put it in the new array.

Why store after reHash instead of directly?

This point needs to be combined with the hash perturbation function we added, (n-1) & hash. When we calculate the position, we get the modulo operation between n and hash. If the n changes and the previous position is still used, the position cannot be found, as shown in the following diagram.

Position before expansion:

Before capacity expansion, there is a HashMap bucket with capacity 3. The data in the bucket is as follows. In this case, you need to expand the HashMap.

If the expansion is directly copied over:

If (6-1) &hash is equal to (3-1) &hash, the size of the HashMap will be 6 instead of 3. The answer is obviously different, so if you do not resize after the resize, all previous nodes will be lost.

The result after rehash:

If you rehash at this point, there is no such problem and you can get the corresponding value directly.

conclusion

There are many knowledge points like HashMap, and one article is too much to finish. Therefore, I did not introduce some basic knowledge of HashMap, and put more space on some key parts of the introduction. As for more mysteries, we need to explore together.

reference

Blog.csdn.net/AJ1101/arti…

www.zhihu.com/question/20…

Blog.csdn.net/cmyperson/a…

www.cnblogs.com/skywang1234…

Yuanrengu.com/2020/ba1842…

Blog.csdn.net/AJ1101/arti…

www.cnblogs.com/aobing/p/12…