Before diving into the source, we need a general picture of what we are studying. HashMap is a data structure used constantly in daily development, and its surface-level behavior is familiar: it stores data as key-value pairs, at most one entry can exist for a given key, it differs from Hashtable in synchronization and null handling, and so on. But these are application-level facts; if you really want to get the most out of HashMap and avoid its pitfalls, understanding the underlying implementation is essential.
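For instance, the key-uniqueness behavior is easy to observe with a minimal sketch (the class and sample data here are purely illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class KeyOverwriteDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2);                  // same key: the old value is replaced
        System.out.println(map.size());   // 1
        System.out.println(map.get("a")); // 2
    }
}
```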
This article gives a brief walkthrough of the HashMap source code, following the outline below. The JDK version used throughout is 1.8.
- The structure of a HashMap
- API
- The inner classes
- Parameters
- The hash algorithm
- put() and get()
- Differences between 1.7 and 1.8
Structure
API
HashMap implements the Map interface, so let's start by looking at the APIs that the Map interface defines.
First come the methods found in most collection classes — size(), put(), get(), remove(), clear(), isEmpty(), and so on — along with methods that return the key or value sets. The rest are methods specific to a key-value storage structure like Map.
Starting with JDK 1.8, HashMap also provides some functional-style APIs that can make the code for certain business scenarios very straightforward.
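As a quick illustration, here is a sketch of a few of those JDK 1.8 methods — merge(), getOrDefault(), and computeIfAbsent() are all real Map APIs; only the sample data is invented:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FunctionalApiDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        // merge(): insert 1 for a new key, or add 1 to the existing value
        for (String w : new String[]{"a", "b", "a"}) {
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println(counts.get("a"));            // 2
        // getOrDefault(): avoids a null check for missing keys
        System.out.println(counts.getOrDefault("c", 0)); // 0
        // computeIfAbsent(): lazily create the value for a key
        Map<String, List<String>> groups = new HashMap<>();
        groups.computeIfAbsent("g", k -> new ArrayList<>()).add("x");
        System.out.println(groups.get("g"));            // [x]
    }
}
```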
The inner class
There are several inner classes in HashMap. Its storage is built on Node and TreeNode, both of which ultimately implement the Map.Entry interface.
Node
The code of a storage node is as follows:
```java
/** Basic hash bucket node */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;     // the key's hash value
    final K key;
    V value;
    Node<K,V> next;     // next node of the linked list
}
```
The underlying structure of a HashMap should look familiar: in general it stores data as an array plus linked lists. The basic storage unit of each list is the Node structure, which holds the key, the value, and the hash value of the key — cached so it does not have to be recomputed in later method calls — plus a reference to the next node that links the list together. The following image shows how a HashMap holds data as an array plus linked lists.
TreeNode is the storage node used once a linked list has been converted into a red-black tree, which happens when the list grows beyond a certain length. In addition to the four fields of Node, it also holds references to its parent, left, right, and prev nodes (as well as the before and after references inherited from LinkedHashMap.Entry). The following image shows how a HashMap holds data as an array plus linked lists plus red-black trees.
The other inner classes are simpler: apart from the collection views that expose the HashMap's keys, values, and entries, the rest are the related iterators and similar helpers, which we won't go into in this article.
Parameters
Before looking at the capacity and loadFactor parameters, note that a HashMap can be instantiated through four public constructors: one taking an initial capacity and a load factor, one taking only an initial capacity, the no-argument default, and one taking another Map to copy. As the code shows, these two parameters are the only ones we can customize when instantiating a HashMap; the defaults are used when they are not specified.
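The four constructors in question are part of the public HashMap API; a minimal sketch of each (the sample keys and values are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        Map<String, String> m1 = new HashMap<>();         // defaults: capacity 16, load factor 0.75
        Map<String, String> m2 = new HashMap<>(32);       // initial capacity only
        Map<String, String> m3 = new HashMap<>(32, 0.5f); // initial capacity and load factor
        m1.put("k", "v");
        Map<String, String> m4 = new HashMap<>(m1);       // copy another Map
        System.out.println(m4.get("k"));                  // v
    }
}
```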
Let’s look at what these two values mean.
The size field is the number of key-value mappings currently in the map, while the capacity is the length of the underlying table; its default, DEFAULT_INITIAL_CAPACITY, is 16. The capacity must be a power of 2 because of HashMap's addressing scheme, which we'll cover in the hash-algorithm section.
The loadFactor field is the load factor of the HashMap. What is a load factor? It determines how full the table may get before the HashMap expands. The default, DEFAULT_LOAD_FACTOR, is 0.75f, meaning the HashMap resizes when the number of stored elements reaches three quarters of its capacity. For example, with an initial capacity of 16, once the HashMap holds 12 elements spread across different slots, further insertions become increasingly likely to cause hash collisions, so the table is expanded to keep the elements discretely distributed.
The expansion limit derived from these two parameters is called threshold. Once the number of elements in the HashMap exceeds it, the resize mechanism is triggered. The maximum capacity is MAXIMUM_CAPACITY = 1 << 30; beyond that the table is no longer expanded (the highest bit of an int is the sign bit, so the capacity can grow to 1 << 30 but not to 1 << 31).
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // ... the put operation
    // If the size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    return null;
}
```
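To make the size > threshold check concrete, here is a small illustrative sketch of how the threshold relates to capacity and load factor (the helper method is hypothetical, not part of the JDK):

```java
public class ThresholdDemo {
    // Hypothetical helper mirroring how HashMap derives its resize threshold:
    // threshold = capacity * loadFactor (truncated to int).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: resize on the 13th insert
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```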
Since JDK 1.8, the structure of HashMap has changed from array + linked list to array + linked list + red-black tree. This mechanism addresses the problem that, under frequent hash collisions, a bucket degrades into a long linked list and query time grows to O(n). Therefore, when a list grows past a threshold, it is converted into a red-black tree, keeping lookups at O(log n). The threshold is TREEIFY_THRESHOLD = 8. In the code the comparison is binCount >= TREEIFY_THRESHOLD - 1, because the head node of the bucket is not counted in binCount; the conversion effectively happens once a bucket holds more than 8 nodes.
Naturally there is also a reverse threshold, UNTREEIFY_THRESHOLD = 6, at which a tree degrades back into a linked list. It is deliberately smaller than the treeify threshold: the gap acts as hysteresis and avoids frequent back-and-forth structural conversions when the bucket size oscillates around a single boundary.
There is one more important parameter for treeification: MIN_TREEIFY_CAPACITY = 64. Trees are only built when the table capacity is at least this value; if a bucket reaches the treeify threshold while the table is smaller, the map resizes instead of treeifying. With few elements overall, a tree's lookup is not necessarily faster than a short linked list, and resizing alone may already disperse the collisions.
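The treeification machinery is internal and not directly observable through the API, but we can at least force long collision chains and confirm the map still behaves correctly. BadKey below is a deliberately pathological, made-up key class whose hashCode() is constant:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Every key hashes to the same bucket, forcing long chains; with enough
    // entries the bucket is treeified internally, but lookups still work.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++) map.put(new BadKey(i), i);
        System.out.println(map.size());             // 100
        System.out.println(map.get(new BadKey(7))); // 7
    }
}
```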
The remaining fields: entrySet caches the collection view of key-value pairs; table is the Node array shown in the figure above; and modCount, a version-number-like field present in many collection classes, enables fail-fast iteration.
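The fail-fast behavior backed by modCount can be demonstrated directly. Note the documentation only guarantees it on a best-effort basis, though a single-threaded structural modification like the one below is reliably detected:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    // Returns true if the iterator detects a structural modification.
    static boolean modificationDetected() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String k : map.keySet()) {
                map.put("c", 3); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;         // the iterator saw a modCount mismatch
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(modificationDetected()); // true
    }
}
```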
The Hash algorithm
Two methods matter in this part: hash() and tableSizeFor(). They compute the hash value of a key and the size of the table, respectively.
hash()
```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
The hash() method in JDK 1.8 differs from the one in JDK 1.7; the differences are discussed later. In a HashMap the initial table length is 16, so if hashCode() were used directly for addressing, only the lower four bits would take part. Hash codes such as 2^10, 2^20, and 2^30 would then all land on the same index, and the table would not be evenly used. JDK 1.8 therefore XORs the high 16 bits into the low 16 bits as a perturbation to reduce such collisions. (HashMap allows a null key, for which hash() simply returns 0.)
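A small sketch of the perturbation's effect; spread() is a hypothetical helper that applies the same XOR as HashMap.hash() to a raw int hash code:

```java
public class HashSpreadDemo {
    // Hypothetical helper: the same perturbation as JDK 8's HashMap.hash(),
    // applied to an int hash code directly.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                   // table length: only the low 4 bits address it
        int a = 0x10000, b = 0x20000; // hash codes that differ only in high bits
        // Without perturbation both collide in bucket 0
        System.out.println((n - 1) & a);         // 0
        System.out.println((n - 1) & b);         // 0
        // With perturbation the high bits reach the index
        System.out.println((n - 1) & spread(a)); // 1
        System.out.println((n - 1) & spread(b)); // 2
    }
}
```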
tableSizeFor()
```java
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
Before explaining this method, let's look at how a HashMap maps a hash value to an array index. As the figure shows, the index is computed as (n - 1) & hash, where n is the array length. So why not simply hash % n? When n is a power of 2, (n - 1) & hash produces the same result as hash % n, and the bitwise AND is much faster than the modulo operation. That is why a HashMap must keep its table length a power of 2 — and the tableSizeFor() method guarantees exactly that.
tableSizeFor() returns the smallest power of 2 greater than or equal to cap; for example, tableSizeFor(6) = 8 and tableSizeFor(12) = 16. By keeping the table length a power of two, the (n - 1) & hash addressing yields the array index both correctly and efficiently.
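The method can be verified in isolation; the sketch below copies the JDK 8 implementation into a standalone class:

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of JDK 8's tableSizeFor(): smallest power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(6));  // 8
        System.out.println(tableSizeFor(12)); // 16
        System.out.println(tableSizeFor(16)); // 16
    }
}
```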
put() and get()
put
```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
```
The put() method delegates to the final putVal() method. Let's follow the code step by step to see how it works.
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the hash table is empty, resize
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket at (n - 1) & hash is empty, create a new node
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The key matches the head of the bucket
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Search the list for a match, or append a new node at the tail
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // The key already existed: replace the value and return the old one
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Bump the "version number"
    ++modCount;
    // If the size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
resize()
Let’s look at the main resize() method, which is annotated in more detail.
```java
final Node<K,V>[] resize() {
    // Save a copy of the current table
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Limit the maximum capacity
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Try to double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // If an initial capacity was specified at construction, it is stored
    // in threshold and used to initialize the table
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Otherwise use the default initial capacity and its threshold
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // Compute a new threshold from the specified capacity
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Move the data from the old table to the new table
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Iterate over the old table's buckets
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node: hash & (newCap - 1) gives its new index
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A tree node: use the TreeNode#split() method
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // !!! Important !!!
                // A linked-list node: redistribute the elements of the list
                else { // preserve order
                    // Low-index list
                    Node<K,V> loHead = null, loTail = null;
                    // High-index list
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Walk the list
                    do {
                        next = e.next;
                        // hash & oldCap == 0: goes to the low list
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Otherwise: goes to the high list
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The low list stays at its original index j
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The high list moves to index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
Let's focus on the section of resize() marked as important: this is where entries are rehashed from the old table into the new one.
If the node being visited is a single node, its new index is found through the addressing scheme described above and it is placed directly into the new table. If it is a tree node, it is redistributed by the TreeNode#split() method. A linked-list node is the more involved case.
As the code shows, the HashMap designers rehash a list by splitting it into two lists, a "low" list and a "high" list, corresponding to the low and high new indices; that is, one long list is split into at most two. The test used is (e.hash & oldCap) == 0: nodes for which it is true go to the low list, the others to the high list. After the split, the low list stays at its original index j, and the high list is placed at index j + oldCap. This code differs substantially from JDK 1.7, and JDK 1.8 fixes a serious bug the 1.7 version could cause; we'll save that for later.
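A worked example of the split rule, using made-up hash values with oldCap = 16:

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int h1 = 0b0_0101; // bit for oldCap (16) clear -> "low" list
        int h2 = 0b1_0101; // bit for oldCap (16) set   -> "high" list
        // Both hashes shared bucket 5 in the old table
        System.out.println(h1 & (oldCap - 1)); // 5
        System.out.println(h2 & (oldCap - 1)); // 5
        // (hash & oldCap) == 0 decides the new index: j stays, or j + oldCap
        System.out.println(h1 & (newCap - 1)); // 5  (low list: old index j)
        System.out.println(h2 & (newCap - 1)); // 21 (high list: j + oldCap)
    }
}
```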
TreeNode#split()
```java
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Redistribute the tree into two linked lists, walking the nodes
    // in their linked-list order; again split into low and high lists
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        // The same condition as the linked-list split
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }
    // Place the low list at index
    if (loHead != null) {
        // Decide from the length of the split list whether to untreeify
        if (lc <= UNTREEIFY_THRESHOLD)
            // Replace TreeNode with Node
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab);
        }
    }
    // Place the high list at index + bit (oldCap)
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}
```
Redistributing a tree relies on the fact that tree nodes still carry their linked-list next pointers, so the split walks the nodes as a list and rehashes them exactly the way plain list nodes are rehashed; we won't detail the logic further.
get()
```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
```
The get() method is implemented using the getNode() method. Again, follow the code step by step.
```java
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table is not empty and the bucket for this hash exists
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check the first node first
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If the first node is a tree node, use TreeNode#getTreeNode()
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the list looking for the key
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```
First, the linked-list path: it is a plain list traversal looking for the matching key.
If the first node is a tree node, the lookup goes through the TreeNode#find() method instead.
TreeNode#getTreeNode()
```java
final TreeNode<K,V> getTreeNode(int h, Object k) {
    return ((parent != null) ? root() : this).find(h, k, null);
}
```
```java
final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
    TreeNode<K,V> p = this;
    do {
        int ph, dir; K pk;
        TreeNode<K,V> pl = p.left, pr = p.right, q;
        // Use the hash to decide which subtree to descend into
        if ((ph = p.hash) > h)
            p = pl;
        else if (ph < h)
            p = pr;
        // Equal key: return it directly
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        // Left subtree is empty: go right
        else if (pl == null)
            p = pr;
        // Right subtree is empty: go left
        else if (pr == null)
            p = pl;
        // Compare via the key's Comparable implementation if available
        else if ((kc != null ||
                  (kc = comparableClassFor(k)) != null) &&
                 (dir = compareComparables(kc, k, pk)) != 0)
            p = (dir < 0) ? pl : pr;
        // Otherwise search the right subtree recursively, then the left
        else if ((q = pr.find(h, k, kc)) != null)
            return q;
        else
            p = pl;
    } while (p != null);
    return null;
}
```
At the bottom this is a straightforward descent of the tree, ordered primarily by hash value.
Difference between 1.7 and 1.8
JDK 1.7 and 1.8 implement HashMap quite differently.
- 1.7 uses array + linked list; 1.8 uses array + linked list + red-black tree.
- 1.7 uses head insertion (which led to a serious bug); 1.8 uses tail insertion.
- 1.7's rehash can reverse the order of a linked list (a consequence of head insertion); 1.8's rehash preserves the original list order.
- The hash algorithm in 1.7 differs from that in 1.8.
- The resize() method in 1.7 differs greatly from that in 1.8.
The difference between 1.7 and 1.8 will be discussed in detail in a separate article later (because I haven’t read the 1.7 source code yet).
Afterword
The article references several blogs for a few key points.
https://www.cnblogs.com/eycuii/p/12015283.html
https://mp.weixin.qq.com/s/_zbOHbQa2zDVosXUlYUrSQ
These are some notes and understandings from my study of the HashMap source code. If anything is wrong, feel free to point it out in the comments.