Basic knowledge of
Map
In a HashMap, there is an inherited interface Map
, which is essentially a Map, retrieving values by keys. The official Java comment describes it this way:
An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
This basically means: objects that map keys to values. Mappings cannot contain duplicate keys; Each key can map to at most one value.
Hash table/hash table
A Hash table (also called a Hash table) is a data structure that is directly accessed based on Key values. That is, it accesses records by mapping key values to a location in the table to speed up lookups. This mapping function is called a hash function, and the array of records is called a hash table.
A HashMap structure
The HashMap structure is actually composed of an array + a linked list + a red-black tree.
HashMap<String, Integer> map = new HashMap<String, Integer>(64, 1);
map.put("aa", 1);
map.put("bb", 2);
map.put("cc", 3);
map.put("dd", 4);
map.put("ee", 5);
map.put("ff", 6);
map.put("gg", 7);
map.put("hh", 8);
map.put("ii", 9);
map.put("jj", 10);
map.put("kk", 10);
map.put("ll", 10);
map.put("mm", 10);
map.put("nn", 10);
map.put("oo", 10);
map.put("pp", 10);
map.put("qq", 10);
Copy the code
The following memory structure can be obtained:
Red and black tree:
List:
HashMap data structure diagram:
The important parameters
loadFactor
LoadFactor is the loadFactor of a hash table, which affects the expansion of a HashMap. The larger the loadFactor is, the more intense the HashMap collision is, but the less resize is. The smaller the load factor, the smaller the HashMap collision probability, but the larger the resize.
threshold
Threshold is the threshold of the HashMap to be expanded, as described in the HashMap source code:
The next size value at which to resize (capacity * load factor)
This represents that threshold is the capacity * load factor of the entire HashMap. When the capacity reaches this threshold, the HashMap expands.
Important method
Put method
The most commonly used method in a HashMap is the PUT method. The main function of the PUT method is to insert a key-value into a HashMap.
- Can be inserted repeatedly;
- Both the inserted KEY and Value can be Null
The HashMap put method looks like this:
- First, the hashCode of the Key is calculated to obtain the corresponding hash value.
- Check whether the slot in the HashMap is empty. If so, initialize the number of slots in the HashMap.
- Check whether the slot corresponding to the Key is occupied. If not, generate a new node using the key-value and place it in the corresponding slot
- If the slot is occupied, divide it into three steps:
4.1. Check whether the keys are the same. If they are, the original Node is returned. 4.2. Determine whether the node belongs to a TreeNode. If so, add it to the red-black tree. 4.3. If it is not a tree Node, it must be a linked list Node, traversing the linked list. If the length of the list is greater than or equal to 7, the list will be converted to a red-black tree, otherwise, it will be added to the list. 5. Check whether the size of the HashMap is greater than the threshold. If yes, expand the HashMap
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; If ((TAB = table) = = null | | (n = TAB. Length) = = 0) / / if the current HashMap storing data in the table is empty, or the size of the table is zero, Table n = (TAB = resize()).length; If (p = TAB [I = (n-1) & hash]) == null) // If (p = TAB [I = (n-1) & hash]) == null) TAB [I] = newNode(hash, key, value, null); Else {// The table is already occupied by another Node. Node<K,V> e; K k; if (p.hash == hash && ((k = p.key) == key || (key ! = null && key.equals(k)))) // check whether the key of the occupied Node and the current key are the same, if so, fetch the Node e = p; Else if (p instanceof TreeNode) // check whether this Node is a TreeNode, if so, E = ((TreeNode<K,V>)p).puttreeval (this, TAB, hash, Key, Value); For (int binCount = 0; ; ++binCount) {// Loop through the list if ((e = p.ext) == null) {// Check whether the list is at the end, if it is at the end, Create a newNode p.ext = newNode(hash, key, value, null); If (binCount >= treeify_threshold-1) // -1 for 1st // Check whether the length of the current list is greater than or equal to the tree construction threshold (7). Convert the list to a red-black treeifyBin(TAB, hash); break; } if (e.hash == hash && ((k = e.key) == key || (key ! = null && key.equals(k)))) // If a Node in the linked list has the same key as the current one, break break; p = e; } } if (e ! = null) { // existing mapping for key V oldValue = e.value; if (! OnlyIfAbsent | | oldValue = = null) / / if not create mode or the original value is null, the current value directly to the store to the value of the Node above e.v alue = value; afterNodeAccess(e); return oldValue; } } ++modCount; If (++size > threshold) // If the size of the current table is greater than the capacity expansion threshold of the HashTable, expand the capacity resize(). afterNodeInsertion(evict); return null; }Copy the code
The resize method
The entire resize module is mainly divided into two large modules. The first module is capacity expansion: calculate the size of the new table and the new threshold. The second module is migration: recalculate the slot positions of nodes in the original table and migrate them.
The main idea of the first large expansion is as follows:
- Determine whether the size of the current table is greater than 0, if it is greater than 0, it will determine whether the size of the current table exceeds the maximum value, if it is greater than the maximum value, it will not expand, if not, it will expand the original size twice the original. And determine whether the size after expansion is greater than the maximum value. If so, expand the capacity according to the maximum value.
- If the size of the current table is 0 and the threshold of the current table is greater than 0, the system sets the original threshold to the size of the new table.
- If neither condition is met, the default size (16) and the default threshold (16 * 0.75) are set for the new table.
- If the threshold of the new table is empty, the new threshold is recalculated
In fact, it can be seen that the main idea of capacity expansion is that, as long as the defined maximum is not exceeded, each capacity expansion is based on twice the size of the code as follows:
Node<K,V>[] oldTab = table; int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0; If (oldCap >= MAXIMUM_CAPACITY) {// If (oldCap >= MAXIMUM_CAPACITY) {// If (oldCap >= MAXIMUM_CAPACITY) { Threshold = integer. MAX_VALUE; return oldTab; Else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) // If tablesize is smaller than the maximum capacity and the size of the table is larger than the default initial capacity, // The threshold of the new table is expanded to twice the original value newThr = oldThr << 1. // double threshold} else if (oldThr > 0) // Initial capacity was placed in threshold NewCap = oldThr; newCap = oldThr; newCap = oldThr; // Zero initial threshold using defaults // So I get the default size newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } if (newThr == 0) {float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr;Copy the code
The second main idea of migration is:
First of all, slot positions in a HashMap are determined by the Hash value of the Key and the size-1 value of the tableBitwise and((size-1) & hash)
The second thing to learn is that the capacity of a HashMap is always a power of two
Therefore, there is a very interesting phenomenon: during expansion, the position of the element is either in the original position, or it is moved by 2 powers of the original position, as shown in the figure:Therefore, when expanding and migrating elements, it is not necessary to recalculate the hash value, but to determine whether the new high bit is 0 or 1. If it’s 0, you don’t need to move, if it’s 1, you move toThe original index + oldCapThe slot position
@SuppressWarnings({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; if (oldTab ! = null) { for (int j = 0; j < oldCap; ++j) {// loop over table Node<K,V> e; if ((e = oldTab[j]) ! = null) { oldTab[j] = null; NewTab [e.hash & (newcap-1)] = e; newTab[e.hash & (newcap-1)] = e; newTab[e.hash & (newcap-1)] = e; else if (e instanceof TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); Else {// preserve order // Node<K,V> loHead = null, loTail = null; Node<K,V> hiHead = null, hiTail = null; Node<K,V> next; Do {// loop through the list next = e.next; If ((e.hash & oldCap) == 0) {// If (loTail == null) loHead = e; else loTail.next = e; loTail = e; If (hiTail == null) hiHead = e; if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) ! = null); if (loTail ! Lotail.next = null; lotail.next = null; newTab[j] = loHead; } if (hiTail ! = null) {// Put the list to be moved into slot (old index + oldCap). newTab[j + oldCap] = hiHead; } } } } }Copy the code
The get method
The general idea of the get method is:
- Check whether the table in the HashMap is empty and whether the slot corresponding to the key is empty. If so, return NULL.
- If not, check whether the first node in the slot is the desired key
- If the first node is not a tree, check whether the node is a tree. If it is a tree, check whether the node is a tree
- If it’s not a tree, loop through the list
The code is as follows:
final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; If ((TAB = table)! = null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) ! = null) {/ / figure out whether the slot of the first node corresponding to the key if (first. The hash = = hash && / / always check first node ((k = first. Key) = = key | | (key ! = null && key.equals(k)))) return first; if ((e = first.next) ! If (first instanceof TreeNode) return ((TreeNode<K,V>)first).getTreenode (hash, key); Do {/ / if not tree node, then list traversing the if (e.h ash = = hash && ((k = e.k ey) = = key | | (key! = null && key.equals(k)))) return e; } while ((e = e.next) ! = null); } } return null; }Copy the code
Comparison with JDK1.7
- The most obvious thing about JDK1.7 and JDK1.8 is that in JDK1.7, HashMap inserts are header inserts, while in JDK1.8, HashMap inserts are tail inserts. The reason for this change is that Header lists can be reversed during the HashMap resize() process due to multiple threads, causing the list to loop in an infinite loop.
- In JDK1.7 HashMap, the data structure of HashMap is array + one-way linked list. In JDK1.8 HashMap, the data structure of HashMap is array + single linked list + red-black tree.
- In resize(), the difference between JDK1.7 and JDK1.8 is that the new index position is computed at migration time. In JDK1.7, the Key hash is recalculated and the new index position is computed with (size-1) &hash. In JDK1.8, the new index position is computed with (size-1) &hash. If the higher bit value is 0, the index position will remain unchanged; if it is 1, the original HashMap size plus the original index position (original index +oldCap) will be used to reduce the overhead caused by rehash.
A summary of some details about HashMap:
1. When does a linked list become a red-black tree in a HashMap?
You can see that in put
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
Copy the code
If the size of the HashMap is less than 64, the value of the HashMap will be determined. If the size of the HashMap is less than 64, the value of the HashMap will be determined. The HashMap will be expanded directly, rather than first. The HashMap will convert the list into a red-black tree.
Replaces all linked nodes in bin at index for given hash unless table is too small, in which case resizes instead.
All list nodes will be replaced, unless the current table is too small, which will be resized
2. Why is the capacity in HashMap a power of 2?
First point: In the process of PUT, if the modulus of length is directly taken, the operation consumption is relatively high, so the operation of & (length -1) is chosen. However, if the capacity of HashMap is not the power of 2, the data originally needed to be dispersed will be unevenly dispersed, which will exacerbate the collision of HashMap
Second point: In the process of HashMap resize, since the length is a power of 2, there is no need to recalculate the subscript value of the corresponding slot of the node when migrating the node. If the length is not a power of 2, the subscript value of the corresponding node needs to be recalculated every time, which increases the consumption of performance
3. What are the features of HashMap?
First, key-value pairs. All data in a HashMap is stored in key-value pairs. In addition, a HashMap shows that both keys and values can be null. Second: scatter. In a HashMap, all data is stored scattered, regardless of insertion order, and nodes may move over time.
The resources
Working principle and implementation of Java HashMap