preface
HashMap is a must-have question for a Java backend engineer interview because it contains so much knowledge that it is a good way to test a candidate’s Java background. Today we will look at the underlying implementation of HashMap based on JDK1.8.
The internal data structure of the HashMap
JDK1.8 is an array + list + red-black tree
The underlying implementation of HashMap in JDK8 is different from that in JDK7:
- new HashMap(); We didn’t create an array of length 16 at the bottom
- The underlying array in JDK 8 is Node[] instead of Entry[].
- The first time the put() method is called, the bottom layer creates an array of length 16
- The underlying structure of JDK7 is only array + linked list. Jdk8: array + linked list + red black tree
- When forming a linked list, it’s up and down (jdk7: new element points to old element, header. Jdk8: old element points to new element, tail)
- When the number of linked list elements in an index position of the array is greater than 8 and the length of the current array is greater than 64, all data in the index position is stored in a red-black tree.
Data structure diagram
Several important fields in hashMap (JDK1.8)
// The default initial capacity is 16 0000 0001 right shift 4 bits 0001 0000 is 16, Static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; Static final int MAXIMUM_CAPACITY = 1 << 30; static final int MAXIMUM_CAPACITY = 1 << 30; // Default loading factor is 0.75 static final float DEFAULT_LOAD_FACTOR = 0.75f; Static final int TREEIFY_THRESHOLD = 8; static final int TREEIFY_THRESHOLD = 8; Static final int UNTREEIFY_THRESHOLD = 6; // If the length of a red-black tree is less than 6 after the hash table is expanded, it is degraded to a linked list. Static final int MIN_TREEIFY_CAPACITY = 64; static final int MIN_TREEIFY_CAPACITY = 64; // Threshold = main array size * load factor int threshold;Copy the code
Constructor of hashMap:
Public HashMap(int initialCapacity, float loadFactor) {// the initialCapacity is less than 0, If (initialCapacity < 0) throw new IllegalArgumentException("Illegal Initial Capacity: "+ initialCapacity); If (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; // Load factor must be greater than 0, And is a legitimate digital if (loadFactor < = 0 | | Float. The isNaN (loadFactor)) throw new IllegalArgumentException (" Illegal load factor: " + loadFactor); this.loadFactor = loadFactor; This. threshold = tableSizeFor(initialCapacity); } //tableSizeFor If A is greater than 0 and less than the maximum capacity, return A if A is A power of 2; otherwise, convert A to A power of 2 with the smallest difference. Static final int tableSizeFor(int cap) {int n = cap-1; static final int tableSizeFor(int cap) {int n = cap-1; n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16; return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1; } public HashMap(int initialCapacity) {this(initialCapacity, DEFAULT_LOAD_FACTOR);} public HashMap(int initialCapacity) {this(initialCapacity, DEFAULT_LOAD_FACTOR); } // Default constructor, load factor 0.75, initial capacity DEFAULT_INITIAL_CAPACITY=16, Public HashMap() {this.loadfactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted} // Pass a MAP collection constructor public HashMap(MAP <? extends K, ? extends V> m) { this.loadFactor = DEFAULT_LOAD_FACTOR; putMapEntries(m, false); }Copy the code
The data insertion principle of HashMap
- Check whether the array is empty and initialize it. After instantiation, a one-dimensional array Entry[] table with length 16 is created at the bottom.
- The array is not empty. First, the key1 hash is computed by calling hashCode() of the class that contains the key. The key1 hash is computed by (n-1) &hash to get the location in the Entry array.
- Check whether there is data in table[index]. If there is no data, construct a Node and store it in table[index]
- 4.1 Compare the hash values of key1 and existing data. If the hash value of key1 is different from the hash value of existing data, key1-Value1 is added successfully. Call equals(key2) to compare equals(key2). If equals() returns false: Key1-value1 is added successfully. ② If equals() returns true: replace value2 with value1
- Determine whether the first node is a red-black TreeNode. 5.1 if the first node is a red-black TreeNode (TreeNode), add a key-value pair to the red-black tree. 5.2 If the first node is a linked list, traverse to find elements, overwrite if there is, create a new one, and add key-value pairs to the linked list. After the addition, it will determine whether the list length is greater than 8 and the array length is greater than 64. If the length is greater than 64, the list will be converted to a red-black tree.
After the insertion, check whether the number of nodes is greater than the threshold. If the number is greater than the threshold, expand the number to twice that of the original array.
Put source code analysis
Public V put(K key, V value) {/** * Passes the key to the hash method to calculate its hash value: * ① When assigning an int to an Object type, the int value is automatically enclosed as an Integer. H =key; h=key; h=key; H >>> 16 is unsigned 16 bits to the right, the low position is crowded out, the high position is filled with 0; When the value passed in is less than 2 ^ 16 -1, the value returned by calling this method is its own value. */ return putVal(hash(key), key, value, false, true); } //onlyIfAbsent is true, do not change the existing value. Final V putVal(int hash, K key, V value, Boolean onlyIfAbsent, Boolean evict) {Node<K,V>[] TAB; Node<K,V> p; int n, i; // If the table on the trunk is empty and length is 0, call resize, Adjust the length of the table (the resize method in the following figure) if ((TAB = table) = = null | | (n = TAB. Length) = = 0) / * calls to resize here, is the first time I put an array is initialized. If it is the default constructor, the following statements in resize are executed: newCap = DEFAULT_INITIAL_CAPACITY; NewThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); threshold = newThr; Threshold = 16*0.75 Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; Int oldThr = threshold; return newTab if it is a custom constructor. newCap = oldThr; Float ft = (float)newCap * loadFactor; float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); threshold = newThr; New threshold = (int)(new capacity * load factor) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; return newTab; */ n = (tab = resize()).length; If ((p = TAB [I = (n-1) & hash]) == null) TAB [I] = newNode(hash, key, hash) value, null); Else {// Node<K,V> e; K k; If (p.h ash = = hash && / / if the location of the old node and the key of the new node is exactly the same ((k = p.k ey) = = key | | (key! = null && key.equals(k)))) e = p; E =p else if (p instanceof TreeNode) E = ((TreeNode<K,V>)p). PutTreeVal (this, TAB, hash, key, value); For (int binCount = 0; ; ++binCount) {// an infinite loop if ((e = p.ext) == null) {//e= p.ext, if p next points to null p.ext = newNode(hash, key, value, null); If (binCount >= treeify_threshold-1) // If the list length is greater than or equal to 8 treeifyBin(TAB, hash); // Turn the list into a red-black tree break; } the if (e.h ash = = hash && / / if the traversal process elements in the list and the newly added elements are exactly the same, then jump out of the loop ((k = e.k ey) = = key | | (key! = null && key.equals(k)))) break; p = e; // Assign next in p to p, that is, the next node in the list to p, // continue iterating through the list elements}} if (e! = null) {// If the added element produces a hash conflict, then the //put method returns the value of the last element in the list as oldValue = e.value; if (! OnlyIfAbsent | | oldValue = = null) / / judgment conditions, shall replace / / for newvalue oldValue, return oldValue; Oldvalue e.value = value; afterNodeAccess(e); // This method says return oldValue; } } ++modCount; If (++size > threshold) // If the number of elements exceeds the threshold, expand resize(); // afterNodeInsertion(evict); return null; }Copy the code
How does a single element hash into a new array? How does a linked list hash into a new array? How does a red-black tree hash into a new array?
Final Node<K,V>[] resize() {Node<K,V>[] oldTab = table; int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0; If (oldCap >= MAXIMUM_CAPACITY) {if (oldCap >= MAXIMUM_CAPACITY) {if (oldCap >= MAXIMUM_CAPACITY) {if (oldCap >= MAXIMUM_CAPACITY) {if (oldCap >= MAXIMUM_CAPACITY) { return oldTab; } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) The critical value is 2 times newThr = oldThr << 1; Else if (oldThr > 0) newCap = oldThr; Else {// newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } if (newThr == 0) {float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr; // Add the new threshold to threshold@suppressWarnings ({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // Add the new threshold to threshold@suppressWarnings ({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; // If (oldTab! // array for (int j = 0; j < oldCap; ++j) {// Array Node<K,V> e; if ((e = oldTab[j]) ! = null) {// Check whether node is empty, save the node at position j to e, and then set oldTab to null. // oldTab becomes an empty array for garbage collection. OldTab [j] = null; NewTab [e.hash & (newcap-1)] = e; if (e.ext == null) newTab[e.hash & (newcap-1)] = e; // The address of the element before the expansion is (oldcap-1) &e.hash, so there are only two possible new addresses: +oldCap else if (e instanceof TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); else { // preserve order Node<K,V> loHead = null, loTail = null; Node<K,V> hiHead = null, hiTail = null; Node<K,V> next; /* If this is true, the element's address will not change in the new array. Because the highest bit of oldCap is 1, the corresponding bit of E.hash is 0, so the address obtained after expansion is the same, the location will not change, in the later code execution will be put into the loHead, and finally assigned to newTab[j]; If the judgment is not true, then the address of the element will be changed to the original subscript position +oldCap, that is, the highest bit of lodCap is 1, and the corresponding position of E. dash is also 1. Therefore, the address of the element will be changed after expansion, which will be put into hiHead in the following code. / / newTab[j + oldCap] / / newTab[j + oldCap] / / newTab[j + oldCap] / / newTab[j + oldCap] / / newTab[j + oldCap] / / Hash =26 Binary: 0101 1010 Position of E1 before expansion: E1. hash & oldcap-1 Result: 0000 1010 position of E2 before expansion: e2.hash & oldcap-1 Result: 0000 1010 position of E2 before expansion: e2.hash & oldcap-1 Result: 0000 1010 has the same result, so E1 and E2 are on the same linked list before expansion, this is the state before expansion. After capacity expansion, the location of the element needs to be recalculated. The address calculation method in the linked list before capacity expansion is e.hash & oldcap-1. After capacity expansion, oldCap*2=32 0010 0000 newCap=32. The new calculation method should be E1.hash & newCap-1, that is, 0000 1010 & 0001 1111. The result is 0000 1010, which is the same as the position before expansion. E2. hash & newCap-1:0101 1010&0001 1111 The result is 0001 1010, which is the pre-expansion position +oldCap. But instead of e.hash & Newcap-1, e.hash & oldCap, in fact, these two are equivalent, both of which determine whether the fifth digit from the bottom is 0 or 1. If it is 0, the position remains unchanged; if it is 1, the position changes to the position before expansion +oldCap. If (e.hash & oldCap) == 0, execute: E points to the same node object as oldTab[j], so loHead points to the same node object as e and loTail points to the same node object. Finally, execute the second time in the judgment condition e to next, which refers to the second element in oldTab: Lotail is not null, so lotail.next points to e. The next of the node object lotail points to points to E, or the next of the loHead points to e. The node to which loHead points becomes a linked list of length 2. Then lotail=e refers to the address of the second element in the list. Third execution: Similar to the second execution, the length of the list on loHead increases to 3, and another node is added. LoTail points to the new node...... The execution of hiTail and hiHead is the same as above, which is not explained here. LoHead is used to hold the first element on a new list, and loTail is used to hold the last element until the list is traversed. This is when (E.hash & oldCap) == 0 holds. (e.hash & oldCap) == 0; oldCap == 0; oldCap == 0; NewTab [j] and newTab[j+oldCap] */ do {next = e.next; if ((e.hash & oldCap) == 0) { if (loTail == null) loHead = e; else loTail.next = e; loTail = e; } else { if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) ! = null); if (loTail ! = null) { loTail.next = null; // Next set to null newTab[j] = loHead; } if (hiTail ! = null) { hiTail.next = null; NewTab [j + oldCap] = hiHead; } } } } } return newTab; }Copy the code
HashMap is still insecure, so ConcurrentHashMap is recommended for multithreaded concurrency.
The last
Thank you for reading here, after reading what do not understand, you can ask me in the comments section, if you think the article is helpful to you, remember to give me a thumbs up, every day we will share Java related technical articles or industry information, welcome to pay attention to and forward the article!