HashMap is one of the most common collection types and one of the data structures most frequently asked about in interviews, so today we’ll talk about HashMap.
But why single out the Java 8 HashMap? As we all know, Java 8 brought many big changes, such as functional programming, and HashMap also changed significantly.
Take a look at Map
Common Map types are as follows:
HashMap:
- Unordered
- Fast access
- Keys cannot be duplicated (at most one null key is allowed)
LinkedHashMap:
- Ordered (iteration follows insertion order)
- A subclass of HashMap
TreeMap:
- Records stored in a TreeMap are sorted by key (ascending by default), so records obtained by Iterator traversal come out sorted
- Because sorting is required, the keys in a TreeMap must implement the Comparable interface (or a Comparator must be supplied), otherwise a ClassCastException is thrown
- TreeMap uses the key’s compareTo method to decide whether two keys are duplicates
In addition to the above, we may have seen a class called Hashtable:
Hashtable:
- A legacy class, thread-safe, similar to HashMap
- When thread safety is not required, choose HashMap instead
- When thread safety is required, use ConcurrentHashMap instead
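To make the ordering differences concrete, here is a small sketch (the class and key names are mine, purely for illustration) comparing iteration order across the three implementations:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        String[] keys = {"banana", "apple", "cherry"};
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();
        Map<String, Integer> tree = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            hash.put(keys[i], i);
            linked.put(keys[i], i);
            tree.put(keys[i], i);
        }
        // HashMap: iteration order is unspecified and may change over time
        System.out.println("HashMap:       " + hash.keySet());
        // LinkedHashMap: preserves insertion order
        System.out.println("LinkedHashMap: " + linked.keySet()); // [banana, apple, cherry]
        // TreeMap: sorted by the keys' natural ordering
        System.out.println("TreeMap:       " + tree.keySet());   // [apple, banana, cherry]
    }
}
```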
HashMap
Now let’s take a formal look at HashMap, starting with some of its main characteristics:
- Data is stored in a hash table, and collisions are resolved by chaining (separate chaining)
- When a chain’s length reaches 8 or more, the list may be converted to a red-black tree
- The capacity doubles (n * 2) on each resize
Fields
HashMap has the following fields:
- Node<K,V>[] table: the hash table that stores the data; initial length 16 (DEFAULT_INITIAL_CAPACITY), and the capacity doubles (n * 2) on each resize
- final float loadFactor: the load factor, which ties the array length to the maximum number of key-value pairs that can currently be stored; modifying it is not recommended except in special circumstances
- int threshold: the maximum number of key-value pairs allowed before resizing; threshold = capacity * loadFactor. If the number of stored key-value pairs exceeds this value, the table is resized
- int modCount: the number of times the structure of the HashMap has been modified (for example, it is incremented every time a new mapping is put)
- int size: the number of key-value pairs currently stored
It is worth noting that the initial size of the array in the HashMap is 16. Why is that? I’ll talk about that later when I talk about the put method.
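As a quick illustration of how these fields interact (the numbers below are just the documented defaults, not code taken from HashMap itself): with the default capacity of 16 and load factor of 0.75, the threshold works out to 12, so inserting the 13th mapping triggers a resize.

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        // threshold = capacity * load factor
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12
    }
}
```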
Methods
hash(Object key)
We all know that the Object class hashCode method is closely related to HashMap because HashMap uses hashCode to determine where a key is stored in an array. (Everyone should know the relationship and convention between hashCode and equals methods, but I won’t go into that here.)
Java 8 improved on the earlier implementation and streamlined the algorithm:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
It’s worth noting that a HashMap does not use hashCode directly as the hash value; instead, the hash method above applies a shift and an XOR to hashCode, in order to reduce hash collisions.
This way, the high 16 bits of the key’s hash stay the same, while the low 16 bits become the XOR of the original high and low 16 bits; the result is used as the key’s final hash. As we’ll see later, a HashMap uses (n - 1) & hash to determine an element’s position (where n is the current array size).
Clearly, that calculation means an element’s position depends only on the low bits of the hash, which would increase the chance of collisions. XOR-ing the high bits into the low bits makes the position depend on both, reducing the chance of conflict.
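A small sketch (the two sample hash codes are made up for illustration) of why the XOR spread matters: two hash codes that differ only in their high 16 bits would land in the same bucket without it.

```java
public class HashSpreadDemo {
    // The same transformation as HashMap.hash(Object), applied to a raw int
    static int hash(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;      // table size
        int a = 0x10005; // differs from b only in the high 16 bits
        int b = 0x20005;
        // Raw hash codes: (n - 1) & h looks only at the low bits, so both collide
        System.out.println(((n - 1) & a) + " " + ((n - 1) & b)); // 5 5
        // After spreading, the high bits influence the index
        System.out.println(((n - 1) & hash(a)) + " " + ((n - 1) & hash(b))); // 4 7
    }
}
```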
put(K key, V value)
The put method is one of the core methods in HashMap. It is related to the storage of data in HashMap.
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
The put method simply delegates to putVal. I have added comments so that you can follow the logic step by step:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    HashMap.Node<K,V>[] tab; HashMap.Node<K,V> p; int n, i;
    // Lazily initialize the hash table on first use
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Target bucket is empty: insert the new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        HashMap.Node<K,V> e; K k;
        // The first node in the bucket has an equal key: remember it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is a red-black tree: insert into the tree
        else if (p instanceof HashMap.TreeNode)
            e = ((HashMap.TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise walk the linked list node by node
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Reached the tail: append the new node
                    p.next = newNode(hash, key, value, null);
                    // Chain length reached 8: convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with an equal key: stop, it will be updated below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // The key already exists: replace the value and return the old one
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Check whether the threshold is exceeded; if so, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Those are the main logical steps.
One interesting thing to note: before Java 8, HashMap inserted new entries at the head of the list; Java 8 changed this to tail insertion. One drawback of head insertion is that, under concurrent use, a resize triggered by an insert could create a cycle in the linked list and cause an infinite loop. Of course, HashMap was never designed for concurrent use in the first place.
(1) Why is the initial size of HashMap 16
Whenever an element is inserted, we need to compute its position in the array: p = tab[i = (n - 1) & hash].
When n = 16, n - 1 = 15, which is 1111 in binary, so the element’s position depends entirely on the hash value itself.
If n were not a power of two, say n = 10, then n - 1 = 9, binary 1001, and the & operation easily produces duplicate results: 1101 & 1001, 1011 & 1001, and 1111 & 1001 all give the same value. So you can see why 16 was chosen, and why the capacity is multiplied by two on every resize.
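The bit patterns above can be checked directly (a quick sanity check using the hash values from the text):

```java
public class PowerOfTwoDemo {
    public static void main(String[] args) {
        int[] hashes = {0b1101, 0b1011, 0b1111};
        // n = 16: the mask 0b1111 keeps all four low bits, so indexes stay distinct
        for (int h : hashes)
            System.out.print(((16 - 1) & h) + " "); // 13 11 15
        System.out.println();
        // Hypothetical n = 10: the mask 0b1001 has zero bits that throw
        // information away, so all three hashes collapse to the same index
        for (int h : hashes)
            System.out.print(((10 - 1) & h) + " "); // 9 9 9
    }
}
```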
(2) Lazy loading
In the constructor of HashMap, we can see that the hash table Node[] table is not initialized up front; looking at the put method, we find:
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
If the hash table is null or its length is 0, the resize method initializes it. This is clearly the lazy-loading principle at work: the table is allocated only when it is first used.
(3) Treification
In Java 8, the biggest change to HashMap is the addition of treeification. When a chain holds 8 or more elements, the list may be transformed into a red-black tree.
final void treeifyBin(HashMap.Node<K,V>[] tab, int hash) {
    int n, index; HashMap.Node<K,V> e;
    // Table still small: resize instead of treeifying
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // ...... convert the chain at this bucket into a red-black tree
    }
}
Looking at treeifyBin, we can see that when tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY holds, the map only resizes instead of treeifying. MIN_TREEIFY_CAPACITY, the smallest table size at which a HashMap may be treeified, is 64. The reason is that while the hash table is still small, hash collisions are more likely and long chains are somewhat more likely, so we should prioritize resizing to avoid unnecessary treeification.
So why does a HashMap treeify at all? As we all know, searching a linked list is much slower than indexing an array, and when too many elements pile up in one chain, lookup performance degrades badly. There is also a security issue: attackers can feed in data crafted to cause hash collisions, driving server CPU usage way up.
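Such an attack relies on keys engineered to collide. The classic example in Java is the String pair "Aa" / "BB", which share a hash code; concatenating them mass-produces colliding keys:

```java
public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" have the same String hash code
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        // Concatenations of colliding fragments collide too, so an attacker
        // can generate 2^k colliding keys of length 2k
        System.out.println("AaAa".hashCode() == "BBBB".hashCode()); // true
        System.out.println("AaBB".hashCode() == "BBAa".hashCode()); // true
    }
}
```

Before Java 8 such keys degenerate one bucket into a long chain with O(n) lookups; with treeification the worst case drops to O(log n).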
resize()
The capacity expansion method is also a core method in HashMap, and it is also a performance intensive operation.
We all know that arrays cannot automatically expand, so we need to recalculate the new size, create a new array, copy all the elements into the new array, and free the old array.
In addition, Java 8 keeps the rule that a HashMap doubles its capacity on each resize (n * 2). Because of this, the new index of each element is either unchanged or the original index plus the old capacity (i.e., the offset is the pre-resize length). By contrast, prior to Java 8, each resize recomputed the index of every entry from scratch, which cost more.
Each element’s position in the hash table array equals (n - 1) & hash, where n is the array size and hash is the value computed by the hash method.
For example, before the resize the array size is 16 (n - 1 = 0000 1111), and both hash values 0001 0101 and 0000 0101 map to index 0000 0101, i.e. 5.
After the resize, the array grows from 16 to 32 (n - 1 = 0001 1111). Now the two hashes map to 0001 0101 and 0000 0101, i.e. 21 and 5 respectively. The difference between the two positions is 16, exactly the pre-resize array size.
This is actually pretty easy to understand. When we double the array size from n to 2n, the mask changes from n - 1 to 2n - 1, which just adds one more high bit, and that new bit is exactly the bit of n itself (the old capacity).
Therefore, after the & operation the result can only go one of two ways: either the position is unchanged, or it is the original position plus the old capacity.
So how does the source code tell which case applies? As we said earlier, the array size in a HashMap is always a power of two, so the only bit that is new in the mask 2n - 1 compared with n - 1 is the bit of n, and whether the hash has that bit set is exactly what distinguishes the two cases.
So the source code uses a very simple method (oldCap is the size of the original array, that is, n)
if ((e.hash & oldCap) == 0) {
    ...
} else {
    ...
}
When e.hash & oldCap equals 0, the element’s position stays the same; when it is nonzero, the new position is the original position plus the old capacity.
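The two cases can be checked with the sample hash values from the text (just the bit arithmetic, not HashMap internals):

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int h1 = 0b0001_0101; // 21
        int h2 = 0b0000_0101; // 5
        // Old table (n = 16): both hashes land in slot 5
        System.out.println(((oldCap - 1) & h1) + " " + ((oldCap - 1) & h2)); // 5 5
        // New table (n = 32): slot is either unchanged or old slot + oldCap
        int newCap = oldCap * 2;
        System.out.println(((newCap - 1) & h1) + " " + ((newCap - 1) & h2)); // 21 5
        // The source's shortcut: the bit that oldCap contributes decides the case
        System.out.println((h1 & oldCap) + " " + (h2 & oldCap)); // 16 0
    }
}
```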
get(Object key)
Now that you know the storage mechanism of a HashMap, the get method (which delegates to getNode) is easy to understand.
final HashMap.Node<K,V> getNode(int hash, Object key) {
    HashMap.Node<K,V>[] tab; HashMap.Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check the first node in the bucket; return it if it is the target
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If the bucket is a tree, delegate to getTreeNode
            if (first instanceof HashMap.TreeNode)
                return ((HashMap.TreeNode<K,V>) first).getTreeNode(hash, key);
            // Otherwise walk the chain looking for the target element
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
There are four main steps:
- Check whether the hash table is non-empty and whether there is an element at the target index
- Check whether the first node in the bucket is the target
- If the bucket is a tree, search the tree for the target node
- If the bucket is a linked list, traverse it to find the target node
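The four steps can be sketched as a simplified standalone lookup (a toy Node class standing in for HashMap.Node; the tree-node branch is omitted since it needs the real red-black tree):

```java
import java.util.Objects;

public class GetSketch {
    // Minimal chained node, standing in for HashMap.Node
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    static <K, V> V get(Node<K, V>[] tab, int hash, K key) {
        Node<K, V> first;
        // Step 1: table non-empty and target bucket non-null?
        if (tab == null || tab.length == 0
                || (first = tab[(tab.length - 1) & hash]) == null)
            return null;
        // Step 2: is the first node in the bucket the target?
        if (first.hash == hash && Objects.equals(first.key, key))
            return first.value;
        // Step 3 (tree buckets) is omitted in this sketch
        // Step 4: walk the chain looking for an equal key
        for (Node<K, V> e = first.next; e != null; e = e.next)
            if (e.hash == hash && Objects.equals(e.key, key))
                return e.value;
        return null;
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Node<String, Integer>[] tab = (Node<String, Integer>[]) new Node[16];
        int h = "a".hashCode() ^ ("a".hashCode() >>> 16); // same spread as HashMap.hash
        tab[(tab.length - 1) & h] = new Node<>(h, "a", 1, null);
        System.out.println(get(tab, h, "a")); // 1
        System.out.println(get(tab, h, "b")); // null
    }
}
```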