Logical process
structure
A HashMap is a list array, that is, an array with a list inside. It can be simply understood as:
// List class Node {Object key; Object value; Node next; } // array class HashMap {Node[] table; }Copy the code
When the number of elements in a HashMap exceeds 8, the linked list evolves into a red-black tree, which can be roughly understood as a balanced binary tree, where the left node is smaller than the parent node and the right node is larger than the parent node, so the efficiency of lookup is the same as binary lookup, which is O(logn).
Put (key, value)
- 1 Obtain hashcode based on the key. If the key is null, hashcode is 0; otherwise, hashCode is: its own HashCode xor its own hashcode unsigned 16 bits to the right
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code
-
Hash %size = 1; hash%size = 1; hash%size = 1; hash%size = 1;
-
If hashCode has the same element as hashCode, then check whether the key is the same. If the key is the same, then check whether the key is the same. Replace the value of the node element with the value passed in, otherwise insert it into the head of the list:
// Get hashCode int hashCode = hash(key); Int index = hashCode &(size-1); Node Node = table[index]; while(node! =null) {// If (hashcode! = hash(node)) continue; If (key.equals(node.key)) {Object old = node.value; Node. value = value; // return old; } node = node.next; } // Insert the same key into the table header Node Node = new Node(key,value); node.next = header.next; header.next = node;Copy the code
In plain English: 1 computes hash 2 computes subscripts 3 compares and inserts
Get (key) operation
- 1 Obtain hashCode based on the key
- 2 Compute subscripts according to HashCode, as above
- 3 Get node by subscript and compare. Compare hashCode first and then key, same as above. Return value if all elements are identical, or NULL otherwise
Expand the resize() operation
A HashMap has a loadFactor variable, called the loadFactor, and when the data reaches size * loadFactor, the expansion mechanism is triggered, i.e. the resize() function is called.
- 1 multiplies the original array length by 2(HashMap ensures that the capacity is always 2 to the NTH power) to get the new array length, and creates a new array
- 2 Transfer the data from the original array to the new array. This transfer is different from ArrayList, because the array length has changed, so the subscripts may have changed, so you need to iterate and calculate the subscripts
Key point proof
- 1 why does HashMap set size to the NTH power of 2
Because if any number p, if p is 2 to the NTH power, then for any integer A, a % p = a & (p-1); We know that the array subscript of HashMap is: hashcode % size; If size is 2 to the NTH power, it can be changed to hashcode & (size-1), which is very efficient.
- When p is a power of 2, a%p=a&(p-1)
Hashcode &(size-1) We adopt the classification discussion idea as follows:
Since p is the NTH power of 2, all p is 0 except the highest bit of 1. P-1 is 1 except the highest bit of 0. Conclusion 1: Any A <p, a&(p-1)=a conclusion 2: P &(p-1)=0, and any T, TP &(p-1)=0 1 When a<p: left a%p=a, right a%(p-1)=a, left = right 2 If a=p: left a%p=0, right a&(p-1)=p&(p-1)=0, left = right, true. 3 If a>p: suppose a%p=b, then a/p=t remainder b, namely A = TP +b, where t>=1, and b belongs to [0,p-1]; So on the right hand side, a&(p-1)= tp+b &(p-1), we know that p is 2 to the NTH power, so tp is 0 except for the highest digit, and b is less than =p-1, so b is above the lowest digit of TP, So (tp + b) & (p = tp - 1) & (p - 1) + b & (p = 0-1) (2) conclusion conclusion (1) + b = b; Left = right, true; To sum up: if p is 2 to the NTH power, for any a, a%p=a&(p-1).Copy the code
num | high | high | low | low | low | low | low | low |
---|---|---|---|---|---|---|---|---|
p | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
p-1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
2p | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3p | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
p | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
2p+3 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
We see that TP is adding a 1 or a 0 to the high level without affecting the low level, which is always 0. So tp(p-1) is always 0, and tp+b(b<=p-1), n is always only added in the low order of P, which is 1 in p-1, so tp+b + p-1 is always equal to b.
- 3 Why hashcode evaluates to h^(h>>>16)
As we know, the index index is h&(size-1). In general, the size is very small and it is difficult to exceed 16 bits. In other words, only the low value of hashcode is effective at this point. What to do? (h^(h>>>16))&(sie-1) = (h^(h>> 16))&(sie-1) = (h>> 16))
x | y | with | or | Exclusive or |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 1 | 1 |
1 | 0 | 0 | 1 | 1 |
1 | 1 | 1 | 1 | 0 |
The ratio of operation 1 to 0 is 1:3, or the ratio of operation 1 to 0 is 3:1. Only the ratio of operation 1 to 0 in operation “xor” is 1:1, which is fair, so we take xor to ensure fairness, to ensure uniformity.
- 4 Why red black trees
We know that the list does not support random access, only one-way traverse, efficiency is very low, if the conflict is more serious, many nodes on the same index, then the list will be very long, the search efficiency is very low, while the use of red and black tree, search efficiency can be changed from the original linear time to logarithmic time, which is O (n) to O (logn), So for efficiency, I’m going to use a red-black tree, which is the dichotomy idea. The more serious the conflict, the more effective the red-black tree is. For example, if the length of the list is 1024, the efficiency of using the list is 1024, and the red-black tree is log(1024)=10, 100 times worse!