This article is based on an analysis of the JDK 8u261 source code.
This article was originally published on the Geek Time WeChat public account: mp.weixin.qq.com/s/tksb5-90q… By: Geek Time – Tech team
1 Introduction
HashMap is a key-value utility class that is used frequently and is very easy to use. Note, however, that HashMap is not thread-safe; its thread-safe counterpart is ConcurrentHashMap (never mind the outdated Hashtable utility class). Both HashMap and ConcurrentHashMap are also used in the Spring framework for various caching purposes. Starting with Java 8, the HashMap source code was reworked to improve its performance. Let's take a look at the data structure of the HashMap:
The whole structure can be seen as an array plus linked lists. The array enables fast retrieval; when hashes conflict, the colliding entries are chained into a linked list at the same position. In other words, all nodes on the same list map to the same bucket position. If there are too many hash conflicts, however, the list grows too long and retrieval degenerates into a linear traversal, which performs poorly. Java 8 introduced red-black trees to improve this situation.
A red-black tree is a self-balancing binary tree structure that guarantees O(log n) worst-case time complexity for search, insert, and delete. Compared with AVL trees, red-black trees have better insert and delete performance in large data volume scenarios. When the number of nodes in a list is greater than or equal to 8 and the length of the current array is greater than or equal to MIN_TREEIFY_CAPACITY, all nodes in the list are converted to a red-black tree; and when the number of nodes in such a tree bin drops to 6 or fewer, it degenerates back into a linked list. The value of MIN_TREEIFY_CAPACITY is 64: the length of the current array (that is, the number of buckets) must be greater than or equal to 64 while the length of the current list is greater than or equal to 8. If the array length is less than 64, no conversion happens even if a list grows beyond 8, which deserves special attention. The treeifyBin method below performs the list-to-red-black-tree conversion:
1  /**
2   * Replaces all linked nodes in bin at index for given hash unless
3   * table is too small, in which case resizes instead.
4   */
5  final void treeifyBin(Node<K,V>[] tab, int hash) {
6      int n, index; Node<K,V> e;
7      if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
8          resize();
9      else if ((e = tab[index = (n - 1) & hash]) != null) {
10         TreeNode<K,V> hd = null, tl = null;
11         do {
12             TreeNode<K,V> p = replacementTreeNode(e, null);
13             if (tl == null)
14                 hd = p;
15             else {
16                 p.prev = tl;
17                 tl.next = p;
18             }
19             tl = p;
20         } while ((e = e.next) != null);
21         if ((tab[index] = hd) != null)
22             hd.treeify(tab);
23     }
24 }
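For reference, the three thresholds involved are declared in the JDK 8 source as follows (the trailing comments are mine):

static final int TREEIFY_THRESHOLD = 8;     // list -> tree conversion threshold
static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list degeneration threshold
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size before treeifying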
As lines 7 and 8 above show, if the current array is smaller than MIN_TREEIFY_CAPACITY, a resize is chosen and no red-black tree is built. The reasoning: if a bucket has already accumulated 8 colliding entries, the root cause is that too few buckets were allocated, so expansion is chosen to reduce hash conflicts. Only when the array size is greater than or equal to MIN_TREEIFY_CAPACITY is a list of length 8 converted to a red-black tree.
The MIN_TREEIFY_CAPACITY parameter exists to ensure that collisions are not simply caused by a small array. Consider the case where all the data in the current array sits in one bucket (or, similarly, most nodes hang off one bucket because the hash function spreads poorly, which is unlikely): without the MIN_TREEIFY_CAPACITY limit, a red-black tree would happily be generated, even though building and then maintaining one is fairly involved, so in principle it is avoided whenever possible (explained later). With the MIN_TREEIFY_CAPACITY constraint, the expansion at line 8 above is triggered instead. The point of expanding here is to minimize hash conflicts: after the array doubles, the eight nodes of a length-8 list are redistributed between two buckets of the new array, perhaps four and four (or five and three, or in the extreme case eight and zero; statistically, though, the per-bucket node count is very likely to drop). That is equivalent to shortening the lists, i.e., reducing the hash conflicts at any one position. Also note the source comment that MIN_TREEIFY_CAPACITY should be at least 4 times the treeify threshold, to avoid conflicts between the resizing and treeification thresholds.
**So why not use a red-black tree from the beginning instead of a linked list? Why use a linked list and a red-black tree together?** The reason is that although red-black trees perform better, the difference only shows at large data volumes. If the current data volume is very small, just a few nodes, then a linked list is obviously more cost-effective. Remember that insertion and deletion in a red-black tree involve a lot of rebalancing work (rotations and recoloring) to keep the tree in shape. With a small amount of data, the gains on insert and delete simply do not offset the cost of that rebalancing.
**Another point to note: the threshold for a list to become a red-black tree is 8, while the threshold for a red-black tree to degenerate into a list is 6. Why are these two values different?** Suppose both were 8, and the current list has 7 nodes. A new node comes in whose computed position collides with those seven nodes. It is hung after the seventh node, the red-black tree threshold is reached (assume the MIN_TREEIFY_CAPACITY condition is also met), and the bin becomes a red-black tree. But then a remove is called to get rid of that new node; the tree is back to 7 nodes and degenerates into a linked list. Then another node is added, landing behind the seventh node again, so it becomes a red-black tree again; then another remove, and it degenerates again... As you can see, in this scenario the bin keeps converting between linked list and red-black tree, and performance becomes very poor, because most of the execution time is spent converting the data structure while the caller only performed a few consecutive additions and deletions. To avoid this, the two thresholds are staggered, preventing the performance collapse caused by frequent structure conversions near the threshold point.
The choice of 8 as the threshold is based on a statistical conclusion, and there is a relevant note in the source code:
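Here is that excerpt from the JDK 8 HashMap source comment; each row pairs a list size with its expected probability under random hashCodes (a Poisson distribution with parameter about 0.5):

* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
* more: less than 1 in ten million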
The number after each list size is the expected probability of that many nodes colliding in the same bucket. As you can see, by a list size of 8 the probability has dropped to 0.00000006 (about 0.000006%), which almost never happens. You can also see from this that the HashMap authors chose this number as the threshold precisely because they did not want red-black trees to be generated (they are expensive to maintain). The default load factor of 0.75 is likewise based on statistical analysis. The following is the declaration from the source:
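/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;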
The load factor is a measure of how full the array is allowed to get before expansion. That is, the effective capacity of the array equals the array length * the load factor. For example, if the current array length is 16 (the number of buckets) and the load factor is 0.75, then expansion happens once the map holds 16 * 0.75 = 12 entries, rather than waiting until the array space is completely full. A value of 0.5 would mean the array is expanded after being half filled; a value of 1 would mean the array is expanded only when completely full. The default value of 0.75 is a compromise between time and space costs, and it is not recommended to change it yourself. The higher the value, the more entries the array holds before growing, reducing the space overhead but increasing the probability of hash conflicts and the cost of searching; a lower value reduces the probability of hash collisions but wastes space.
The array's default size of 16 is also a statistical result. It is worth noting that if you know in advance how much a HashMap will store, you can pass the capacity into the constructor to avoid repeated expansion. For example, suppose about 200 entries are to be added: with the default size of 16 and no initial value set, an expansion happens once 16 * 0.75 = 12 entries have been added, another once 32 * 0.75 = 24 entries have been added... and so on until the map can hold 200 entries. If the initial capacity had been set up in advance, none of those expansions would be necessary; a sizing sketch follows.
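A minimal sizing sketch (the entry count of 200 and the variable names are illustrative assumptions, not JDK code):

// To hold ~200 entries without intermediate resizes, request a capacity of at
// least expected / loadFactor; HashMap rounds it up to a power of two internally.
int expectedEntries = 200;
int initialCapacity = (int) (expectedEntries / 0.75f) + 1; // 267; rounded up to 512 internally
HashMap<String, Integer> map = new HashMap<>(initialCapacity);
// 512 * 0.75 = 384 >= 200, so the map is never resized while being filled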
2 Constructors
1  /**
2   * HashMap:
3   * the no-argument constructor
4   */
5  public HashMap() {
6      // The load factor is set to the default 0.75; all other fields keep their default values
7      this.loadFactor = DEFAULT_LOAD_FACTOR;
8  }
9
10 /**
11  * the one-argument constructor
12  */
13 public HashMap(int initialCapacity) {
14     // The initial capacity is caller-specified; the load factor defaults to 0.75
15     this(initialCapacity, DEFAULT_LOAD_FACTOR);
16 }
17
18 public HashMap(int initialCapacity, float loadFactor) {
19     // initialCapacity non-negative check
20     if (initialCapacity < 0)
21         throw new IllegalArgumentException("Illegal initial capacity: " +
22                 initialCapacity);
23     // If initialCapacity exceeds the permitted maximum (2^30), reset it to 2^30
24     if (initialCapacity > MAXIMUM_CAPACITY)
25         initialCapacity = MAXIMUM_CAPACITY;
26     // Load factor positivity check plus illegal-number check (0.0f divided by 0.0f yields NaN)
27     if (loadFactor <= 0 || Float.isNaN(loadFactor))
28         throw new IllegalArgumentException("Illegal load factor: " +
29                 loadFactor);
30     this.loadFactor = loadFactor;
31     /*
32     threshold is set to the smallest power of two greater than or equal to the requested capacity.
33     It will be updated to new array capacity * load factor, i.e. the next expansion point, inside the resize method
34     */
35     this.threshold = tableSizeFor(initialCapacity);
36 }
37
38 /**
39  * Computes the smallest power of two greater than or equal to cap; the implementation is a clever one that takes full advantage of binary
40  */
41 static final int tableSizeFor(int cap) {
42     /*
43     The -1 guards against the case where cap is already a power of two, as explained later. For ease of understanding,
44     an example: suppose cap is 34 (100010), so n is 33 (100001). We really only care about the highest 1 bit and
45     everything to its right; in general terms n looks like 01xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (x means the bit may be 0 or 1; it does not matter)
46     */
47     int n = cap - 1;
48     /*
49     After a one-bit right shift and a bitwise OR, n becomes 110001 (100001 | 010000).
50     In general terms: n is now 011xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
51     */
52     n |= n >>> 1;
53     /*
54     After a two-bit right shift and a bitwise OR, n becomes 111101 (110001 | 001100).
55     In general terms: n is now 01111xxxxxxxxxxxxxxxxxxxxxxxxxxx
56     */
57     n |= n >>> 2;
58     /*
59     After a four-bit right shift and a bitwise OR, n becomes 111111 (111101 | 000011).
60     In general terms: n is now 011111111xxxxxxxxxxxxxxxxxxxxxxx
61     */
62     n |= n >>> 4;
63     /*
64     After an eight-bit right shift and a bitwise OR, all the bits of our example are already 1,
65     so the remaining two shifts change nothing (the shifted value ORs in no new 1 bits).
66     The point of shifting and ORing this many times is to guarantee that, by the end, every bit
67     below the highest 1 is also a 1. In general terms: n is now 01111111111111111xxxxxxxxxxxxxxx
68     */
69     n |= n >>> 8;
70     /*
71     After a sixteen-bit right shift and a bitwise OR, in general terms n is now 01111111111111111111111111111111.
72     Note that an int has 32 bits, so shifting by 16 is enough to stop (you could of course keep shifting
73     by 32, 64 and so on, but it would accomplish nothing more).
74     Integer.MAX_VALUE is 2^31 - 1, i.e. 31 ones, but the requested capacity has already been capped at
75     2^30 on line 25. 2^30 is a 1 followed by 30 zeros, i.e. 010000..., so that is the largest cap that
76     can reach this method; after the -1 and the shifts it becomes 00111...111, i.e. 30 ones,
77     and the +1 at line 90 finally corrects it back to 2^30, which is MAXIMUM_CAPACITY
78     */
79     n |= n >>> 16;
80     /*
81     n < 0 corresponds to cap itself being 0: n starts as -1 and the shifts change nothing at all
82     (in binary, -1 is 32 ones, so ORing it with its shifts alters nothing), and 1 (2^0) is returned.
83
84     What we end up with is the original number with every bit to the right of its highest 1 bit also
85     turned into a 1; the +1 at line 90 then produces a number with a single 1 bit one position higher,
86     i.e. the smallest power of two greater than or equal to the original number.
87
88     Now reconsider the earlier point about cap already being a power of two: without the -1 at line 47, the shifts would turn it into all ones, i.e. cap * 2 - 1, and the +1 would return twice the requested capacity, which is obviously wrong
89     */
90     return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
91 }
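To make the effect concrete, here is a small self-contained sketch (the demo class and sample values are mine) that copies the algorithm above and prints a few results:

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same shifting algorithm as the JDK method above
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(0));  // 1
        System.out.println(tableSizeFor(16)); // 16: an exact power of two is returned unchanged, thanks to the -1
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(tableSizeFor(34)); // 64, matching the walkthrough in the comments above
    }
}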
3 The put method
1  /**
2   * HashMap:
3   */
4  public V put(K key, V value) {
5      return putVal(hash(key), key, value, false, true);
6  }
7
8  /**
9   * Computes the hash value of the key.
10  * Note that this calls the key's hashCode method directly, then XORs the high and low 16 bits to form the final hash.
11  * The bucket index is later computed as (table.length - 1) & hash.
12  * The array length is always a power of two (if a requested capacity is not, it is rounded up to the smallest power of two above it; see tableSizeFor), so the array length minus 1
13  * is, in binary, 11111... a number whose low bits are all 1. ANDing with it keeps exactly the low
14  * bits of the hash, which is equivalent to hash % table.length. But using % directly is less
15  * efficient, and the trick only works when the array length is a power of two; otherwise the equivalence would not hold.
16  *
17  * Notice that this indexing uses only the hash's low-order information;
18  * whatever the high bits are, they never make it into the final index, which is a pity, a waste. It would be even better if the high bits also contributed. So on line 26 below,
19  * the high 16 bits are XORed into the low 16 bits, folding their feature information into the hash so that entries spread as evenly as possible and hash conflicts are reduced.
20  * A single XOR is also simple and efficient, unlike hash functions that do heavy computation before producing a value (for instance, the Java 7
21  * implementation performed many shifts).
22  * <<< In low-level source code, the simplest implementation that meets the requirements is king >>>
23  */
24 static final int hash(Object key) {
25     int h;
26     return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
27 }
28
29 /**
30  * Called from put at line 5 above:
31  */
32 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
33                boolean evict) {
34     Node<K, V>[] tab;
35     Node<K, V> p;
36     int n, i;
37     if ((tab = table) == null || (n = tab.length) == 0)
38         // If the array has not been initialized yet, initialize it here. As you can see, HashMap defers initialization to the put method
39         n = (tab = resize()).length;
40     if ((p = tab[i = (n - 1) & hash]) == null)
41         /*
42         Use (n - 1) & hash to find the bucket the data should be inserted into.
43         If that bucket holds no data yet, create a new Node and place it there
44         */
45         tab[i] = newNode(hash, key, value, null);
46     else {
47         /*
48         Otherwise the bucket already holds data, so execute the following logic.
49
50         e records the existing node with the same key, if any; if it ends up non-null, that node's value is overwritten below
51         */
52         Node<K, V> e;
53         K k;
54         if (p.hash == hash &&
55             ((k = p.key) == key || (key != null && key.equals(k))))
56             /*
57             If the first node in the bucket has the same hash as the entry being inserted, and the same key,
58             record its position in e; the value is overwritten later (the fast path)
59             */
60             e = p;
61         // Otherwise the node to be inserted is not the first node in the bucket, although they hash to the same bucket
62         else if (p instanceof TreeNode)
63             // If the first node is a red-black tree node, run the red-black tree insertion logic (red-black tree analysis is not expanded here)
64             e = ((TreeNode<K, V>) p).putTreeVal(this, tab, hash, key, value);
65         else {
66             // The bucket holds a list of multiple nodes; traverse them all (the first node can be skipped, it was already checked at line 54)
67             for (int binCount = 0; ; ++binCount) {
68                 if ((e = p.next) == null) {
69                     /*
70                     If the next position in the list is null, the loop has reached the last node,
71                     so the new node is appended at the end
72                     */
73                     p.next = newNode(hash, key, value, null);
74                     // If the list length has reached the treeify threshold, perform the conversion
75                     if (binCount >= TREEIFY_THRESHOLD - 1)
76                         /*
77                         As analyzed earlier, the list only becomes a red-black tree if the number of buckets in the
78                         current array is also greater than or equal to MIN_TREEIFY_CAPACITY; otherwise it merely resizes
79                         */
80                         treeifyBin(tab, hash);
81                     break;
82                 }
83                 if (e.hash == hash &&
84                     ((k = e.key) == key || (key != null && key.equals(k))))
85                     // The key already exists in the list, so exit the loop (e was already assigned at line 68)
86                     break;
87                 p = e;
88             }
89         }
90         if (e != null) {
91             /*
92             A node with the same key was found, so its value is overwritten:
93
94             record the old value, to be returned at the end
95             */
96             V oldValue = e.value;
97             // The new value overwrites the old one only if onlyIfAbsent is false, or the old value itself is null
98             if (!onlyIfAbsent || oldValue == null)
99                 e.value = value;
100            // Hook method, empty implementation
101            afterNodeAccess(e);
102            return oldValue;
103        }
104    }
105    /*
106    Reaching here means a brand-new node was added, at line 45 or in the loop above.
107
108    Modification count +1; modCount implements fail-fast. If the map is modified while an iterator is in
109    use, modCount != expectedModCount and a ConcurrentModificationException is thrown, indicating the
110    HashMap was modified by another thread (my ArrayList source-code analysis article explains this process in detail; it is similar in HashMap)
111    */
112    ++modCount;
113    /*
114    Since a new node was added, check whether the array needs to grow:
115    if the new size exceeds the threshold, perform the expansion
116    */
117    if (++size > threshold)
118        resize();
119    // Hook method, empty implementation
120    afterNodeInsertion(evict);
121    return null;
122 }
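Two small illustrations of the points above. First, how the XOR at line 26 lets the high 16 bits influence the bucket index (the hashCode value is made up for the example):

int h = 0x12345678;                    // a hypothetical key.hashCode()
int spread = h ^ (h >>> 16);           // 0x12345678 ^ 0x00001234 = 0x1234444C
int withSpread    = (16 - 1) & spread; // bucket 12: the high bits contributed
int withoutSpread = (16 - 1) & h;      // bucket 8: only the low 4 bits would count

Second, the fail-fast behavior backed by modCount (a deliberately broken sketch):

HashMap<String, Integer> m = new HashMap<>();
m.put("a", 1);
m.put("b", 2);
for (String k : m.keySet())
    m.put("c", 3); // structural modification during iteration: the iterator's
                   // next step throws ConcurrentModificationException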
4 The resize method
Lines 39 and 118 of the putVal method above call resize to initialize or expand the array. The resize method is one of the more elegant parts and highlights of the HashMap source. Its implementation can be roughly divided into two parts: computing the new capacity and threshold, and the concrete data-migration process. Let's take a look at the first half of the resize source:
1  /**
2   * HashMap:
3   * The expansion operation (if the current array is null, it degenerates into array initialization)
4   */
5  final Node<K, V>[] resize() {
6      Node<K, V>[] oldTab = table;
7      // Record the capacity of the old (current) array, or 0 if it is null
8      int oldCap = (oldTab == null) ? 0 : oldTab.length;
9      /*
10     1. If a parameterized constructor was used, threshold holds the smallest power of two greater than or equal to the requested capacity, and oldThr records it
11     2. If the no-argument constructor was used, threshold is still 0
12     3. If the array is already non-empty, this is a real expansion, and threshold holds old array capacity * load factor
13     */
14     int oldThr = threshold;
15     // newCap is the expanded capacity; newThr is newCap * load factor (the fractional part, if any, is truncated)
16     int newCap, newThr = 0;
17     // Assign newCap and newThr
18     if (oldCap > 0) {
19         if (oldCap >= MAXIMUM_CAPACITY) {
20             /*
21             If the current array size has already reached the maximum, set threshold to Integer.MAX_VALUE
22             and return the current array as-is; in other words, no expansion is performed in this case
23             */
24             threshold = Integer.MAX_VALUE;
25             return oldTab;
26         } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
27                 oldCap >= DEFAULT_INITIAL_CAPACITY)
28             /*
29             If doubling the current capacity does not exceed the maximum, and the current capacity is at
30             least 16, set newCap to oldCap * 2 and newThr to oldThr * 2.
31             The purpose of the oldCap >= DEFAULT_INITIAL_CAPACITY condition is explained later
32             */
33             newThr = oldThr << 1;
34     } else if (oldThr > 0)
35         /*
36         Reaching here means oldCap is 0, i.e. the array is being initialized. As we just saw, threshold is never
37         assigned (stays 0) when the default no-argument constructor is used; but when a parameterized constructor
38         is called, threshold IS assigned during construction, and this branch corresponds to that case. oldThr
39         points at threshold (see line 14), i.e. the smallest power of two greater than or equal to the requested
40         capacity (note: the capacity passed to the constructor, not the array's actual physical capacity, which is
41         still 0), so it is definitely greater than 0. Since that oldThr is exactly the new capacity we want, it can simply be assigned to newCap
42         */
43         newCap = oldThr;
44     else {
45         /*
46         Reaching here means both oldCap and threshold are 0, i.e. the no-argument constructor was used:
47         newCap is set to 16 and newThr to 16 * 0.75 = 12, both default values
48         */
49         newCap = DEFAULT_INITIAL_CAPACITY;
50         newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
51     }
52     if (newThr == 0) {
53         /*
54         There are two ways to get here: either newCap was assigned in the branch at line 43, which leaves
55         newThr unset, or oldCap was greater than 0 but smaller than 16, so the condition at line 27 failed.
56         Either way ft = new capacity * load factor, and if neither the new capacity nor ft reaches the
57         maximum, newThr is assigned (int) ft, otherwise Integer.MAX_VALUE.
58         Note the condition at line 27 above: newThr is also left unassigned when oldCap is less than 16.
59         The root cause lies not in this resize call but in the previous, initializing one. Suppose
60         new HashMap<>(2) is called, so threshold = 2. The first put then triggers initialization and we
61         enter this method: oldCap = 0 and oldThr = 2, so the branch at line 34 is taken and newCap = 2;
62         then the if at line 52 is entered and newThr = (int) (2 * 0.75) = 1, so threshold becomes 1.
63         The rest of resize performs the concrete expansion work and the method exits; that was the first
64         call. After the node is inserted, put is called again and the new node is still inserted
65         successfully, but line 117 of the putVal method above finds that size = 2 is already greater than
66         threshold = 1, so resize runs again: oldCap = 2, oldThr = 1. At line 26, newCap = 4, but
67         oldCap = 2 is smaller than 16, so the branch is not taken and we enter the if at line 52:
68         newThr = (int) (4 * 0.75) = 3, and threshold becomes 3. That is correct, because newThr should be
69         newCap * load factor, i.e. 4 * 0.75 = 3. Without the condition at line 27, the code would jump to
70         line 33, where oldThr = 1 gives newThr = 2; newCap = 4 with newThr = 2 is an error, since
71         4 * 0.75 does not equal 2. So the oldCap >= DEFAULT_INITIAL_CAPACITY condition at line 27 reroutes
72         the update of newThr from line 33 to line 97. Of course, this is just one example of what can go
73         wrong, not the essence of it. Comparing other inputs shows the same kind of discrepancy also
74         exists above 16 buckets. The error generalizes: one path truncates (old capacity * load factor)
75         and then doubles it, while the other truncates (old capacity * 2 * load factor). The difference is
76         the timing of the truncation: when old capacity * load factor has a fractional part (an integer
77         result shows no difference, and not every fractional result differs either), truncating first and
78         then doubling magnifies the loss, producing the final discrepancy. The other clue is that line 27
79         reads oldCap >= DEFAULT_INITIAL_CAPACITY: greater than OR EQUAL, not less than. That is, the shift
80         shortcut at line 33 is only taken at 16 buckets or more; anything under 16 comes through line 97,
81         which computes the threshold exactly. With a large bucket count, a threshold that is off by one
82         hardly matters: with 1024 buckets, expanding when 819 entries are filled versus 818 makes little
83         difference. Does losing one from the threshold make hashing worse, or waste more space? Not
84         really: the data volume is already large, the load factor is normally below 1, and the error is at
85         most 1, a difference that matters less and less as the numbers grow. But expanding a 4-bucket
86         array when 2 entries are filled versus 3 makes a big difference: the collision probabilities of
87         the two cases diverge sharply, perhaps 20% versus 40%, whereas with a large bucket count it might
88         be 1.2% versus 1.3% (numbers made up for illustration). So when the data volume is large and
89         resize is called frequently (say, a very large data set is stored without specifying an initial
90         capacity), a little accuracy in the threshold is sacrificed (with a sensibly chosen load factor,
91         the error is at most 1) in exchange for faster execution (compare the left shift at line 33 with
92         the float multiplication at line 96, followed by the ternary operator and cast at line 97); when
93         the data volume is small, exactness matters more, and there is no performance cost anyway. This
94         also gives threshold a fourth possible value beyond those in the comment above line 14: old capacity * load factor - 1 (the exact amount subtracted depends on the load factor and capacity), really a special case of the third. In sum, line 27 trades a little accuracy, when buckets are many and resize is frequent, for a faster threshold computation; it is an optimization
95         */
96         float ft = (float) newCap * loadFactor;
97         newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
98                 (int) ft : Integer.MAX_VALUE);
99     }
100    // After all the operations above, update threshold to its new value
101    threshold = newThr;
102
103    // ... (the second half, the data migration, follows below)
104 }
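The new HashMap<>(2) walkthrough in the comment above can be observed directly. A debugging sketch of mine (it assumes JDK 8, where reflective access to java.util internals is still permitted; the field names table and threshold come from the source):

import java.lang.reflect.Field;
import java.util.HashMap;

public class ThresholdDemo {
    // Print the map's current physical capacity and threshold via reflection
    static void dump(HashMap<?, ?> map) throws Exception {
        Field tableField = HashMap.class.getDeclaredField("table");
        Field thresholdField = HashMap.class.getDeclaredField("threshold");
        tableField.setAccessible(true);
        thresholdField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(map);
        System.out.println("capacity=" + (table == null ? 0 : table.length)
                + ", threshold=" + thresholdField.get(map));
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> m = new HashMap<>(2);
        dump(m); // capacity=0, threshold=2 (set by tableSizeFor; the table is not allocated yet)
        m.put(1, 1);
        dump(m); // capacity=2, threshold=1 (first resize: initialization)
        m.put(2, 2);
        dump(m); // capacity=4, threshold=3 (size 2 > threshold 1 triggered a resize; 4 * 0.75 = 3)
    }
}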
With newCap, newThr, and threshold now set, what follows is the concrete data-migration process, the second half of the resize method:
1  /**
2   * HashMap:
3   */
4  final Node<K, V>[] resize() {
5      // ... (first half as above)
6
7      // Now that newCap and newThr are set, create a new Node array of capacity newCap to replace the old array
8      @SuppressWarnings({"rawtypes", "unchecked"})
9      Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap];
10     table = newTab;
11     // If the old array is null, this call was an initialization: skip the migration and just return the newly created newTab
12     if (oldTab != null) {
13         // Iterate over every bucket of the old array
14         for (int j = 0; j < oldCap; ++j) {
15             Node<K, V> e;
16             // If this bucket of the old array holds no data, skip it
17             if ((e = oldTab[j]) != null) {
18                 // Null out the old array's slot for quicker GC
19                 oldTab[j] = null;
20                 if (e.next == null)
21                     /*
22                     The first node has no successor, i.e. the bucket holds exactly one node,
23                     so it can be placed directly at its position computed against the new capacity
24                     */
25                     newTab[e.hash & (newCap - 1)] = e;
26                 else if (e instanceof TreeNode)
27                     // If the first node is a red-black tree node, run the red-black tree migration logic (not expanded here)
28                     ((TreeNode<K, V>) e).split(this, newTab, j, oldCap);
29                 else {
30                     /*
31                     Reaching here means the bucket holds multiple nodes, and the whole list must be migrated.
32                     The big premise first: the bucket's position in the old array is j, and there are exactly two
33                     candidate positions in the new array: one is j itself, i.e. the same index in the new array;
34                     the other is j + the old array's capacity. For example, if the current bucket is 15
35                     and the old capacity is 16, the second candidate in the new array is 15 + 16 = 31.
36
37                     Four pointers are defined below: a head and a tail for each of the two destination buckets just described
38                     */
39                     Node<K, V> loHead = null, loTail = null;
40                     Node<K, V> hiHead = null, hiTail = null;
41                     Node<K, V> next;
42                     do {
43                         // next points at the node after the current one
44                         next = e.next;
45                         /*
46                         Decide whether the node goes to position j or to j + the old capacity. The check is
47                         simply the bitwise AND of the node's hash with the old capacity. The capacity is
48                         necessarily a power of two, in binary 1000...000: a single 1 followed by zeros,
49                         so the AND extracts the one bit of the node's hash sitting at the position of the old
50                         capacity's 1 bit. For example, if the node's hash is 1010110 and the old capacity is
51                         16 (10000), the AND extracts the fifth bit from the right, here a 1. In other words,
52                         where the node lands in the new array depends on whether that bit of its hash is 0 or 1
53                         (this also shows that, besides (table.length - 1) & hash requiring a power-of-two
54                         capacity for bucket indexing, the migration during expansion requires it as well).
55
56                         The AND has only two kinds of outcome, zero and non-zero: if it is zero the node is
57                         inserted into the new array at j, and if it is non-zero the node is inserted at j + the old array's capacity
58                         */
59                         if ((e.hash & oldCap) == 0) {
60                             if (loTail == null)
61                                 // The current node is the first inserted into the low list, so the loHead pointer points at it
62                                 loHead = e;
63                             else
64                                 /*
65                                 Otherwise append it after the tail, via the next pointer of loTail,
66                                 so that the chain links up
67                                 */
68                                 loTail.next = e;
69                             // Update the loTail tail pointer to point at the new last node
70                             loTail = e;
71                         } else {
72                             // (e.hash & oldCap) != 0
73                             if (hiTail == null)
74                                 // The current node is the first inserted into the high list, so the hiHead pointer points at it
75                                 hiHead = e;
76                             else
77                                 /*
78                                 Otherwise append it after the tail, via the next pointer of hiTail,
79                                 so that the chain links up
80                                 */
81                                 hiTail.next = e;
82                             // Update the hiTail tail pointer to point at the new last node
83                             hiTail = e;
84                         }
85                         // Keep looping until the last node of the list has been handled
86                     } while ((e = next) != null);
87                     /*
88                     The original list has now been split into (at most) two lists; insert them at
89                     position j and at position j + the old array's capacity
90                     */
91                     if (loTail != null) {
92                         /*
93                         If the low list has nodes, set the next pointer of its last node to null. This matters because
94                         that node may not have been the last one in the old bucket, in which case its next would still
95                         point at a stale node; if it were not changed to null, a dangling pointer would remain
96                         */
97                         loTail.next = null;
98                         // Insert this list at position j
99                         newTab[j] = loHead;
100                    }
101                    if (hiTail != null) {
102                        /*
103                        If the high list has nodes, set the next pointer of its last node to null. This matters because
104                        that node may not have been the last one in the old bucket, in which case its next would still
105                        point at a stale node; if it were not changed to null, a dangling pointer would remain
106                        */
107                        hiTail.next = null;
108                        // Insert this list at position j + the old array's capacity
109                        newTab[j + oldCap] = hiHead;
110                    }
111                }
112            }
113        }
114    }
115    return newTab;
116 }
In the Java 7 HashMap source, the transfer method migrates the data during expansion. It works by rehashing each node in the list, which involves modifying next pointers. Under high concurrency, the next pointer of a node in the list may end up pointing back at a previous node, forming an infinite loop, a dead chain (the specific formation process is not expanded here):
Iteration over such a list never stops, and that is a bug. Java 8 solves the problem by building the two lists described above: a bitwise AND checks whether a node's hash bit at the old capacity's 1 position is 0 or 1, and that decides which list, and thus which new bucket, the node goes to. In addition, execution is faster because each node is no longer rehashed.
Why, during migration, can a node only be placed either at its original position or at the original position plus the old array capacity? Can't something else be added? There is actually a reason for adding exactly the old array capacity. As we all know, the array capacity must be a power of two, 100...000 in binary, and the new array is twice the size of the old one, so the single 1 in that value moves one position to the left; and the bucket position is computed as (array capacity - 1) & hash. Putting all this information together: the bucket position in the new array differs from the bucket position in the old array by exactly one binary digit, the one at the old capacity's 1 position. If that digit is 0, the node keeps the same position as in the old array; if it is 1, its new position is the old position plus the old array capacity. Here's an example:
With a capacity of 16, both nodes compute the same bucket position, 1010, i.e. the 10th bucket, so both end up in the list in bucket number 10.
Now that the array has expanded to 32, let’s see what happens:
Node 1's position is 01010 and node 2's is 11010. In other words, when the get method runs (it hashes to the bucket position the same way), node 1 is found in the 10th bucket and node 2 in the 26th bucket; the two differ by 16 buckets. During expansion, ANDing the node's hash with the old array capacity inspects exactly that distinguishing bit (the position in the red box above). If it is 0, node 1 is placed in the 10th bucket (1010) of the new array, where it was before; if it is 1, node 2 is placed in the 26th bucket of the new array (1010 + 10000), where it was before plus the old array capacity. Doing it this way also guarantees that when the get method executes, it finds each node at the correct bucket of the new array. It also shows that this strategy cannot be altered: it cannot be that a 1 means the original position and a 0 means the original position plus the old capacity; doing so would make the get method miss the node later. So this Java 8 expansion scheme both computes the correct new bucket positions and avoids rehashing every node, saving computation time.
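The same example in code form (the hash values are taken from the figures above):

int oldCap = 16;                         // 10000 in binary
int hash1 = 0b01010;                     // node 1
int hash2 = 0b11010;                     // node 2
int oldIdx1 = (oldCap - 1) & hash1;      // 1010 = 10
int oldIdx2 = (oldCap - 1) & hash2;      // 1010 = 10: the same bucket before expansion
int newIdx1 = (2 * oldCap - 1) & hash1;  // 01010 = 10: unchanged
int newIdx2 = (2 * oldCap - 1) & hash2;  // 11010 = 26 = 10 + 16: old index + old capacity
// The split test used during resize reads exactly the distinguishing bit:
boolean stays1 = (hash1 & oldCap) == 0;  // true  -> low list, stays at j
boolean stays2 = (hash2 & oldCap) == 0;  // false -> high list, moves to j + oldCap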
5 The get method
1  /**
2   * HashMap:
3   */
4  public V get(Object key) {
5      Node<K, V> e;
6      // Returns null if no node is found, or if the stored value itself is null; otherwise returns the concrete value
7      return (e = getNode(hash(key), key)) == null ? null : e.value;
8  }
9
10 final Node<K, V> getNode(int hash, Object key) {
11     Node<K, V>[] tab;
12     Node<K, V> first, e;
13     int n;
14     K k;
15     // If the array is not initialized, or the computed bucket is null (meaning the key cannot be found), return null
16     if ((tab = table) != null && (n = tab.length) > 0 &&
17             (first = tab[(n - 1) & hash]) != null) {
18         if (first.hash == hash &&
19                 ((k = first.key) == key || (key != null && key.equals(k))))
20             /*
21             If the first node in the bucket has the same hash as the lookup and the same key,
22             return it (the fast path)
23             */
24             return first;
25         /*
26         If the next node is null, i.e. there is only one node in the bucket and the previous
27         check did not match it, the code falls through and null is returned: the key cannot be found
28         */
29         if ((e = first.next) != null) {
30             if (first instanceof TreeNode)
31                 // If the first node is a red-black tree node, run the red-black tree lookup logic (not expanded here)
32                 return ((TreeNode<K, V>) first).getTreeNode(hash, key);
33             /*
34             There are multiple nodes in the list; traverse each of them to find the key (the first
35             node can be skipped, it was already checked at line 18)
36             */
37             do {
38                 if (e.hash == hash &&
39                         ((k = e.key) == key || (key != null && key.equals(k))))
40                     return e;
41             } while ((e = e.next) != null);
42         }
43     }
44     return null;
45 }
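A quick usage note on the comment at line 6 above, since a null result is ambiguous:

HashMap<String, String> m = new HashMap<>();
m.put("k", null);
System.out.println(m.get("k"));         // null: the stored value is null
System.out.println(m.get("absent"));    // null: the key is not present at all
System.out.println(m.containsKey("k")); // true: containsKey disambiguates the two cases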
6 The remove method
1  /**
2   * HashMap:
3   */
4  public V remove(Object key) {
5      Node<K, V> e;
6      // Returns null if the node to be deleted cannot be found; otherwise returns the deleted node's value
7      return (e = removeNode(hash(key), key, null, false, true)) == null ?
8              null : e.value;
9  }
10
11 final Node<K, V> removeNode(int hash, Object key, Object value,
12                             boolean matchValue, boolean movable) {
13     Node<K, V>[] tab;
14     Node<K, V> p;
15     int n, index;
16     // If the array is not initialized, or the computed bucket is null (meaning the key cannot be found), return null
17     if ((tab = table) != null && (n = tab.length) > 0 &&
18             (p = tab[index = (n - 1) & hash]) != null) {
19         Node<K, V> node = null, e;
20         K k;
21         V v;
22         if (p.hash == hash &&
23                 ((k = p.key) == key || (key != null && key.equals(k))))
24             // If the first node in the bucket has the same hash as the lookup and the same key, record its position
25             node = p;
26         else if ((e = p.next) != null) {
27             // e is the node after the first one in the bucket; if it is null here, nothing can be found and null is returned
28             if (p instanceof TreeNode)
29                 // If the first node is a red-black tree node, run the red-black tree lookup logic (not expanded here)
30                 node = ((TreeNode<K, V>) p).getTreeNode(hash, key);
31             else {
32                 /*
33                 There are multiple nodes in the list; traverse them, and if the key is found, record that
34                 node's position (the first node was already checked at line 22)
35                 */
36                 do {
37                     if (e.hash == hash &&
38                             ((k = e.key) == key ||
39                              (key != null && key.equals(k)))) {
40                         node = e;
41                         break;
42                     }
43                     // p records the node just before the current one
44                     p = e;
45                 } while ((e = e.next) != null);
46             }
47         }
48         /*
49         Delete only if a node was found and, when matchValue is true, only if the values are also equal
50         (with matchValue true, the entry can be deleted only when the values match).
51         If node is null, fall through and return null (nothing is deleted)
52         */
53         if (node != null && (!matchValue || (v = node.value) == value ||
54                 (value != null && value.equals(v)))) {
55             if (node instanceof TreeNode)
56                 // If the node is a red-black tree node, run the red-black tree deletion logic (not expanded here)
57                 ((TreeNode<K, V>) node).removeTreeNode(this, tab, movable);
58             else if (node == p)
59                 /*
60                 If the node to be deleted is the first node in the bucket, point the bucket slot
61                 directly at its next node (null if next is null)
62                 */
63                 tab[index] = node.next;
64             else
65                 // Otherwise point the previous node's next at the next node, unlinking this node from the list to await GC
66                 p.next = node.next;
67             // Modification count +1 (fail-fast mechanism)
68             ++modCount;
69             // A node was deleted, so size decreases by 1
70             --size;
71             // Hook method, empty implementation
72             afterNodeRemoval(node);
73             return node;
74         }
75     }
76     return null;
77 }
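The matchValue flag above is what backs the two-argument remove(key, value) overload added in Java 8, which deletes an entry only when both the key and the value match:

HashMap<String, Integer> m = new HashMap<>();
m.put("a", 1);
System.out.println(m.remove("a", 2)); // false: values differ, nothing is removed (matchValue = true)
System.out.println(m.remove("a", 1)); // true: key and value both match, so the entry is removed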
7 The clear method
1  /**
2   * HashMap:
3   */
4  public void clear() {
5      Node<K, V>[] tab;
6      // Modification count +1 (fail-fast mechanism)
7      modCount++;
8      if ((tab = table) != null && size > 0) {
9          // size is the number of key-value mappings in the map; reset it to 0
10         size = 0;
11         // Set every bucket of the table to null
12         for (int i = 0; i < tab.length; ++i)
13             tab[i] = null;
14     }
15 }