Preface
In real projects it is encouraging to see developers specify an initial size when creating a HashMap. But on closer inspection there is often something wrong: specifying the size actually makes performance worse.
Perhaps you, too, learned this from the Alibaba Java Development Manual and began specifying the initial size of your Maps in practice, feeling that your code had become a bit more professional.
Yes, simply being aware of the initial capacity puts you a step ahead of the average developer, but if the value is chosen poorly, the program will perform worse than it would with the default.
This article analyzes the issue from beginning to end; pay particular attention to the method of analysis and the underlying principles.
The Alibaba development specification
Let's take a look at how the initial size of a Map is described in the Alibaba Java development specification.
Alibaba Java Development Manual, Chapter 1 (Programming Specification), Section 6 (Collection Processing), Clause 17 reads as follows:
[Recommendation] Specify the initial capacity of a collection when initializing it. Note: a HashMap is initialized with HashMap(int initialCapacity). If the collection size cannot be determined, use the default value (16).
initialCapacity = (number of elements to be stored / load factor) + 1. Note that the default load factor is 0.75. If the initial size cannot be determined, set it to 16 (the default).
Counterexample: a HashMap needs to hold 1024 elements. Because no initial capacity is set, the map expands seven times as elements are added, and each resize rebuilds the hash table. When a collection holds tens of millions of elements, repeated expansion seriously affects performance.
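To see where the manual's "seven expansions" figure comes from, here is a quick back-of-the-envelope check (a minimal sketch, assuming the default capacity of 16 and the default load factor of 0.75):
public class ResizeCount {
    public static void main(String[] args) {
        // Starting from the default capacity 16, count how many doublings
        // are needed before 1024 entries fit under the threshold (capacity * 0.75).
        int capacity = 16;
        int resizes = 0;
        while (1024 > (int) (capacity * 0.75f)) {
            capacity <<= 1;
            resizes++;
        }
        // Prints: 7 resizes, final capacity 2048
        System.out.println(resizes + " resizes, final capacity " + capacity);
    }
}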
From the above specification, we can learn a few things:
- First, the default size of a HashMap is 16;
- Second, expansion is related to the load factor and the number of stored elements;
- Third, setting an initial value reduces the performance impact of rehashing caused by expansion.
Having read the specification above, you will probably start specifying initial sizes for your collections. But a moment of carelessness here can introduce plenty of problems, so let's look at an example.
Did you specify the correct initial value?
Let's go straight to some sample code. Ask yourself: is there a problem with it?
Map<String, String> map = new HashMap<>(4);
map.put("username", "Tom");
map.put("address", "Bei Jing");
map.put("age", "28");
map.put("phone", "15800000000");
System.out.println(map);
Code like this probably looks familiar, and writing it even feels impressive: the HashMap stores four values and is initialized with a size of four. The space is fully utilized, and it satisfies the requirements of the Alibaba development manual, right?!
Is that really true? Are you sure? You may not spot the problem just by reading the code, so let's add some diagnostic output.
How to verify capacity expansion
Many readers may want to verify when a HashMap expands but have no idea how to do so. Here is a simple way to obtain and print the capacity using reflection.
Let's modify the example above so that every time data is added to the HashMap, the corresponding capacity and size are printed.
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class MapTest {

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>(4);
        map.put("username", "Tom");
        print(map);
        map.put("address", "Bei Jing");
        print(map);
        map.put("age", "28");
        print(map);
        map.put("phone", "15800000000");
        print(map);
    }

    public static void print(Map<String, String> map) {
        try {
            Class<?> mapType = map.getClass();
            // capacity() is a package-private helper in HashMap,
            // so reflection is needed to make it accessible
            Method capacity = mapType.getDeclaredMethod("capacity");
            capacity.setAccessible(true);
            System.out.println("capacity : " + capacity.invoke(map) + " size : " + map.size());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The print method uses reflection to invoke HashMap's package-private capacity() method and prints the result together with the map's size. In the main method, the capacity is printed after each new entry is added.
The print result is as follows:
capacity : 4 size : 1
capacity : 4 size : 2
capacity : 4 size : 3
capacity : 8 size : 4
What do you notice? After the fourth entry is put in, the HashMap expands.
What was the purpose of specifying the initial capacity in the first place? Wasn't it to avoid the performance penalty of expansion? Now it is the very thing causing an expansion.
If you instead drop the specified initial value and run the program with new HashMap<>(), it prints the following:
capacity : 16 size : 1
capacity : 16 size : 2
capacity : 16 size : 3
capacity : 16 size : 4
With the default constructor there is no expansion at all, so in theory performance is higher. Interesting, isn't it? Have you fallen into this usage trap as well?
Let’s analyze the principle
The main cause of the problem above is that we ignored the second point in the summary earlier: the expansion mechanism.
A HashMap expands automatically once its expansion condition is met, namely when the number of elements (size) exceeds the threshold. In HashMap, threshold = loadFactor * capacity, and the default load factor is 0.75.
In our example the capacity is 4, so the threshold is 4 * 0.75 = 3. In other words, once the actual size exceeds 3, an expansion is triggered, which doubles the capacity of the HashMap. This is consistent with what we printed.
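If you want to see this threshold directly, you can read HashMap's internal threshold field the same way we invoked capacity() above (a sketch that relies on the JDK 8 field name, an implementation detail; recent JDKs may additionally require --add-opens java.base/java.util=ALL-UNNAMED):
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class ThresholdTest {
    public static void main(String[] args) throws Exception {
        Map<String, String> map = new HashMap<>(4);
        map.put("username", "Tom"); // the first put allocates the bucket array
        Field threshold = HashMap.class.getDeclaredField("threshold");
        threshold.setAccessible(true);
        // Prints: threshold : 3  (4 * 0.75)
        System.out.println("threshold : " + threshold.get(map));
    }
}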
The threshold mechanism works the same way in JDK 7 and JDK 8. This article will not walk through the implementation source code; understanding the basic principle and verifying the effect is enough.
How much initial capacity is appropriate for a HashMap?
After the analysis above, the hidden problem should be clear. So what is an appropriate initial size for a HashMap? Should we just write some arbitrarily large number?
To answer that, we need to understand how a HashMap handles the initial capacity passed to it.
When we initialize with HashMap(int initialCapacity), the HashMap does not use the passed-in initialCapacity directly as its capacity.
Instead, the JDK computes a more reasonable value from it: the smallest power of two that is greater than or equal to the value the user passed in. The implementation source is as follows:
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
That is, when a HashMap is created with 7 passed in, the initial capacity becomes 8; when 18 is passed in, the initial capacity becomes 32.
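You can verify this yourself by copying tableSizeFor into a scratch class (a standalone copy for experimentation; MAXIMUM_CAPACITY mirrors HashMap's 1 << 30):
public class TableSizeForTest {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(7));  // 8
        System.out.println(tableSizeFor(18)); // 32
        System.out.println(tableSizeFor(10)); // 16
    }
}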
At this point we reach a first conclusion: when setting the initial capacity, use a power of two; even if you don't, the JDK will round it up to the next power of two for you.
That rounding looks reasonable, but as our opening example showed, the initial capacity should not simply be set to the number of elements to be stored, because expansion has to be taken into account as well.
According to the expansion formula, if the initial capacity is 8, the threshold is 8 * 0.75 = 6. As long as no more than 6 entries are stored, no expansion is triggered.
Is there a formula that lets us work backwards from the expected number of entries? Yes, the required value can be computed as follows:
return (int) ((float) expectedSize / 0.75f + 1.0f);
For example, if you plan to put 7 elements into the HashMap: 7 / 0.75 + 1 = 10 (truncated to an int), and after the JDK's tableSizeFor processing, 10 becomes 16.
In this case 16 is a reasonable value that greatly reduces the probability of expansion.
Therefore, when the number of elements a HashMap will hold is known in advance, setting the initial size to expectedSize / 0.75f + 1.0f is a relatively good choice in terms of performance, though it sacrifices some memory.
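Wrapped up as a small utility, the formula might look like this (a hypothetical helper, not part of the JDK; Guava's Maps.newHashMapWithExpectedSize is built on the same idea):
import java.util.HashMap;
import java.util.Map;

public final class MapUtils {
    private MapUtils() {
    }

    // Hypothetical helper: size the backing HashMap so that expectedSize
    // entries can be added without triggering a resize.
    public static <K, V> Map<K, V> newHashMapWithExpectedSize(int expectedSize) {
        // For 7 expected entries: (int) (7 / 0.75 + 1) = 10, which HashMap rounds up to 16.
        return new HashMap<>((int) ((float) expectedSize / 0.75f + 1.0f));
    }
}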
Other relevant knowledge
With that in mind, here are a few more HashMap facts:
- A HashMap does not allocate its bucket array immediately after new (verified by the sketch after this list);
- The bucket array size of a HashMap is always a power of two;
- A HashMap expands when the number of entries put into it exceeds capacity * loadFactor (default: 16 * 0.75);
- In JDK 8, when a bucket's linked list reaches TREEIFY_THRESHOLD (default 8) in length, it is converted into a tree structure to improve performance;
- JDK 8 is cleverly designed to reduce the performance cost of rehashing during resize.
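The first point is easy to check by reflecting on HashMap's internal table field (a sketch that, like the earlier ones, depends on the JDK 8 field name):
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class LazyTableTest {
    public static void main(String[] args) throws Exception {
        Map<String, String> map = new HashMap<>();
        Field table = HashMap.class.getDeclaredField("table");
        table.setAccessible(true);
        System.out.println(table.get(map));         // null: no bucket array yet
        map.put("username", "Tom");
        System.out.println(table.get(map) != null); // true: allocated on first put
    }
}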
Summary
This article has examined some pitfalls of using HashMap; the biggest takeaway is probably to avoid misusing features you only partly understand. Along the way, we also covered some analysis techniques and implementation principles.
Does the exact initial value you choose for a HashMap really matter that much? The impact may not be huge, but isn't performance optimization, and the accumulation of personal skill, built exactly out of these small improvements and gains?