// Expect to store 10,000 entries; pass 10000 as the initial capacity, hoping to avoid resizing.
HashMap<String, String> map = new HashMap<>(10000);
// for (int i = 0; i < 10000; i++) { ... put 10,000 entries ... }
Expansion of Java collections
HashMap is one of the most commonly used collections. For Android developers, Google officially recommends SparseArray and ArrayMap because they use less memory, but HashMap is still the one most people reach for.
We store data in a HashMap as key-value pairs, and its internal hash table gives a best-case access cost of O(1). Because hash collisions are possible, it also uses linked lists and red-black trees internally, which keeps the worst case down to O(log n).
Overall, as an industrial-strength hash table, HashMap is efficient.
The collection classes a programming language provides are still built on the most basic data structures, such as arrays and linked lists. The difference from using an array directly is that when a collection runs out of capacity, it triggers a dynamic expansion to make sure there is enough room for new data.
Dynamic expansion involves copying data, which makes it a heavy operation. If we can estimate in advance how much data a collection will hold, we can specify its initial capacity through the constructor and make sure later operations never trigger an expansion.
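To make that concrete with a collection other than HashMap, here is a minimal sketch using ArrayList; the pattern is the same, only the growth strategy differs:
import java.util.ArrayList;
import java.util.List;

public class PresizeDemo {
    public static void main(String[] args) {
        // Default capacity: the backing array is grown and copied
        // several times while the list fills up.
        List<Integer> grown = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            grown.add(i);
        }

        // Pre-sized: the backing array is allocated once up front,
        // so adding 10,000 elements never triggers a copy.
        List<Integer> presized = new ArrayList<>(10_000);
        for (int i = 0; i < 10_000; i++) {
            presized.add(i);
        }
    }
}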
That brings us to the question at the beginning of this article: if we create a HashMap with an initial capacity of 10,000 and then immediately store 10,000 entries in it, is that capacity appropriate, that is, will it avoid triggering an expansion?
Before answering that, let's take a look at what happens when a HashMap is constructed with a specified initial capacity.
PS: All code in this article uses the JDK 1.8 HashMap source as the example.
Initialization of the HashMap
HashMap has a constructor, HashMap(int initialCapacity), that lets you specify the initial capacity. It eventually delegates to another HashMap constructor, in which the loadFactor parameter takes its default value of 0.75f.
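For reference, the single-parameter constructor in the JDK 1.8 source simply delegates, with DEFAULT_LOAD_FACTOR being 0.75f:
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
And the two-parameter constructor it delegates to: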
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
The member variable threshold stores the threshold that triggers expansion: once the number of entries stored in the HashMap reaches this value, an expansion is triggered.
From the constructor logic we can see that the initialCapacity we pass in is not used directly; instead it is processed by tableSizeFor() and the result is assigned to threshold.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
In tableSizeFor(), a series of bit operations rounds the value up to a power of two, specifically the smallest power of two that is greater than or equal to cap. Keeping the table size a power of two makes it quick to compute where each entry lands in the new table during an expansion.
So when we pass in 10,000 from the outside, tableSizeFor() turns it into 2^14 = 16384. With the default load factor of 0.75f, that means 12288 entries (16384 * 0.75f) can be stored before an expansion is triggered.
In this scenario that is more than enough for 10,000 entries, and the expansion we were worried about is never triggered.
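To see the adjustment in action, here is a small sketch that copies the tableSizeFor() logic into a standalone class (the real method is package-private in HashMap, so it cannot be called directly) and prints what a few inputs become:
public class TableSizeForDemo {

    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit tricks as HashMap.tableSizeFor(): returns the smallest
    // power of two that is greater than or equal to cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1000));   // 1024
        System.out.println(tableSizeFor(1333));   // 2048
        System.out.println(tableSizeFor(10000));  // 16384
        System.out.println(tableSizeFor(13333));  // 16384
    }
}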
Table initialization of the HashMap
When we instead specify an initial capacity of 1,000, however, it is a different story, and it deserves a closer look.
Again, in the HashMap constructor, threshold is the expansion threshold and is adjusted by tableSizeFor(). So if 1000 is passed to the constructor, threshold is indeed adjusted to 1024. But HashMap does not use that value directly.
If you think about it, the constructor determines threshold without loadFactor being involved in the calculation at all. So how does HashMap bring loadFactor back in later?
In HashMap, all data is stored in a member array called table. Its basic role is the same in JDK 1.7 and 1.8, even though the element type of the table differs. The relationship among table, threshold, and loadFactor is:
threshold == table.length * loadFactor
So when is the table initialized? That brings us to the topic we have been circling around: the expansion of the HashMap.
In a HashMap, the logic for dynamic expansion is in the resize() method. This method not only takes care of the expansion of the table, it also takes care of the initialization of the table.
When we first call the put() method of HashMap to store data, if the table is null, we call resize() to initialize the table. The logic is in the putVal() method.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length; // call resize()
    // ...
}
In the resize() method, the final threshold is adjusted and the table is initialized.
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1;
    }
    else if (oldThr > 0)
        newCap = oldThr; // ①
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // ②
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr; // ③
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab; // ④
    // ...
}
Notice the comment tags in the code.
Because resize() also contains the dynamic-expansion logic, I have annotated only the parts involved in initializing the table. The xxxCap and xxxThr variables correspond to the table capacity and the expansion threshold, and each comes as an old/new pair.
When we have specified an initial capacity and the table has not been initialized yet, oldThr is non-zero, so we take the branch at ①: newCap is assigned oldThr, which means the newly created table gets the (adjusted) capacity we specified when constructing the HashMap.
We then enter the logic at ②, where the new threshold (newThr) is recalculated from loadFactor, with a few checks to keep newThr within a legal range.
At ③, the threshold recalculated with loadFactor is stored back into threshold. Finally a new array of length newCap is created and assigned to table, which completes the table's initialization (④).
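Tracing resize() by hand for new HashMap<>(1000), at the moment of the very first put() while table is still null, makes these steps concrete:
// On entry to resize(): oldTab == null, oldCap == 0, oldThr == threshold == 1024
// ① oldCap == 0 but oldThr > 0  ->  newCap = oldThr = 1024
// ② newThr is still 0           ->  newThr = (int) (1024 * 0.75f) = 768
// ③ threshold = 768
// ④ table = new Node[1024]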
In other words, the value adjusted by tableSizeFor() ultimately determines the size of the table array, and the threshold is then recalculated from loadFactor.
Now we have the answer to the earlier question: although we specified an initial capacity of 1000 and tableSizeFor() adjusted it to 1024, that only means the table array ends up with a length of 1024; the expansion threshold is set to 768 (1024 * 0.75) in resize().
That is not enough room for 1000 entries, so a dynamic expansion is triggered before the 1000th entry is stored.
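To verify this empirically, here is a minimal sketch that reads HashMap's internal table and threshold fields via reflection. It assumes reflective access is permitted; on JDK 9 and later you may need to run with --add-opens java.base/java.util=ALL-UNNAMED.
import java.lang.reflect.Field;
import java.util.HashMap;

public class ThresholdInspector {

    // Reads the package-private table and threshold fields of a HashMap.
    static void inspect(String label, HashMap<?, ?> map) throws Exception {
        Field tableField = HashMap.class.getDeclaredField("table");
        Field thresholdField = HashMap.class.getDeclaredField("threshold");
        tableField.setAccessible(true);
        thresholdField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(map);
        int threshold = thresholdField.getInt(map);
        System.out.println(label + ": table.length = "
                + (table == null ? "null (not created yet)" : String.valueOf(table.length))
                + ", threshold = " + threshold);
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>(1000);
        inspect("after construction", map);   // table is null, threshold = 1024

        map.put(0, 0);
        inspect("after first put", map);      // table.length = 1024, threshold = 768

        for (int i = 1; i < 1000; i++) {
            map.put(i, i);
        }
        inspect("after 1000 entries", map);   // table.length = 2048, threshold = 1536
    }
}
On JDK 1.8 this prints a null table after construction, 1024/768 after the first put, and 2048/1536 once the 769th entry forces the expansion.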
In practice, the initial capacity of a HashMap depends on the business scenario rather than being a fixed value, so we need a rule for choosing it: divide the expected number of entries by the load factor, which with the default means dividing by 0.75.
For example, to store 1,000 entries with a HashMap you would compute 1000 / 0.75 and pass in roughly 1333. tableSizeFor() then adjusts that to 2048, which is enough to hold the data without triggering an expansion.
Likewise, to store 10,000 entries you would compute 10000 / 0.75 and pass in roughly 13333, which is adjusted to 16384, the same table size we got by passing 10000 directly.
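The rule of thumb can be wrapped in a small helper; the method below (capacityFor is a made-up name for this sketch) rounds up with Math.ceil to stay on the safe side, and Guava's Maps.newHashMapWithExpectedSize() applies the same idea:
import java.util.HashMap;
import java.util.Map;

public class HashMapCapacity {

    // Converts an expected entry count into an initial capacity that will not
    // trigger a resize under the default load factor of 0.75.
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.75);
    }

    public static void main(String[] args) {
        // capacityFor(1000) = 1334 -> tableSizeFor() gives 2048, threshold 1536
        Map<String, String> small = new HashMap<>(capacityFor(1000));

        // capacityFor(10000) = 13334 -> tableSizeFor() gives 16384, threshold 12288
        Map<String, String> big = new HashMap<>(capacityFor(10_000));
    }
}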
Summary
At this point you know how to calculate a HashMap's initial capacity scientifically. Essentially, the value you pass in may not be able to hold that many entries directly, because a dynamic adjustment happens in between. So we need to enlarge the expected size, and the scientific way to enlarge it is according to the load factor.
Finally, let’s summarize:
- The initialCapacity passed to the HashMap constructor is, after processing, stored in threshold, but it actually represents the capacity of the table.
- The initialCapacity passed to the constructor is adjusted by tableSizeFor() to a power of two, which makes it fast to compute an entry's position in the new table during expansion.
- If an initial capacity is specified, the expansion threshold is recalculated as table.length * loadFactor when the table is initialized.
- Whether the HashMap is expanded is determined by the threshold, which in turn is determined by the initial capacity and loadFactor.
- If we know in advance roughly how much data a HashMap will hold, we can preset its capacity to improve efficiency, but we must take the load factor into account so that no unexpected dynamic expansion is triggered.
HashMap is one of the most commonly used collections in Java, and there are plenty of good articles about it, but few analyze it from the angle of initial capacity, which is a practical optimization point when using collections. In fact, many people are not sure whether the load factor should be taken into account when setting a HashMap's initial capacity, and that is what prompted this article.
If this article helped you, comments, shares, and bookmarks are the biggest support. Thank you!
Reply "growth" in the background of my public account to get the learning materials I have prepared.