Preface
I previously wrote an article about HashMap, and the most common question in the comments was why HashMap's default load factor is 0.75. I did some research of my own and summarized it in this article.
This article is based on JDK 1.8.
OK. Let’s start with the analysis.
I. The role of the load factor
When I first studied HashMap, I mostly thought about how the source code was implemented; looking at it again now, I realize that the default parameter values are where much of the essence of HashMap lies.
The load factor is tied to the resize (capacity expansion) mechanism: when the number of entries in the container reaches the threshold we set, a resize is triggered. An example, in case beginners don't follow:
For example, if the current capacity is 16 and the load factor is 0.75, then 16 × 0.75 = 12, so once 12 entries are stored, a resize is triggered.
In short, the load factor defines the threshold for the resize mechanism: once the threshold is exceeded, the table is expanded. The HashMap source code sets the load factor to 0.75 by default, and we can also specify our own value in the constructor; the sketch below shows both. With that in place, let's formally analyze why 0.75 is the default.
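A minimal sketch of the threshold arithmetic and the explicit constructor (the class name LoadFactorDemo is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // new HashMap<>() uses the defaults: capacity 16, load factor 0.75
        Map<String, Integer> withDefaults = new HashMap<>();

        // Both values can be passed explicitly via the
        // (initialCapacity, loadFactor) constructor.
        Map<String, Integer> custom = new HashMap<>(32, 0.6f);

        // threshold = capacity * loadFactor; with the defaults,
        // 16 * 0.75 = 12, so inserting the 13th entry triggers a resize.
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println("resize threshold = " + threshold); // prints 12
    }
}
```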
II. Explanation of the reasons (key points)
When thinking about HashMap, the first thing to remember is that it is just a data structure, and the purpose of any data structure is to balance time and space. The load factor exists precisely for that tradeoff. To see why, let's consider two extreme cases.
1. A load factor of 1.0
Let's start with the underlying data structure of HashMap.
Our data is stored in an array of buckets. When a hash collision occurs, the colliding entries are chained into a linked list at that bucket, and when the list reaches a certain length (8 in JDK 1.8), it is converted into a red-black tree. The relevant constants from the JDK 1.8 source are excerpted below.
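These constants appear in the JDK 1.8 java.util.HashMap source (excerpted, with the original comments abridged):

```java
// Excerpt from JDK 1.8 java.util.HashMap
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// A bucket's linked list is converted to a red-black tree once it
// reaches 8 nodes, and only if the table has at least 64 slots;
// otherwise the table is resized instead.
static final int TREEIFY_THRESHOLD = 8;
static final int MIN_TREEIFY_CAPACITY = 64;
```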
When the load factor is 1.0, the table is not resized until every slot of the array is filled. This is a big problem, because hash collisions are unavoidable: a load factor of 1.0 means many collisions, so the underlying linked lists and red-black trees grow deep, which is extremely detrimental to query efficiency. In this case, time is sacrificed to maximize space utilization.
In a word: when the load factor is too large, space utilization goes up, but time efficiency goes down.
2. A load factor of 0.5
When the load factor is 0.5, resizing kicks in as soon as the array is half full. Because fewer elements are stored per unit of capacity, hash collisions are rarer, the underlying linked lists stay short, the red-black trees stay shallow, and query efficiency improves.
But space utilization takes a big hit: one megabyte of data now requires roughly two megabytes of table space.
In a word: when the load factor is too small, time efficiency goes up, but space utilization goes down.
3. A load factor of 0.75
After the previous analysis, the answer to why 0.75 basically falls out: it is a tradeoff between time and space. Of course, I didn't come up with this answer on my own; it is right there in the source code. Let's take a look:
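The relevant passage from the java.util.HashMap Javadoc in JDK 1.8 reads:

```
As a general rule, the default load factor (.75) offers a good tradeoff
between time and space costs. Higher values decrease the space overhead
but increase the lookup cost (reflected in most of the operations of the
HashMap class, including get and put).
```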
When the load factor is 0.75, space utilization remains high while a large share of hash collisions is avoided, which keeps the underlying linked lists short and red-black trees shallow, and thus keeps query efficiency high.