
The load factor measures how full a hash table is allowed to get before its capacity is automatically increased. A larger load factor means the table is allowed to fill more densely before resizing, and vice versa.

For hash tables that resolve collisions with linked lists (separate chaining), the average time to find an element is O(1 + α), where α is the load factor. A larger load factor therefore means better space utilization, but at the cost of search efficiency. If the load factor is too small, the data in the hash table will be too sparse, resulting in a serious waste of space.

If you look at the source code, you’ll see that HashMap settles on 0.75 as the default, a compromise between time and space.

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

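To see what the load factor actually controls, here is a minimal sketch (the class and variable names are mine, not from the HashMap source): a map resizes once its size exceeds capacity × loadFactor.

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // A default HashMap has capacity 16 and load factor 0.75,
        // so it resizes once it holds more than 16 * 0.75 = 12 entries.
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println("Resize threshold: " + threshold); // prints 12

        // Both values can also be passed to the public constructor:
        Map<String, Integer> map = new HashMap<>(capacity, loadFactor);
        map.put("hello", 1);
    }
}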

But why does it have to be 0.75 rather than, say, 0.8 or 0.6? A very important concept comes into play here: the Poisson distribution.

If you have studied probability theory, this famous distribution should feel at once familiar and strange. The point of this article is not to teach you probability theory, but a brief introduction is in order.

The Poisson distribution is one of the most important discrete distributions; it typically arises when X counts the number of events occurring in a fixed interval of time or space. If X follows a Poisson distribution with parameter λ, then P(X = k) = λ^k · e^(−λ) / k!.

Take a simple example: suppose you are the boss of a newly opened restaurant. How do you decide how much food to prepare for the day?

Prepare too much, and the dishes you cannot sell go to waste; prepare too little, and you cannot serve your customers. But you have plenty of peers and friends in the trade who can share their experience with you.

For example, divide the day into several periods: roughly how many guests arrive in the morning, afternoon, and evening, and how many dishes each table orders. From all of this, you get a rough idea of how many ingredients to prepare for a given day.

Let’s look at the HashMap source code comment:

Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)).

0: 0.60653066

1: 0.30326533

2: 0.07581633

3: 0.01263606

4: 0.00157952

5: 0.00015795

6: 0.00001316

7: 0.00000094

8: 0.00000006

more: less than 1 in ten million
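As a quick check, the table above can be reproduced from the quoted formula exp(-0.5) * pow(0.5, k) / factorial(k). A minimal sketch (the class name is mine):

public class PoissonBinSizes {
    public static void main(String[] args) {
        double lambda = 0.5; // average nodes per bin at the default load factor
        double p = Math.exp(-lambda); // P(X = 0) = e^(-0.5)
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            // Incremental update: P(X = k + 1) = P(X = k) * lambda / (k + 1),
            // which avoids computing factorials directly.
            p = p * lambda / (k + 1);
        }
    }
}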

In other words: ideally, under random hash codes, the number of nodes in a given bucket follows a Poisson distribution with parameter λ ≈ 0.5.

Comparing the bucket sizes with their probabilities, you can see that with 0.75 as the load factor, the probability of a bucket reaching 8 elements is already vanishingly small (about 6 in 100 million). It is therefore almost impossible, under a reasonable hash function, for the linked list at any collision position to grow beyond 8 nodes, which is why HashMap converts a bin to a red-black tree only once its list reaches 8 nodes.
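For reference, the corresponding constant in the HashMap source (JDK 8 and later); note that treeification also requires the table itself to have reached a minimum capacity (MIN_TREEIFY_CAPACITY, 64), otherwise the table is simply resized instead:

/**
 * The bin count threshold for using a tree rather than a list for a bin.
 */
static final int TREEIFY_THRESHOLD = 8;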

Ending

I have compiled the knowledge points that often come up in interviews into an e-book, along with resume templates and written test questions from past years.

You are welcome to follow my public account “Criag Mowgli”.

I’m Mowgli, stay tuned!