HashMap is a frequent use of back-end interviews, such as what is the default initial capacity? What is the loading factor? Is the thread unsafe? How do I repeat the put operation? Get? What are the differences between JDK 1.7 and 1.8 implementations? And so on a series of questions, maybe you can answer these questions fluently, indicating that you have a relatively good understanding of HashMap. But recently, my team members did a technical sharing, among which I quite gained some points, I would like to share with you

We have technology sharing every Friday, and we take turns to share. In fact, this mechanism is quite good. We sit together and discuss a knowledge point in depth, so as to have a collision of thinking and win

Throw out two questions and see if you can answer them?

  1. How do I find the smallest integer to the power of 2 that is greater than the initial capacity value set?
  2. What special operations are performed to hash keys in a HashMap? Why do you do that?

Think about yourself first, and then read better!

The following analysis is all about JDK 1.8

Analysis of the

Question 1: How do I find the smallest power of 2 integer that is larger than the initial capacity value set?

When we use a HashMap, if we use the default constructor, we build a HashMap with an initial capacity of 16 and a load factor of 0.75. This has a disadvantage, is when the amount of data is relatively large, there will be frequent expansion operations, expansion will occur data shift, in order to avoid expansion, improve performance, we used to estimate the capacity, and then through the capacity constructor to create, look at the source code

public HashMap(int initialCapacity, float loadFactor) {...// If the initial capacity is greater than the maximum capacity, the default maximum capacity is 2^30
    if(initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; .this.loadFactor = loadFactor;
    The // tableSizeFor method calculates the smallest integer to the power of 2 that is greater than the specified initial capacity value
    this.threshold = tableSizeFor(initialCapacity);
}
Copy the code

The maximum capacity is 2^30, that is, the length range of the HashMap array part is [0,2^30], and then calculate the smallest integer power of 2 larger than the initial capacity. TableSizeFor method is the key, we look at the source code

// Returns a power of two size for the given target capacity
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code

This method is very clever, because the HashMap ensures that the capacity is an integer power of 2. The effect of this method is that if you enter cap itself as an integer power of 2, then return cap itself. If the cap is not an integer power of 2, Returns the smallest integer power of 2 greater than cap

Why is the capacity an integer power of two?

For example, TAB [I = (n-1) &hash] [I = (n-1) &hash]

  1. N is an integer power of 2, so that n-1 is followed by all 1 bits, which ensures that the corresponding digit after (n-1) & hash can be either 0 or 1, depending on the value of the hash. This ensures that the hash is uniform and efficient
  2. If n is not an integer power of 2, there will be more hash collisions

This method performs the cap-1 operation first, which has the advantage of avoiding the situation where the input cap is an integer power of 2 and the final calculation is twice cap, because cap is set to satisfy the requirements of the HashMap. There is no need to initialize a HashMap of 2x capacity

We have already shown that the maximum size of a HashMap is 2^30, so the maximum size of a HashMap is a 30-bit integer. Let’s use a 30-bit number to demonstrate the shift or operation in the algorithm. Suppose n = 001xxx XXXXXXXX XXXXXXXX XXXXXXXX (x stands for 0 or 1, we don’t care)

Moves to the right the first time n | = n > > > 1, the operation is to use n itself and n moves to the right or one of several for operation, so that we can achieve the goal of the highest n 1 is adjacent to the right one is set to 1

N 001 XXXXXXXX XXX XXXXXXXX XXXXXXXX n > > > 1, 0001 xx XXXXXXXX XXXXXXXX XXXXXXXX | or operating 0011 xx XXXXXXXX XXXXXXXX XXXXXXXX The result is that the first digit immediately to the right of the highest digit of n is 1, so that there are two consecutive digits of 1 in the highest digitCopy the code

Moves to the right for the second time n | = n > > > 2

N 0011 xx XXXXXXXX XXXXXXXX XXXXXXXX n > > > 2 000011 XXXXXXXX XXXXXXXX XXXXXXXX | or operating 001111 XXXXXXXX XXXXXXXX XXXXXXXX The result is that there are four consecutive ones in the higher order of nCopy the code

Moves to the right for the third time n | = n > > > 4

N 001111 XXXXXXXX XXXXXXXX XXXXXXXX n > > > 4, 000000, 1111 XXXX XXXXXXXX XXXXXXXX | or operating 001111 1111 XXXX XXXXXXXX XXXXXXXX The result is that there are eight consecutive ones in the higher order of nCopy the code

Moves to the right for the fourth time n | = n > > > 8

N 001111 1111 XXXX XXXXXXXX XXXXXXXX n > > > 8, 000000, 00001111, 1111 XXXX XXXXXXXX | or operation, 001111, 11111111, 1111 XXXX XXXXXXXX The result is that there are 16 consecutive ones in the higher order of nCopy the code

The fifth time moves to the right n | n > > > 16

N 001111, 11111111, 1111 XXXX XXXXXXXX n > > > 16, 000000, 00000000, 00001111 11111111 001111 11111111 11111111 11111111 | or operating The result is that the higher order 1 of n is always 1 after 1Copy the code

And then we compare n to the Max, if it’s greater than or equal to 2^30, we take the Max, if it’s less than 2^30, we do the +1 operation on n, because all the digits are 1, so +1 is the same thing as finding the smallest power of 2 greater than that number

011111 11111111 11111111 11111111, this value is the smallest integer power of 2 greater than the given value

So let’s use a specific example, cap = 18

So what we put in is 18, and what we get out is 32, which is exactly the smallest power of 2 greater than 18

If cap itself is an integer power of 2, what is the output?

It can be seen from the demonstration that cap itself is an integer power of 2 and the output is itself

For cap-1, I explained that in order to avoid an even output, the final result was 2*cap, which wasted space. Watch the following demonstration

Through the demonstration, we can see that the input is 16, but the final calculation result is 32, which will waste space. Therefore, the algorithm is very good, and cap is subtracted by one first

Question 2: What special operations are performed to hash keys in a HashMap? Why do you do that?

First, we know that when HashMap does a put operation, it first hashes the key to locate the source

static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

(h = key.hashcode ()) ^ (h >>> 16)

The originalhashCode value: 10110101 01001100 10010101 11011111 00000000 00000000 10110101 01001100 XOR value: 10110101 01001100 00100000 10010011Copy the code

This operation moves the hashCode value of the key and the hashCode value 16 bits to the right to xor (different is 1, same is 0), which mixes the high and low hash values together, resulting in a more discrete hash

The size of the array is very small. For example, the default size is 16. Suppose three different keys generate a hashCoe value like this:

19305951 00000001 00100110 10010101 11011111

128357855 00000111 10100110 10010101 11011111

38367 00000000 00000000 10010101 11011111

The three of them have in common that the lower 16 bits are exactly the same, but the higher 16 bits are different. When calculating their subscript in the array, use (n-1)&hash, where n is 16, n-1=15, and the binary representation of 15 is

00000000 00000000 00000000 00001111

With 19305951, 128357855, 38367 all with 15 & operation, the results are as follows

When we do the calculation, we find that they all have the same result, which means that they will be placed in a linked list or red-black tree with the same subscript, which is obviously not what we expected

So perform xor on the hash and the value moved 16 bits to the right, and then perform and on 15 to see if the hash conflicts

It can be seen that xOR operation is performed after moving 16 bits to the right, and then the corresponding array subscript is calculated, and it is divided into different buckets, which solves the hash collision problem. The idea is to mix the high and low levels for calculation, and improve the dispersion

conclusion

In fact, there are many points worth studying in HashMap. After understanding the above two points, you will sigh that the author’s ability to write code is really great. We should use these ideas for reference in our work

Welcome to pay attention to the public number [every day white teeth], get the latest article, we communicate together, common progress!