preface

In a previous article I looked at the problem of initializing the capacity of a HashMap.

Through this article We know when the HashMap is set size, size and capacity of threshold is how to calculate, but some friend Including I might be in a little bit curious Why is the default size is 16 and computing their capacity, finally calculated the capacity and power to the power of 2? Some of you probably know that this is to reduce the hash collision rate, but why? Let’s talk about it today

Analysis of the

Capacity calculation

    /** * Returns a power of two size for the given target capacity. */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
Copy the code

For those of you who haven’t read the previous article, this method is used to calculate the initial capacity in a Hashmap. The purpose is to get a value greater than the minimum power of 2 of the cap value currently passed in.

After reading this sentence feel good mouthful, is greater than and less than the comparison meng force, nothing then we give 2 chestnuts:

  • Cap :10. The value of this method is 16 which is 2 to the fourth power
  • Cap :17 the value of this method is 32 which is 2 to the fifth power
  • Cap :1000 The value of this method is 1024 which is 2 to the tenth power

I believe that saw this I believe that partners must know ~

Hash position calculation

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false.true);
    }
    
    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {...if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null); . }Copy the code

Above is the code I captured to calculate the location of the HashMap

We know that the calculation of the slot position in the HashMap goes through the following steps:

hash

First we call the hash method, which takes the hascode value of the key and xor the value of the hashcode 16 bits to the right

Why do we do that?

First of all, we know that the logical operation of xor [^] is equal to 0 and different to 1== that is: 0 ^ 0=0, 1 ^ 1=0, 1 ^ 0=1, 0 ^ 1=1

And 【&】 the rule of operation ==2 is 1 when the result is 1 otherwise 0== namely: 0&0=0, 1&1=1, 1&0=0, 0^1=0

Will review, by the way, or the operator “|” rule is only have 1 that the result is 1 = = = = : 0, 0 = 0, 1 and 1 = 1, 1 and 0 = 1, 0 ^ 1 = 1

Remember that the next two logical operator methods are covered

(h = key.hashcode ()) ^ (h >>> 16), h >>> 16 is the operation that moves the binary bits to the right and fills the space with zeros. Given this, let’s do a little demo to see what this does. I’m going to write a very long array of 16 flowers so I’ll shift it 4 bits to the right, for example:

If the binary value of key hashCode is: 1011 0011 then the result of >>> 4 is 0000 1011 then the result of 2 is: 1011 0011 0000 1011 1011 1000 The result is that the value of the top 4 bits does not change, but the value of the bottom 4 bits does change. The purpose of this is to reduce the influence of the top 4 bits to the bottom 4 bits, and ensure that the hash value after the hash is calculated to reduce the conflict of the hash position. This is roughly what the translation means!

(n – 1) & hash

So this is the method that computed the position and the hash value that was passed in from the last method

If this is the hash value of 1011 1000 our n value is 16, then even the result will be hash: 1011 1000 n-1:0000 1111 result: 0000 1000 10 base value: 8

If the hash value is changed to 1011 1100 then the result will look like this: 1011 1100 N-1:0000 1111 result: 0000 1100 10 base value: 12

At this point, we can see that the index is different depending on the hash value, which ensures that the hash value is evenly distributed

But if this is our n length not a power of two, say 10, then the result would be a hash like this: 1011 1000 n-1:0000 1001 result: 0000 1000 10 base value: 8

At this point we will still change the hash value to 1011 1100 so even if the result will be this hash: 1011 1100 N-1:0000 1001 result: 0000 1000 10 base value: 8

At this point, we see that even though the hash value has been changed, we still get the same index, just like in the example above if the hash value is 1011 1000, 1011 1100, 1011 1110 will always be 0000 and 1000 index will always be 8 which will increase the Hash collision rate, which is obviously not what we want.

Do you see why they’re all powers of two, because if you subtract one from the power of two, the low order of binary is one, This ensures that the Hash value and the Hash value are kept in the same slot so that different Hash values are not allocated to the same slot.

conclusion

Through the above analysis you are clear. Do not know the words to see more think more, the design is so wonderful! A detail is so excellent, worth us to investigate and learn.

In order to reduce hash collisions, the value of a hashMap is always raised to a power of 2, because the binary order of an array after 2 to the power of -1 is all 1. This ensures that the hash value and the hash value after the [&] retain the whole low hash value so that different hash values are not assigned to the same hash card slot!

Total !!!!