The internal structure of a HashMap consists of an array (Node[] table) combined with a linked list/red-black tree. Hash determines the address of key-value pairs in this array; When the number of arrays exceeds (load factor: 0.75 by default) x (capacity: 16 by default), adjust the capacity. Key-value pairs with the same hash value are stored as a linked list. If the size of the linked list exceeds a threshold (threshold, default 8), the list is modified to a tree structure.

Question 1: When calculating a hash, why use the hash() function to get the hash value instead of using hashcode() directly?

The hash used in a HashMap, which comes not from the hashCode() of the key itself, but from another hash() method inside the HashMap.

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false.true);
    }

    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Copy the code

The hash value obtained by hashCode () is an int in the range of -2147483648 to 2147483648, which is about 4 billion. In most cases, the HashMap array is not that long. So we need to hash (length-1) & hash (using bit operation instead of modulo operation to improve the efficiency of the program) to compute the array subscript.

If you just take the lower hash, then those hash values with the same lower hash but different higher hash values will collide. The HashMap perturbs the hash value, moves the high 16 bits to the right and xor (^) with the original hash value, mixing the high 16 bits with the low 16 bits to produce a more hashed 16-bit hash value.

Question 2: Why is the length of the array in the HashMap always an integer power of 2 when expanding by twice?

Hash the position in the array using (length-1) &hash. And any integer power of two, you subtract one and you get all one bits. For example, 16-1=15, the binary representation is 1111; 32-1=31, binary representation: 11111. So the Hash value and n-1 of (&) will yield the low Hash value, which will be hashed enough to reduce the probability of Hash collisions.

For example, if the size of HashMap is 16, its binary value is 10000, and the binary value of (n-1) is 01111, and the hash value is calculated as follows:

01111&01001 = 01001 01111&01101 = 01101 01111&01100 = 01100 01111&01111 = 01111Copy the code

But when the size of HashMap is not 2 to the NTH power, and the size is 10, the binary is 01010, and the binary of (n-1) is 01001, add the same element to it, and the result is as follows:

01001&01001&01101 = 01001 01001&01100 = 01000 01001&01111 = 01001Copy the code

There were three different elements that went through the ampersand and the result was the same, and it was a serious hash collision.

Question 3: Why do you want to turn a linked list into a red-black tree instead of an AVL tree when you tree it

Why is a HashMap tree-like? Because in the element placement process, if an object hash conflicts, are placed in the same bucket, it will form a linked list, and we know that the linked list query is linear, which will seriously affect the access performance.

AVL tree is a highly balanced binary tree, so the search efficiency is very high. However, there are advantages and disadvantages. In order to maintain such a high balance, AVL tree has to pay more costs. Each insertion and deletion needs to be adjusted, which is complicated and time-consuming. Therefore, AVL trees can be a bit expensive for data sets with frequent insert and delete operations.

Red-black trees are approximately balanced, not strictly balanced, so the cost of maintaining equilibrium is lower than AVL trees. Therefore, the operation performance of red-black tree insertion, deletion and search is relatively stable.