Preface

Note that this series of articles is based on JDK 1.8

This is the first article in a series on the HashMap source code. It gives you an overview of the most important aspects of HashMap; subsequent articles will dig into the implementation details of each part.

The content mainly covers:

  • Basic structure
  • HashMap initialization
  • The hash algorithm
  • The perturbation function

Basic structure

The basic structure of HashMap involves three parts:

  • Array
  • Linked list
  • Red-black tree

Normally, accessing a HashMap element takes O(1) time. This is due to its hashing mechanism: based on the element's key (its hashCode value), a hash function maps the element to a specific index in the table. How that function works is covered in the hash algorithm section below.

Is it possible for two elements to map to the same index? Yes; this is called a hash collision. Colliding elements form a linked list at that index.

When a large number of elements collide, the access time degrades to O(n), because the linked list must be traversed. To improve efficiency, JDK 1.8 converts a list into a red-black tree, with O(log n) lookup, once the list length reaches a threshold (TREEIFY_THRESHOLD = 8 in the source; the table must also have grown to at least 64 buckets, otherwise it simply resizes).
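
For reference, here is the bin node from the JDK 1.8 source (abridged). Colliding nodes chain through the next field:

    // Abridged from the JDK 1.8 HashMap source: one entry in a bucket
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;     // cached hash of the key
        final K key;
        V value;
        Node<K,V> next;     // next node in the same bucket's list
        // ... constructor and Map.Entry methods omitted
    }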

Initialization process

Let’s start with a few important member variables in the HashMap.

    // The smallest storage unit in a HashMap is a Node, figuratively called a hash bucket
    // table is the hash bucket array, the underlying storage of the container
    transient Node<K,V>[] table;
    // The number of elements the map can hold before resizing:
    // threshold = capacity * loadFactor, so HashMap resizes before the array is full
    int threshold;
    // The load factor used to compute threshold
    final float loadFactor;
    // The default hash bucket array capacity at initialization (16)
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // The default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

Here is a question: if you create a HashMap with an initial capacity of 10, is its threshold simply 10 * 0.75 = 7? If that is your answer, too young, too simple.

    public static void main(String[] args) throws Exception {
        HashMap<String, String> hashMap = new HashMap<>(10);
        // Read the package-private fields via reflection
        Field thresholdField = HashMap.class.getDeclaredField("threshold");
        Field tableField = HashMap.class.getDeclaredField("table");
        thresholdField.setAccessible(true);
        tableField.setAccessible(true);
        int threshold = (int) thresholdField.get(hashMap);
        System.out.println("Threshold value after initialization: " + threshold);
    }

Take a look at the result. Is it what you expected?

    Threshold value after initialization: 16

The relevant initialization code:

    public HashMap(int initialCapacity) {
        // initialCapacity is the value passed in; in our example it is 10
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

Note the call to tableSizeFor(initialCapacity), which rounds the initial capacity up to the nearest power of 2.

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        // Smear the highest set bit into every lower position,
        // turning n into a run of ones (e.g. 0b1001 -> 0b1111)
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
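
To see the effect, here is a quick standalone trace; the copy of the method and the class name are mine, since the JDK's version is package-private. For cap = 10, n starts at 9 (0b1001), the shifts smear it to 0b1111 (15), and the method returns 16:

    public class TableSizeForDemo {
        static final int MAXIMUM_CAPACITY = 1 << 30;

        // Standalone copy of HashMap.tableSizeFor, for experimentation only
        static int tableSizeFor(int cap) {
            int n = cap - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        public static void main(String[] args) {
            // Prints: 1 -> 1, 2 -> 2, 10 -> 16, 17 -> 32, 100 -> 128
            for (int cap : new int[]{1, 2, 10, 17, 100}) {
                System.out.println(cap + " -> " + tableSizeFor(cap));
            }
        }
    }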

You may be wondering why the load factor has not come into play yet. Take a look at the code below.

        HashMap<String, String> hashMap = new HashMap<>(10);
        Field thresholdField = HashMap.class.getDeclaredField("threshold");
        Field tableField = HashMap.class.getDeclaredField("table");
        thresholdField.setAccessible(true);
        tableField.setAccessible(true);
        int threshold = (int) thresholdField.get(hashMap);
        System.out.println("Threshold value after initialization: " + threshold);
        hashMap.put("1", "a");
        threshold = (int) thresholdField.get(hashMap);
        System.out.println("Threshold value after adding elements: " + threshold);

Running results:

    Threshold value after initialization: 16
    Threshold value after adding elements: 12

In fact, HashMap uses a mechanism similar to lazy loading: the actual initialization is triggered only when the first element is added, at which point threshold is recalculated as capacity * loadFactor = 16 * 0.75 = 12.
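
The tableField read earlier makes this easy to verify. A minimal standalone sketch (the variable names are mine; the field name table matches the source): the backing array stays null until the first put(), and afterwards it has the length 16 computed at construction time:

        HashMap<String, String> lazyMap = new HashMap<>(10);
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        // The backing array has not been allocated yet
        System.out.println("table before put: " + tableField.get(lazyMap));   // null
        lazyMap.put("1", "a");
        // The first put() triggered the real initialization
        System.out.println("table length after put: "
                + ((Object[]) tableField.get(lazyMap)).length);               // 16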

The hash algorithm

Look at the hash() method, which recomputes the hash from the key's hashCode value. Notice that the hash for a null key is 0, which is why a null key always lands at index 0.

    static final int hash(Object key) {
        int h;
        // XOR the high 16 bits of hashCode into the low 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

Can this hash value be used directly as an array index? No: the hash is a 32-bit int, so mapping it directly would require an array with billions of slots. It has to be reduced modulo the array length first.

Does HashMap use the modulo operator directly? See the code below.

    // n is the array length, i is the resulting array index
    i = (n - 1) & hash

As mentioned above, HashMap forces the array length to be a power of 2 at initialization, and every subsequent resize keeps it that way. This enables a very neat trick: when n is a power of 2, (n - 1) & hash is equivalent to hash % n (for non-negative hash), and the bitwise AND is a single hardware instruction, so it is faster than modulo arithmetic. For example, with n = 16, n - 1 = 0b1111, so the AND simply keeps the low 4 bits of the hash.
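
A minimal sketch of the equivalence; the class name is mine, and Math.floorMod stands in for % because Java's % can return negative values for negative hashes, while the mask never does:

    public class MaskVsModDemo {
        public static void main(String[] args) {
            int n = 16; // a power of 2, like a HashMap table length
            int[] hashes = {0, 1, 17, 31, -1, -17, Integer.MIN_VALUE};
            for (int hash : hashes) {
                int byMask = (n - 1) & hash;          // HashMap's index computation
                int byMod  = Math.floorMod(hash, n);  // non-negative remainder
                System.out.println(hash + " -> mask: " + byMask + ", floorMod: " + byMod);
            }
        }
    }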

The perturbation function

Of course, a good hash algorithm greatly reduces the probability of collisions. Going back to the hash computation: why shift hashCode right by 16 bits and XOR it with the original value?

// Perturbation function code
(h = key.hashCode()) ^ (h >>> 16)

Let's look at what this achieves. It mixes the high and low halves of the hashCode: XORing the top 16 bits into the bottom 16 bits lets the low bits carry information from the whole value, while the high bits are preserved.

Combined with the index computation i = (n - 1) & hash above, (n - 1) acts as a low-order mask: only the low bits of the hash participate in the index. The more faithfully those low bits reflect the whole hashCode, the lower the probability that two keys collide on the resulting index i. That is the value of the perturbation function.
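
A small demonstration (the class and helper names are mine, not JDK code): three hashCodes that differ only in their high 16 bits all collide at index 0 without the perturbation, but spread out to 1, 2, and 3 with it:

    public class PerturbDemo {
        // Reproduces the spreading step of HashMap.hash() for illustration
        static int perturb(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int n = 16;                                  // table length, a power of 2
            int[] hashes = {0x10000, 0x20000, 0x30000};  // low 16 bits are all zero
            for (int h : hashes) {
                int plain     = (n - 1) & h;             // index without perturbation
                int perturbed = (n - 1) & perturb(h);    // index with perturbation
                System.out.printf("hash=0x%x  plain=%d  perturbed=%d%n", h, plain, perturbed);
            }
        }
    }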

Conclusion

This article mainly addressed two core questions:

  • Why is the HashMap capacity always a power of 2?
  • How the hash algorithm and the perturbation function are implemented

If you found this helpful, please leave a comment and a like.