This is the 23rd day of my participation in the August More Text Challenge
Preface

HashMap stores and retrieves values by key; its core algorithm is hashing.
```java
import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("1", "yyds");
        map.put("2", "tql");
        map.put("3", "xswl");
        map.get("1");
        map.get("2");
        map.get("3");
    }
}
```
The thinking behind its data structure:

- Array: hash the `key`, and the resulting hash maps directly to a position in the array.
- Linked list: what happens when two keys hash to the same position? Hang a linked list off that slot and chain the values with the same hash there. Query performance degrades to O(n).
- Red-black tree: linked-list performance gets worse as more data accumulates, so convert the list to a red-black tree, bringing query performance down to O(log n).
The three basic storage concepts of hash-based collections are shown below:
Data structure: array + linked list + red-black tree
JDK 1.8 added red-black trees because an overly long list seriously hurts HashMap's performance; a red-black tree keeps insert, delete, update, and lookup fast, which effectively solves the slow-operation problem of long lists.
Let's look at the key points of HashMap in JDK 1.8.
1) Member variable resolution
```java
// Initial capacity, default 16: 1 shifted left 4 bits, i.e. 2^4
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Maximum capacity: 2^30
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor: 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Treeify threshold: a list converts to a tree at this length
static final int TREEIFY_THRESHOLD = 8;
// Untreeify threshold: a tree converts back to a list at this length
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity before treeifying is allowed
static final int MIN_TREEIFY_CAPACITY = 64;

// Inner class: linked-list node
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
```
Parameters:

- Default load factor: 0.75
- Default capacity: 16

An analogy: a meeting of 12 people requires 12 / 0.75 = 16 chairs. Taking the index modulo 16 collides less often than modulo 12, without wasting much space. But once the number of people exceeds (number of chairs × load factor), the map expands: resize is called and the capacity doubles.
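As a small sketch of the rule above (`ThresholdDemo` and `threshold` are illustrative names, not HashMap internals): the resize threshold is capacity × load factor, and the put that pushes the size past it triggers a doubling.

```java
// Sketch of the resize-threshold rule: threshold = capacity * loadFactor.
// ThresholdDemo is a hypothetical helper, not part of HashMap's API.
public class ThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default HashMap: 16 * 0.75 = 12, so the 13th entry triggers
        // a resize to capacity 32 (and a new threshold of 24)
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```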
2) Hash algorithm

The probability of a hash collision depends on how hashCode is computed and on the size of the table. After a collision, chaining (the "zipper" method) is used.

The hash algorithm's source:
```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// For example, for some key's hash value:
// Original:          1111 1111 1111 1111 1111 1010 0111 1100
// Shifted right 16:  0000 0000 0000 0000 1111 1111 1111 1111
// XOR:               1111 1111 1111 1111 0000 0101 1000 0011
```
We shift right by 16 bits and then XOR. Why do that?

The goal is to reduce hash collisions among keys that would otherwise land at the same array position: after the XOR, the low 16 bits carry the characteristics of both the low 16 and the high 16 bits.

Then, why not use all 32 bits directly?

- In most cases, the high 16 bits differ little between keys.
- The table index is computed from the low bits anyway, so folding the high bits down captures their differences cheaply.
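The spreading step can be reproduced outside the JDK (`SpreadDemo` is an illustrative class name; the body mirrors the `hash` method shown above):

```java
// Sketch of JDK 1.8's hash spreading: XOR the high 16 bits into the low 16,
// so table indexing (which only uses low bits) still "sees" the high bits.
public class SpreadDemo {
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // 0xFFFFFA7C is the example value from the text
        // (an Integer's hashCode is the value itself)
        int spread = hash(0xFFFFFA7C);
        System.out.println(Integer.toBinaryString(spread));
        // prints 11111111111111110000010110000011, i.e. 0xFFFF0583
    }
}
```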
3) Addressing algorithm
Let's go straight to `put(K key, V value)`:
```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Table empty: initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Is the slot for this hash still empty?
    if ((p = tab[i = (n - 1) & hash]) == null)
        // No entry yet: create one
        tab[i] = newNode(hash, key, value, null);
    else {
        // ...
    }
    // ...
}
```
How is the index computed? Answer: `(n - 1) & hash`, where `n` is the table length (default 16) and `hash` is `hash(key)`, i.e. the key after the hash computation above.

```java
// For example:
// hash:       1111 1111 1111 1111 0000 0101 1000 0011
// n - 1 = 15: 0000 0000 0000 0000 0000 0000 0000 1111
// 15 & hash:  0000 0000 0000 0000 0000 0000 0000 0011
// Converted to base 10 that is 3, so i = 3: the array index
// produced by the addressing algorithm for this hash value
```
To summarize the addressing algorithm:

`(n - 1) & hash` determines a position in the array. A modulo operation would work too, but its performance is relatively poor, so the addressing is optimized: `(n - 1) & hash` has exactly the same effect as taking `hash` modulo `n`, yet performs much better.

Why? It's a bit of math: the array length is always kept at a power of two. As long as the length stays a power of two, `hash` modulo `n` and `(n - 1) & hash` give the same result, and the latter is faster.
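The equivalence is easy to check (`IndexDemo` is an illustrative name; `Math.floorMod` is used for the comparison because Java's `%` can return negatives for negative hashes, while `(n - 1) & hash` is always non-negative):

```java
// When n is a power of two, (n - 1) & hash equals the non-negative
// remainder of hash modulo n; the AND avoids a division instruction.
public class IndexDemo {
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int[] hashes = {3, 17, -5, 0xFFFF0583};
        for (int h : hashes) {
            // true for every hash, including negative ones
            System.out.println(index(h, 16) == Math.floorMod(h, 16));
        }
    }
}
```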
4) Hash-conflict handling with linked lists

What if hashes conflict? How is that handled? Take a look at the source:
```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Table empty: initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Is the slot for this hash still empty?
    if ((p = tab[i = (n - 1) & hash]) == null)
        // No entry yet: create one
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Same hash and same key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // ...
        // The same key overwrites the old value
        if (e != null) { // existing mapping for key
            // oldValue: the previous value
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // value: the new value
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // ...
}
```
Here's an example:

```java
// Two puts with the same key:
map.put("1", "yyds");
map.put("1", "xswl");
// then "yyds" is oldValue
// and "xswl" is value
V oldValue = e.value;
// The new value overwrites the old one
```
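The overwrite behavior can be observed directly, since `put` returns the previous value for the key (or null if there was none); this is HashMap's documented behavior:

```java
import java.util.HashMap;
import java.util.Map;

// Putting the same key twice: the second put overwrites the value
// and returns the old one.
public class OverwriteDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        String first = map.put("1", "yyds"); // no previous mapping -> null
        String old = map.put("1", "xswl");   // overwrites -> returns "yyds"
        System.out.println(first);        // null
        System.out.println(old);          // yyds
        System.out.println(map.get("1")); // xswl
    }
}
```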
If a lot of hash collisions leave a long list hanging off one slot, traversing it becomes particularly painful.
So JDK 1.8 optimized this: once a list reaches length 8, it is converted to a red-black tree, with O(log n) lookups.
The corresponding source is as follows:
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // ...
    else {
        // ...
        // The bin is already a red-black tree
        else if (p instanceof TreeNode)
            // A red-black tree is a balanced binary search tree;
            // inserting a node involves some rebalancing work
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If binCount >= 7, i.e. the list length reached 8
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // Convert to a red-black tree
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // ...
    }
    // ...
}
```
The source of `treeifyBin(tab, hash)` is as follows:

- The singly linked list is first converted into a doubly linked list
- The doubly linked list is then transformed into a red-black tree
```java
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // After the do-while, the singly linked list has become
        // a doubly linked list of TreeNodes;
        // treeify then turns it into a red-black tree
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
```
Summary: a linked list is used first; once its length reaches 8 (and the table capacity is at least MIN_TREEIFY_CAPACITY, 64), the list is converted into a red-black tree.
5) Principle of array-based capacity expansion

The underlying data structure of HashMap is: array + linked list + red-black tree.

HashMap expansion means doubling the capacity plus a rehash: after expansion, previously hashed keys must be readdressed, and each key-value pair finds its location in the new array based on the hash of its key.

The capacity is always expanded to a power of 2; the source is as follows:
```java
static final int tableSizeFor(int cap) {
    // Subtract 1 first so that a cap that is already a power of two
    // stays unchanged. E.g. cap = 8 (binary 1000): without the -1 the
    // result would be 10000 = 16, which is clearly wrong; with
    // 8 - 1 = 7 (binary 111), the result is 1000 = 8, as desired.
    int n = cap - 1;
    // Suppose n is 01xx...xxx
    // n >>> 1 is 001x...xxx; OR-ing gives 011x...xxx
    // n >>> 2 is 00011...xxx; OR-ing gives 01111...xxx
    // There are now four leading 1s; shifting right by 4 and OR-ing
    // gives eight 1s, and so on until all low bits are 1
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
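The rounding-up behavior can be verified with a standalone copy of the bit trick (`TableSizeDemo` is an illustrative wrapper; the method body matches the JDK source):

```java
// Standalone copy of tableSizeFor so the power-of-two rounding
// can be checked on a few inputs.
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(8));  // 8  (already a power of two)
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```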
Keys that previously collided may be distributed to different positions in the new array after expansion:
6) JDK 1.8's high-performance rehash algorithm

Since JDK 1.8, to improve the performance of the rehash process, the key's hash is no longer simply taken modulo the new array length; bitwise operations are used instead. For example:
Case diagrams:

- When the array length is 16:

```java
// Array length n = 16
// hash1 = hash(key1)
// n - 1:    0000 0000 0000 0000 0000 0000 0000 1111
// hash1:    1111 1111 1111 1111 0000 1111 0000 0101
// & result: 0000 0000 0000 0000 0000 0000 0000 0101 = 5
//
// hash2 = hash(key2)
// n - 1:    0000 0000 0000 0000 0000 0000 0000 1111
// hash2:    1111 1111 1111 1111 0000 1111 0001 0101
// & result: 0000 0000 0000 0000 0000 0000 0000 0101 = 5
```
You can see that the two hashes in case 1 collide; the collision can be resolved with a linked list or a red-black tree.
- When the array length is 32 after expansion:

```java
// Array length n = 32
// hash1 = hash(key1)
// n - 1:    0000 0000 0000 0000 0000 0000 0001 1111
// hash1:    1111 1111 1111 1111 0000 1111 0000 0101
// & result: 0000 0000 0000 0000 0000 0000 0000 0101 = 5
//
// hash2 = hash(key2)
// n - 1:    0000 0000 0000 0000 0000 0000 0001 1111
// hash2:    1111 1111 1111 1111 0000 1111 0001 0101
// & result: 0000 0000 0000 0000 0000 0000 0001 0101 = 21
```
In case 2, hash2 moves from index = 5 to index = 21. After each expansion, a key's position changes in one of two ways:

- It stays in the same position (same `index`), or
- It moves to `new index = old index + old array length (oldCap)`

For example: index(21) = index(5) + oldCap(16).
To summarize the expansion mechanism:

- The array length is doubled
- Readdressing (rehash): `hash & (n - 1)` is recomputed, and the question is whether the binary result gains an extra 1 bit
  - If not, the entry keeps its original `index`
  - If so, it moves to `index + oldCap`

This avoids taking each hash modulo the new array length during the rehash; modulo performance is poor, while bitwise performance is high.
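The split rule can be sketched with the hash2 value from the case diagrams (`RehashDemo` and `index` are illustrative names): whether `(hash & oldCap)` is zero decides if the entry stays or moves to `oldIndex + oldCap`.

```java
// Sketch of the JDK 1.8 split rule: during resize, (hash & oldCap) == 0
// decides whether an entry stays at its old index or moves to
// oldIndex + oldCap. Checked here against direct re-indexing.
public class RehashDemo {
    static int index(int hash, int cap) {
        return (cap - 1) & hash;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = 32;
        int hash2 = 0xFFFF0F15; // the hash2 example: low bits ...0001 0101
        int oldIndex = index(hash2, oldCap); // 5
        int newIndex = index(hash2, newCap); // 21
        boolean moves = (hash2 & oldCap) != 0; // the oldCap bit is set
        System.out.println(oldIndex); // 5
        System.out.println(newIndex); // 21
        System.out.println(moves && newIndex == oldIndex + oldCap); // true
    }
}
```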