The underlying structure of HashMap is an array of buckets plus linked lists (separate chaining), though the implementation differs slightly between JDK 1.7 and 1.8.
JDK 1.7
In 1.7, the data structure is an array whose buckets hold linked lists. Let's take a look at the implementation in 1.7, starting with the core member variables of HashMap. What do they mean?
- The default bucket size is 16; since the underlying structure is an array, this is the array's default length.
- The maximum capacity of a table is 1 << 30.
- The default load factor is 0.75.
- `table` is the array that actually holds the data.
- `size` is the number of mappings stored in the Map.
- The bucket capacity can be specified explicitly at initialization.
- The load factor can be specified explicitly at initialization.
The initialization logic:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}
The initialCapacity passed in is automatically rounded up to the nearest power of two greater than or equal to it. JDK 1.7 does this with a loop, while JDK 1.8 replaces the loop with the tableSizeFor method, a clever bit-twiddling trick.
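To see what that bit-twiddling buys, here is a sketch comparing the two rounding strategies. The method names mirror the JDK, but both are reimplemented here for illustration; the real tableSizeFor additionally clamps to MAXIMUM_CAPACITY, which this sketch omits.

```java
public class PowerOfTwo {
    // JDK 1.7 style: loop, doubling until capacity >= initialCapacity.
    static int roundUpLoop(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    // JDK 1.8 style (tableSizeFor): smear the highest set bit rightwards
    // so every lower bit becomes 1, then add 1 to land on the next power
    // of two. Constant time, no loop.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : n + 1; // MAXIMUM_CAPACITY clamp omitted here
    }

    public static void main(String[] args) {
        for (int c : new int[]{1, 7, 16, 17, 1000}) {
            System.out.println(c + " -> " + roundUpLoop(c) + " / " + tableSizeFor(c));
        }
    }
}
```

Both agree on every input; subtracting 1 first is what makes an exact power of two (like 16) map to itself rather than the next power up.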
The load factor
Given the default capacity of 16 and load factor of 0.75: as data is continuously written into the Map, once the number of entries reaches 16 × 0.75 = 12, the current capacity of 16 must be doubled. Expansion involves rehashing and copying data, which is expensive, so it is recommended to estimate the size of the HashMap in advance to minimize the performance cost of resizing.
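Pre-sizing means requesting a capacity large enough that `expected / loadFactor` stays below the threshold. A minimal sketch, assuming the same sizing formula Guava uses in `Maps.newHashMapWithExpectedSize`; the `capacityFor` helper is ours, not a JDK method:

```java
import java.util.HashMap;
import java.util.Map;

public class Presize {
    // Capacity to request so that `expected` entries fit under the 0.75
    // default load factor without ever triggering a resize.
    // (Assumption: this mirrors Guava's newHashMapWithExpectedSize math.)
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // 1000 / 0.75 + 1 = 1334; HashMap rounds that up to 2048,
        // giving a threshold of 1536 — no resize during the 1000 puts.
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++)
            map.put("key-" + i, i);
        System.out.println(map.size());
    }
}
```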
As the code shows, what actually stores the data is `transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;`.
How is this array defined? Entry is an inner class of HashMap, and its member variables are easy to read:
- `key` is the key being written.
- `value` is the value.
- As mentioned at the start, HashMap is made up of arrays and linked lists; `next` is what implements the linked-list structure.
- `hash` stores the hash of the current key.
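The fields above can be sketched as a simplified version of 1.7's Entry. This is a paraphrase for illustration, not the real class, which also implements Map.Entry and carries recordAccess/recordRemoval hooks:

```java
// Simplified sketch of JDK 1.7's HashMap.Entry inner class.
class Entry<K, V> {
    final K key;
    V value;
    Entry<K, V> next; // links entries in the same bucket into a list
    int hash;         // cached hash of the key

    Entry(int h, K k, V v, Entry<K, V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}
```

Note how the constructor takes the existing bucket head `n` as `next`: that is exactly the head insertion createEntry performs below.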
Knowing the basic structure, let’s take a look at the important write and fetch functions.
Put method
The code comes from github.com/openjdk/jdk…
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
- First, determine whether the current array needs to be initialized.
- If the key is null, store a null-key entry via putForNullKey.
- Compute the hash from the key: `(h = key.hashCode()) ^ (h >>> 16)`.
- Locate the bucket from the computed hash.
- If the bucket holds a linked list, traverse it to check whether an entry's hash and key equal the incoming key; if so, overwrite the value and return the old one.
- If the bucket is empty, nothing is stored at that position yet, so add a new Entry object there.
- When addEntry writes the Entry, it checks whether expansion is needed; if so, the capacity is doubled and the current key is rehashed and repositioned. In createEntry, the current bucket head is passed into the new Entry, so if the bucket already holds a value, a linked list forms at that position.
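The indexFor masking above only works because the capacity is always a power of two: `h & (length - 1)` then gives the same result as `h % length` but without a division. A quick demo (ours, not JDK code):

```java
public class IndexMask {
    // Same masking as HashMap's indexFor: equivalent to h % length
    // whenever length is a power of two and h is non-negative.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // power of two, as HashMap guarantees
        for (int h : new int[]{0, 5, 16, 31, 12345}) {
            System.out.println(h + " -> " + indexFor(h, length)
                               + " (mod gives " + (h % length) + ")");
        }
    }
}
```

This is also why the constructor insists on rounding the capacity up to a power of two: with any other length, the mask would skip some buckets entirely.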
The get method
The code comes from github.com/openjdk/jdk…
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
- First, compute the hash from the key and locate the specific bucket.
- Determine whether that position holds a linked list.
- If it is not a linked list, return the value directly when the key and hash are equal.
- If it is a linked list, iterate until the key and hash match, then return the value.
- If nothing matches, return null.
Should hashCode and equals be overridden together?
The key's hashCode() is computed to determine which index of the array the key lands in; equals() is then called to decide whether it is the same key and whether to replace the value. So the two must be overridden together: if two "equal" keys produce different hash codes, they map to different array indexes in the HashMap, and equals() is never called at all.
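A small demonstration of that failure mode. Both key classes here are hypothetical: `BadKey` overrides only equals, so two "equal" instances fall back on identity hash codes and (almost always) end up unfindable; `GoodKey` overrides both and behaves correctly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyContract {
    // Overrides equals but NOT hashCode: breaks the contract.
    static class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        // hashCode() deliberately not overridden
    }

    // Overrides both: honors the contract.
    static class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<BadKey, String> bad = new HashMap<>();
        bad.put(new BadKey(1), "v");
        // Almost always null: the two instances have different identity
        // hash codes, so lookup probes the wrong state entirely.
        System.out.println(bad.get(new BadKey(1)));

        Map<GoodKey, String> good = new HashMap<>();
        good.put(new GoodKey(1), "v");
        System.out.println(good.get(new GoodKey(1))); // v
    }
}
```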
JDK 1.8
JDK 1.8's implementation is largely the same as 1.7's, but 1.7 has a drawback:
when hash conflicts are severe, the linked list formed in a bucket grows longer and longer, and query efficiency degrades; time complexity is O(n).
So 1.8's change is: when a list's length exceeds 8, it is converted into a red-black tree, giving a structure of array + linked list + red-black tree. Let's start with a few core member variables:
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

static final int TREEIFY_THRESHOLD = 8;

transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;
It's basically the same as 1.7, with a few important differences:
- TREEIFY_THRESHOLD = 8 is the threshold for converting a bucket's linked list into a red-black tree (in the actual implementation, treeification also requires the table capacity to be at least MIN_TREEIFY_CAPACITY = 64; below that, the map resizes instead).
- HashEntry is renamed to Node. The core fields of Node are the same as HashEntry's in 1.7: key, value, hash, next, and so on.
Now look at the core method.
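Another real difference is buried in the hash step shown earlier: 1.8 spreads the high 16 bits of hashCode into the low 16 via `(h = key.hashCode()) ^ (h >>> 16)`, so keys that differ only in their high bits still land in different buckets. A quick demo of why that matters with a small table (constants here are our own example values):

```java
public class HashSpread {
    // JDK 1.8's spreading step: XOR the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int a = 0x10000;   // differs from b only above bit 16
        int b = 0x20000;
        int mask = 15;     // table length 16, as in indexFor

        // Without spreading, both mask to bucket 0 and collide.
        System.out.println((a & mask) + " vs " + (b & mask));
        // With spreading, they land in buckets 1 and 2.
        System.out.println((spread(a) & mask) + " vs " + (spread(b) & mask));
    }
}
```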
Put method
The source code (github.com/openjdk/jdk…) looks more complicated than 1.7's, so let's break it down step by step:
- Check whether the current table is empty; if so, initialize it (resize() decides whether initialization is needed).
- Locate the bucket from the hash of the current key and check whether it is empty; if empty, there is no hash conflict, so create a new Node at that position.
- If the current bucket has a value (hash conflict), compare the hash and key of the bucket's first node with the incoming key; if equal, assign it to `e` (the value assignment and return are handled uniformly in the second-to-last step).
- If the current bucket head is a red-black tree node, write the data red-black-tree style.
- If it is a linked list, wrap the current key and value in a new Node and append it to the end of the list.
- Then check whether the list length exceeds the preset threshold; if so, convert the list into a red-black tree.
- If the same key is found during the traversal, exit the traversal directly.
- If `e != null`, an entry with the same key already exists, so its value is overwritten.
- Finally, check whether expansion is needed.
The get method
Source: github.com/openjdk/jdk…
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
The get method looks much simpler:
- First hash the key and locate the bucket.
- If the bucket is empty, return null.
- Otherwise, check whether the first node in the bucket (list head or tree root) is the key being queried; if so, return its value directly.
- If the first node does not match, determine whether what follows is a red-black tree or a linked list.
- For a red-black tree, look the value up tree-style.
- Otherwise, iterate the linked list until a match is found.
From these two core methods (get/put) we can see that long linked lists are optimized in 1.8: after conversion to a red-black tree, query time drops to O(log n).
Concurrency is unsafe
HashMap's long-standing thread-safety problems still exist in both versions.
public static void main(String[] args) throws InterruptedException {
    int N = 100000;
    final HashMap<String, String> map = new HashMap<String, String>();
    // final HashMap<String, String> map = new HashMap<String, String>(N); // specify the capacity to avoid expansion
    Thread[] ts = new Thread[N];
    for (int i = 0; i < N; i++) {
        ts[i] = new Thread(new Runnable() {
            @Override
            public void run() {
                map.put(UUID.randomUUID().toString(), "");
            }
        });
    }
    for (int i = 0; i < ts.length; i++) {
        ts[i].start();
    }
    for (Thread t : ts) {
        t.join();
    }
    System.out.println("end");
}
Run the above code several times on JDK 1.7 and an infinite loop can occur. On JDK 1.8 and above:
- the infinite loop no longer occurs, but
- a `java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode` ClassCastException can.
Most write-ups only mention the infinite loop, not the ClassCastException.
How does the infinite loop happen
It is caused by expansion.
Source code analysis
The flow of the put method was shown above:
- Check whether the key already exists. If it exists, replace the value; if not, insert a new element.
- After inserting the new element, check `(size++ >= threshold)`; if true, expand, doubling the capacity.
- Expansion calls the resize() method, which creates a larger array and moves the elements over via the transfer method. The logic of the move is also very clear: traverse the linked list at each position of the old table, rehash each element, find its destination in the new newTable, and insert it there.
resize() method source code:
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) { // this traversal is where the infinite loop shows up
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i]; // head insertion into the new bucket
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
Case analysis
Suppose the HashMap has an initial size of 4 and we insert three nodes; unfortunately, all three hash to the same position. With the default load factor, inserting the third node triggers expansion (for ease of verification, assume the load factor is 1). Suppose two threads, Thread 1 and Thread 2, execute resize() simultaneously while inserting the third node, and each creates its own new array.
Suppose Thread 2 uses up its CPU time slice right after executing `Entry<K,V> next = e.next;`, with variable e pointing at node a and variable next pointing at node b.
Thread 1 carries on. Unfortunately, nodes a, b, and c rehash (that is, via the indexFor method) to the same position, index 7, in the new table. Step 1: move node a. Step 2: move node b. Step 3: move node c. At this point Thread 1's table field has not yet been set to its new newTable, and Thread 2's internal references look like this:
Now, in Thread 2, variable e points to node a and variable next points to node b, and the remaining loop body resumes. After one iteration, e points to node b; since e is not null, the loop body runs again, and the references leave variable e pointing back at node a, so the loop body executes once more. Analyze this iteration carefully:
1. `Entry<K,V> next = e.next;` : node a has no next, so variable next points to null.
2. `e.next = newTable[i];` : newTable[i] points to node b, so a's next now points to node b; a and b point at each other, forming a cycle.
3. `newTable[i] = e;` : node a is placed at index i of the array.
4. `e = next;` : variable e is assigned null (because next was set to null in step 1), so the loop exits.
The final references look like this:
Nodes a and b point at each other in a ring, and any traversal of that bucket's list will loop forever.
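The root mechanism is that 1.7's transfer uses head insertion, which reverses each bucket's list on every resize; two threads interleaving that reversal is what wires a and b into a cycle. A single-threaded sketch of the reversal (simplified node type of our own, not JDK code):

```java
public class Transfer17 {
    static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Head-insertion move, as in JDK 1.7's transfer(): each node is pushed
    // onto the front of the destination bucket, so a -> b -> c comes out
    // as c -> b -> a. A second thread holding stale e/next references into
    // this reversal is what produces the a <-> b cycle described above.
    static Node transfer(Node e) {
        Node newHead = null;
        while (e != null) {
            Node next = e.next;   // the step where Thread 2 was suspended
            e.next = newHead;     // head insertion
            newHead = e;
            e = next;
        }
        return newHead;
    }

    public static void main(String[] args) {
        Node list = new Node("a", new Node("b", new Node("c", null)));
        for (Node n = transfer(list); n != null; n = n.next)
            System.out.print(n.key + " ");
        // prints: c b a
    }
}
```

JDK 1.8's resize keeps the relative order of nodes (it splits each bucket into a "lo" and "hi" tail-linked list), which is why the infinite loop disappears there.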
In JDK 1.7, you can set the initial capacity up front to avoid expansion and thus the infinite loop. JDK 1.8 avoids the infinite loop altogether, but it cannot prevent the following ClassCastException.
How does the ClassCastException occur
JDK 1.8 source code:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Suppose threads 1 and 2 insert into the same bucket simultaneously:
- Thread 1 has decided to append its node to the bucket's linked list, but is suspended by the CPU before writing.
- Thread 2 meanwhile inserts the eighth element into that bucket, converting the linked list into a red-black tree, so the bucket head is now a TreeNode.
- Thread 1 is awakened and continues on the stale assumption that the bucket still holds plain Nodes; the type cast fails with the ClassCastException above.
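Neither version of HashMap is safe under concurrent writes; the standard fix is ConcurrentHashMap (or external synchronization via Collections.synchronizedMap). A reworking of the earlier stress test on that basis; the class and method names here are ours:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SafeConcurrentPut {
    // Fill a ConcurrentHashMap from many threads; no loops, no cast
    // exceptions, and no lost updates.
    static Map<String, String> fill(int n) throws InterruptedException {
        Map<String, String> map = new ConcurrentHashMap<>(n);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < n; i++)
            pool.execute(() -> map.put(UUID.randomUUID().toString(), ""));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        // Size equals n barring a (practically impossible) UUID collision.
        System.out.println(fill(100_000).size());
    }
}
```

A thread pool replaces the one-thread-per-put of the original test purely to keep the demo cheap; the thread-safety argument is unchanged.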
Reference:
crossoverjie.top/2018/07/23/… (archive)
juejin.cn/post/684490… (archive)