HashMap

Under the hood, a HashMap is based on arrays and linked lists, although the implementation differs slightly between JDK 1.7 and 1.8.

The basics of 1.7

The Java 7 implementation defines the following fields:

  • Default initial capacity of the bucket array (16).
  • Maximum capacity of the bucket array.
  • Default load factor (0.75).
  • table: the array that actually holds the data.
  • size: the number of entries stored in the Map.
  • threshold: the resize threshold, which can be specified explicitly at initialization.
  • loadFactor: the load factor, which can also be specified explicitly at initialization.

Given a default capacity of 16 with a load factor of 0.75.

As data is continuously stored into the Map, once the amount of data reaches 16 × 0.75 = 12, the current capacity of 16 must be expanded. Expansion involves rehashing and copying all the data, which is expensive.

It is therefore advisable to estimate the size of the HashMap in advance to reduce the performance loss caused by expansion, as in the sketch below.
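
A minimal sketch of pre-sizing (the expected size of 1000 and the division by the load factor are illustrative, not from the JDK):

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedSize = 1000;
        // Request enough capacity that expectedSize entries stay below the
        // threshold: capacity * 0.75 must cover expectedSize entries
        Map<Integer, String> map = new HashMap<>((int) (expectedSize / 0.75f) + 1);
        for (int i = 0; i < expectedSize; i++) {
            map.put(i, "value-" + i); // no resize occurs during these puts
        }
        System.out.println(map.size()); // 1000
    }
}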

The array that actually stores the data is:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

Entry is an inner class in a HashMap, as can be easily seen from its member variables:

  • key: the key you pass in when you call put.
  • value: naturally, the value.
  • next: as mentioned at the start, a HashMap is made up of an array and linked lists, so next is what chains entries into a list.
  • hash: the cached hash of the current key (a simplified sketch of the class follows).
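
A simplified, compilable sketch of those fields (the Map.Entry interface methods of the real JDK 7 class are omitted for brevity):

class Entry<K,V> {
    final K key;     // the key
    V value;         // the value
    Entry<K,V> next; // the next entry in this bucket's linked list
    int hash;        // the cached hash of the key

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}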

Hash conflicts in Java 7

When hash conflicts are severe, the linked list formed on a bucket grows longer and longer, and query efficiency drops: the time complexity becomes O(N).

The basics of 1.8

1.8 focuses on optimizing this query efficiency.

1.8 HashMap Structure:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // Default capacity: 16
static final int MAXIMUM_CAPACITY = 1 << 30;        // Maximum capacity
static final float DEFAULT_LOAD_FACTOR = 0.75f;     // Default load factor
static final int TREEIFY_THRESHOLD = 8;             // List length at which a bucket becomes a tree
static final int UNTREEIFY_THRESHOLD = 6;           // Tree size at which a bucket reverts to a list
static final int MIN_TREEIFY_CAPACITY = 64;         // Treeify only when the Entry[] array size >= 64
transient Node<K,V>[] table;                        // The bucket array
transient Set<Map.Entry<K,V>> entrySet;             // Cached entry set view
transient int size;                                 // The number of valid key-value mappings

Java 8 HashMap differences and improvements

  • TREEIFY_THRESHOLD is the threshold for deciding whether a bucket's linked list should be converted to a red-black tree.

  • The core fields of Node are the same as those of Entry in 1.7: key, value, hash, next, and so on.

  • JDK 8 keeps head and tail references during resize so lists preserve their order (JDK 7's rehash inverted the list elements), which removes the circular-list risk; but drawbacks such as data loss under concurrent writes remain, so ConcurrentHashMap is recommended for multithreaded use.

Concurrency security issues with HashMap

The main things that can go wrong with concurrent use of a HashMap:

  • In JDK 7, rehashing used head insertion, which inverts the list elements; two threads resizing at the same time could therefore create a circular linked list.

  • If multiple threads add elements with put at the same time and two of the keys happen to collide (hash to the same bucket), HashMap writes both to the same position in the array, so one thread's data eventually gets overwritten.

  • If multiple threads simultaneously detect that the element count exceeds array size * loadFactor, they will all expand the Node array, recomputing element positions and copying data, but only one thread's expanded array is finally assigned to table; the arrays built by the other threads are lost, along with the data those threads put. A sketch after this list illustrates the lost-update symptom.

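A minimal sketch of the lost-update symptom (thread layout and counts are illustrative; exact results vary from run to run, and on JDK 7 the same race could even produce the circular list described above):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LostUpdateDemo {
    static void hammer(Map<Integer, Integer> map, int offset, int n) {
        for (int i = 0; i < n; i++) {
            map.put(offset + i, i);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 100_000; // large enough to make the race likely
        Map<Integer, Integer> unsafe = new HashMap<>();
        Thread t1 = new Thread(() -> hammer(unsafe, 0, n));
        Thread t2 = new Thread(() -> hammer(unsafe, n, n));
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Frequently prints less than 200000: some puts were silently lost
        System.out.println("HashMap size: " + unsafe.size());

        Map<Integer, Integer> safe = new ConcurrentHashMap<>();
        Thread t3 = new Thread(() -> hammer(safe, 0, n));
        Thread t4 = new Thread(() -> hammer(safe, n, n));
        t3.start(); t4.start();
        t3.join(); t4.join();
        // Always prints 200000
        System.out.println("ConcurrentHashMap size: " + safe.size());
    }
}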

The hash method

Why does a HashMap need its own hash() method? Why can't the key's original hashCode() be used directly in put and get?

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

The code above shows how the key's hash is computed: the key's hashCode() is XORed with its own upper 16 bits. (h >>> 16 is an unsigned right shift by 16 bits, filling the high bits with 0; since any bit XORed with 0 is itself, the upper 16 bits of the result stay the same as the original hashCode, while the lower 16 bits become the XOR of the original upper and lower 16 bits.)

Why do that? It is tied to how HashMap computes the table index.

n = table.length;
index = (n-1) & hash;

Since the table length is a power of two, the index depends only on the lowest bits of the hash value; all higher bits are masked to 0 by the AND operation. Assume table.length = 2^4 = 16.

With a table length of 16, only the lower four bits of the hash value participate in indexing, which makes collisions easy. The designers weighed speed, utility, and quality of bit-spreading: hashCode() implementations are already reasonably well distributed, and tree bins handle the cases where many collisions still occur, so a single XOR that folds the upper 16 bits into the lower 16 is enough. It adds almost no overhead, yet prevents the high bits from being excluded from the index calculation when the table is small. The sketch below illustrates the effect.
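
A minimal sketch of the effect (the two sample hash codes are hand-picked to differ only in their upper 16 bits):

public class HashSpreadDemo {
    // The same mixing HashMap.hash() applies to a non-null key's hashCode
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;         // table length, a power of two
        int a = 0x00010001; // differs from b only in the upper 16 bits
        int b = 0x00020001;
        // Without spreading, both keys land in bucket 1: high bits are masked off
        System.out.println(((n - 1) & a) + " " + ((n - 1) & b));                 // 1 1
        // With spreading, the high bits now influence the index
        System.out.println(((n - 1) & spread(a)) + " " + ((n - 1) & spread(b))); // 0 3
    }
}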

The tableSizeFor method

Source:

    static final int MAXIMUM_CAPACITY = 1 << 30;
    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

Where this method is called:

    public HashMap(int initialCapacity, float loadFactor) {
        /** ... argument checks omitted ... */
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

As you can see, when a HashMap is instantiated with a given initialCapacity, this method is used to find the smallest power of 2 greater than or equal to initialCapacity, since the capacity of a HashMap must be a power of 2 (if initialCapacity is already a power of 2, the same number is returned).

Here is an analysis of the algorithm. First, why subtract 1 from cap (int n = cap - 1)? This handles the case where cap is already a power of 2: without the subtraction, the unsigned right-shift operations below would return twice the requested capacity. If that is not clear yet, read through the shifts below and come back.

Now the unsigned right shifts. If n is 0 at this point (after cap - 1), it stays 0 through all the shifts, and the returned capacity is 1 (because of the final n + 1). So we only need to consider the case where n is not 0.

First shift:

n |= n >>> 1;

Since n is not 0, its binary representation contains at least one 1; consider the highest such bit. The unsigned shift moves that highest 1 one position to the right, and the OR makes the bit immediately to its right also become 1, e.g. 000011xxxxxx. Second shift:

n |= n >>> 2;

Note that n has already been through n |= n >>> 1. Assuming n is now 000011xxxxxx, the unsigned shift moves the two consecutive high 1s right by two positions, and the OR with the original n yields four consecutive 1s in the high bits, e.g. 00001111xxxxxx. Third shift:

n |= n >>> 4;

This time the four consecutive high 1s are shifted right by four, and the OR produces eight consecutive 1s in the high bits, e.g. 000011111111xxxxxx. And so on: since a capacity is a positive int of at most 32 bits, after the final n |= n >>> 16 every bit below the highest 1 is set (at most 32 ones), and any result exceeding MAXIMUM_CAPACITY is clamped back to MAXIMUM_CAPACITY. Let me give you an example, traced below.
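
A small runnable trace of the algorithm (the sample capacities are illustrative):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the JDK source quoted above
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        // cap = 10: n = 9 (0b1001) -> 0b1101 -> 0b1111 = 15, then +1 = 16
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}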

This algorithm is awesome!

Note that the resulting capacity is assigned to threshold.

this.threshold = tableSizeFor(initialCapacity);

At first I thought this was a bug; surely it should have been written as:

this.threshold = tableSizeFor(initialCapacity) * this.loadFactor;

The HashMap is expanded when its size reaches threshold. Note, however, that the table member variable is not initialized in the constructor; initialization of the table is deferred to the put method, where threshold is recalculated.

The put method

  • The JDK 1.7 implementation
public V put(K key, V value) {
    // A null key is stored via putForNullKey
    if (key == null)
        return putForNullKey(value);
    // Compute the hash of the key
    int hash = hash(key);
    // Locate the bucket index for this hash
    int i = indexFor(hash, table.length);
    // Traverse the linked list under that index
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // On a hash collision with an equal key, overwrite the old value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++; // modification count +1
    // Insert the new element at the head of the bucket's list
    addEntry(hash, key, value, i);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    // If size >= the configured threshold and the bucket head is not null,
    // double the table
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        // Recompute the hash and bucket index for the key being inserted
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    // Perform the actual insertion
    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    // The current head of the list
    Entry<K,V> e = table[bucketIndex];
    // Head insertion: the new entry becomes the new head
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++; // map element count +1
}

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

static int indexFor(int h, int length) {
    return h & (length - 1);
}
  • The JDK 1.8 implementation
public V put(K key, V value) {
    // Spread the key's hashCode() via hash()
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Step 1: if the table is empty, create it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Step 2: compute the index; if the bucket is null, place a new node
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Step 3: the key already exists at the head node; overwrite the value
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Step 4: the bucket is a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Step 5: the bucket is a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Convert to a red-black tree once the list is long enough
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Key already exists: break out to overwrite the value
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Step 6: resize once the size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
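
A quick usage check of put's contract (an illustrative demo, not JDK code): put returns the previous value mapped to the key, or null if there was none.

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        System.out.println(m.put("a", 1)); // null (no previous mapping)
        System.out.println(m.put("a", 2)); // 1 (old value returned and overwritten)
        System.out.println(m.get("a"));    // 2
    }
}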

The get method

  • The JDK 1.7 implementation
public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);
        return null == entry ? null : entry.getValue();
}
final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null) ? 0 : hash(key);
        /*
         * Locate the bucket in the array, then iterate the linked list there.
         * A match requires both an equal hash and an equal key.
         */
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
}
  • The JDK 1.8 implementation
public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // The table is non-empty and the head of the target bucket is not null
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Always check the first node
            if (first.hash == hash &&
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first; // hit on the head node: return it directly
            // Check whether the head node has a successor
            if ((e = first.next) != null) {
                // If this bucket is a red-black tree, search the tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // Otherwise it must be a linked list: walk it
                do {
                    // Return the element whose hash and key both match
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
}
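
One caveat worth pairing with get (an illustrative demo): a null return does not by itself mean the key is absent, because a key may be mapped to null; use containsKey to tell the two cases apart.

import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("k", null);
        System.out.println(m.get("k"));               // null (value is null)
        System.out.println(m.get("missing"));         // null (key is absent)
        System.out.println(m.containsKey("k"));       // true
        System.out.println(m.containsKey("missing")); // false
    }
}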

HashMap expansion

  • The JDK 1.7 implementation
void resize(int newCapacity) {
    // The old array
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // If the pre-expansion size already reached the maximum (2^30),
    // set the threshold to Integer.MAX_VALUE and exit
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    // Initialize a new Entry array of the new capacity
    Entry[] newTable = new Entry[newCapacity];
    // Transfer all entries into the new array (rehashing if needed)
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    // Point the HashMap's table field at the new array
    table = newTable;
    // Recompute the threshold
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

The remove method

  • The JDK 1.7 implementation
public V remove(Object key) {
    Entry<K,V> e = removeEntryForKey(key);
    return (e == null ? null : e.value);
}

// Remove the Entry for the given key and return it
final Entry<K,V> removeEntryForKey(Object key) {
    if (size == 0) {
        return null;  
    }  
    int hash = (key == null) ? 0 : hash(key);
    int i = indexFor(hash, table.length);  
    Entry<K,V> prev = table[i];  
    Entry<K,V> e = prev;  
    while (e != null) {
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++; // structural modification count +1
            size--;     // element count -1
            if (prev == e)
                table[i] = next; // e is the bucket head: table[i] now points to next
            else
                prev.next = next; // otherwise unlink e from its predecessor
            e.recordRemoval(this);
            return e;
        }  
        prev = e;  
        e = next;  
    }  
    return e;  
}

A quick summary of HashMap

Whether in 1.7 or 1.8, you can see that the JDK does no synchronization inside HashMap, which invites the concurrency problems above, up to an infinite loop that renders the system unusable. That is why the JDK created the dedicated ConcurrentHashMap, located in the java.util.concurrent package and built specifically to solve concurrency problems.

References:

  1. mp.weixin.qq.com/s/fZRPogkkU… (Pure Smile)
  2. www.jianshu.com/p/619a8efcf… (Jack is handsome)
  3. www.cnblogs.com/liujinhong/…