Full depth parsing HashMap

Wechat public number: Orange Song Java technology nest Internet technology exchange & push group: 693961584 article first dig gold platform, the subsequent synchronous update of the public number, after attention to reply to “group” can join the Internet technology exchange & push wechat group, and a group of big factory leaders to discuss interview questions. Reply to “666” to get all the information packages available on the First line Internet (including development software, development specifications, classic e-PDF, and some quality learning courses).

preface

Before, I talked about an ArraList article, ArrayList several big questions, after reading it, please hit me if you don’t understand it. Today we’ll examine another container, HashMap. The reason for this is simple. I’m sure 99% of readers have used HashMap in the interview, and I’m sure everyone knows how to use HashMap, but there’s more to it than just put and get. Inside this problem involves the Java memory model, the thread is not visible, Hash calculation problems, chain table structure, binary &, |, < <, > > and a series of problems, so the interview often asked to look at a person’s technology is not solid.

The article is longer, please read patiently, there will be a harvest.

The data structure and principles of HashMap

A HashMap is a data structure composed of arrays and linked lists.

Each place in the array has an instance of key-value, which is called Entry in Java 7 and Node in Java 8. Since all of its positions are null, the index value is hashed during put insertion. For example, if I put (“orange”, “orange”), AND I insert the element orange, we will use the hash function to calculate the insertion position, assuming that index is 1, we will insert the following:

Why does a HashMap need a list, and what does a list look like

The length of the array is finite, and within a finite length we use hash, and hash itself is probabilistic, so orange and orang we both hash and there is a certain probability that they will be the same (hash collision), in which case we need a linked list, we can put the same data in the same index.

As shown in the Node source code, each Node holds its own Hash, key, value, and the next Node.

How is the new Entry node inserted into the linked list

Before Java 8, it was all about headers. The new value replaces the original value, and the value that was in the array is pushed down the list.

After Java 8 comes tail insertion. The new value goes straight down the list to the end of the list.

Why tail insertion?

There may be some people who think it’s not useful, is it? Of course not. Because there is a scaling mechanism in HashMap. The number of arrays in a HashMap is limited, and if the number of inserts reaches its limit, it needs to be expanded, namely resize.

Then the question arises again, when will resize? Let’s take a look at the source code for HashMap.

/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param  initialCapacity the initial capacity
* @param  loadFactor      the load factor
* @throws IllegalArgumentException if the initial capacity is negative
*         or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) { 
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                                initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
             initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
             throw new IllegalArgumentException("Illegal load factor: " +
                                                loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
Copy the code

As can be seen from the above, there are two factors that determine resize:

initialCapacity:HashMapInitialization capacity, from the source codemapThe maximum capacity of delta is 1<<30, which is 1 moves 30 bits to the left, and for every bit to the left times 2, so delta1 * 2 ^ 30 = 1073741824.
loadFactor: Load factor, a number that must be greater than 0 and is not infinite. The default value is0.75 f. For example, if the current size is 100, when you save 76, the judgment discovery needs to be maderesize.

How do I expand capacity?

Capacity expansion: Create a new empty array of entries twice the length of the original array.

ReHash: Iterates through the array of entries and adds all previous entries to the new array using the hash algorithm.

In step 2, you need to rehash the hash using the following formula: Index = HashCode (Key) & (length-1) Hash = index = HashCode (Key) & (length-1) Hash = index = HashCode (Key) & (length-1) Hash = index = HashCode (Key) & (length-1)

Back to the previous question, why was Java 8 changed to tail-insert?

Suppose that we continue to use the first method to use the resize mode of assignment, the head of a singly linked list insert way, the same location on the new element is always on the list of head position, the same Entry in the old array elements on the chain, through to recalculate the index position, are likely to be on the new array of different positions, but we haven’t disconnect in the list, This leads to the following situation:

If we evaluate at this time, we have a problem — Infinite Loop. After Java 8, linked lists have red-black tree parts. There are a lot of [if else] judgments in the code, which reduces O(n) to O(logn). The header interpolation method changes the order of the linked list, but with the tail interpolation method, the original order of the linked list elements will be maintained during the expansion, and the linked list will not be looped.

Essence: Java 7 May cause an infinite loop when multithreading HashMap operations. The reason is that the order of the linked list is inverted after the expansion and transfer, and the reference relationship of the nodes in the original linked list is changed during the transfer. Java8 does not cause an infinite loop on the same premise, because the order of the linked list remains the same after expansion and migration, and the reference relationship between the previous nodes remains.

What is the default initialization length of a HashMap?

No more nonsense, directly on the code:

static final int EDFAULT_INITIAL_CAPACITY = 1 << 4;
Copy the code

We can see from the above that the initialization length of HashMap is 16, and we use bit operation, because bit operation is more efficient than arithmetic calculation (when we initialize, we try to specify the initial value size). The HashMap has an initial length of 16 to serve the algorithm that maps key to index. I mentioned that we get the key and then we get the index through the hash algorithm. Index = HashCode(key) & (length-1)

The size of HashMap is 16, then length-1 is 15, and binary is 1111. It is not difficult to see that the binary is all 1, so the result of index is equal to the value of the last few bits of HashCode, which means that as long as the input HashCode itself is evenly distributed, the result of hash algorithm will be uniform. The uniform distribution is achieved. In Java, for example, all objects inherit from the Object class. The Object class has two methods equals(); , hashCode (); , both methods are used to compare whether two objects are equal. Equals () is not overridden; Equals (); Equals (); Compare the memory address of two objects, and our new two objects have different memory addresses.

Value objects,= =It compares the values of two objects,
A reference object compares the addresses of two objects.

Orange and Orang could both be on the same list. Orange and Orang could both be on the same list.

When I get, I hash the key and calculate the index. I find 2. How can I find the specific orange and orang? Yes, equals() is used here; Method, if we’re overwriting equals(), be sure to call hashCode(); Method overrides to ensure that identical objects return the same hash value and different objects return different hash values. Otherwise a list of objects found in hashCode is the same.

Why is hashMap thread unsafe

As mentioned above, HashMap has made many optimizations from Java 7 to Java 8. For multi-threading, we will analyze it separately. Here is the conclusion first, and then we will analyze it step by step:

In a Java 7 multithreaded environment, capacity expansion may result in circular chains or data loss. In a Java 8 multithreaded environment, data overwriting can occur.

Let’s start with the Java 7 multithreaded environment. In a multi-threaded environment, HashMap causes ring chain or data loss during expansion, so it may be in expansion function and transfer method in expansion function, the specific code is as follows:

void transfer(Entry[] newTable, boolean rehash) {
    // Capacity of the new table
    int newCapacity = newTable.length;
    // walk through the table
    for (Entry<K,V> e : table) {
        while(null! = e) {// Save Entry for the next loop 
      ,v>
            Entry<K,V> next = e.next;
            if (rehash) {
                // Compute the hash value of e using the key value of e
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Insert e into the new table
            int i = indexFor(e.hash, newCapacity);
            // Insert e into I, and the resulting list is the opposite of the original table
            e.next = newTable[i];
            newTable[i] = e;
            // Next loope = next; }}}Copy the code

Let’s first analyze the above code. After expanding the table to newTable, the original data needs to be transferred to newTable. The elements in line 10-12 of the code are transferred using the header interpolation method (as mentioned before), and the order of the linked list will be reversed. Before capacity expansion, the execution process in single-threaded environment has been analyzed as follows:

When a single thread becomes a multi-threaded environment, the following scenarios may occur:

threadThread1In line 17NewTable [I] = e;The pointer in the array just happens to point to the list.

The threadThread2It’s starting to work,Thread2Can execute normally and completeresizeOperation, as shown below:

Because of the threadBIt’s done. According to the Java memory model, nownewTableAnd in the tableEntryThese are the latest values in main memoryNext = ora2. next=Ora2, Ora2. Next =null, and then switch to the threadThread1Continue executing, and data will appear【 Ora1 】and【 Ora5 】Missing (similar for rings), as shown below:

In Java 8, HashMap was optimized to insert directly into the end of the list instead of using headers when a hash collision occurs, so there is no looped list, but it is still unsafe in multi-threaded situations. Why is that? Again, let’s look at the code. Here we’ll look at the source code for the PUT operation

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null) // Insert elements directly if there is no hash collision
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}if(e ! =null) { // existing mapping for key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
}
Copy the code

Note line 6, which inserts elements directly if there is no hash collision, as follows: Thread1 and Thread2 both perform a put operation, but when the hash value is the same and the insertion position is null, both Thread1 and Thread2 will enter line 6. In this case, if Treand1 enters and hangs without inserting data, Thread2 works normally and inserts data, and Thread1 gets the CPU time slice, Thread1 does not hash, but overwrites the inserted data directly. As a result, the data in Thread2 has been modified, which means that the thread is not safe. To sum up, HashMap is not secure in Java 7 or Java 8 with multiple threads.

How can we solve the problem that HashMap is not secure in multiple threads?

There are three solutions:

Collection.SynchronizedMap(Map)

Hashtable

ConcurrentHashMap

Collections.synchronizedmap is how to implement a thread-safe?

SynchronizedMap maintains a generic object Map and an exclusion lock mutex as follows:

private static class SynchronizedMap<K.V> implements Map<K.V>, Serializable {
    private static final long serialVersionUID = 1978198479659022715L;
   
    private final Map<K,V> m;     // Backing Map
    final Object mutex;        // Object on which to synchronize
   
    SynchronizedMap(Map<K,V> m) {
        this.m = Objects.requireNonNull(m);
        mutex = this;
   }
  
   SynchronizedMap(Map<K,V> m, Object mutex) {
       this.m = m;
       this.mutex = mutex; }}Copy the code

SynchronizedMap has two constructors. Map is mandatory. If mutex is passed in the constructor, an object exclusion lock is assigned and passed. SynchronizedMap is used by default if it is not passed in. After a SynchronizedMap is created, all methods that operate on the map use the synchronized modifier, and when a thread operates on the map, it locks the synchronized object. That way, no other thread can operate on it.

How is HashTable thread safe?

Compared to HashMap, HashTable is thread-safe, so HashTable can be used in a multi-threaded environment, but it is not very efficient. Why is that?

First, let’s take a look at the source code (here is the source code for the data operation) :

public synchronized V get(Object key) { Entry<? ,? > tab[] = table;int hash = key.hasCode();
}
Copy the code

As you can see, since HashTable manipulates data using synchronized modifiers, a thread will lock data, resulting in low efficiency.

The difference between HashTable and HashMap

Different implementations:HashtableinheritedDictionaryClass, andHashMapInheritance isAbstractMapclass
The initial capacity of HashMap is 16, and that of Hashtable is 11, but the load factor for both is the same, 0.75 by default.
Different capacity expansion mechanisms: When (existing capacity > total capacity * load factor),HashMapDirectly expand to (Current capacity x 2),HashtableThen expand to (Current capacity x 2 + 1).
Iterators are different:HashMaptheIteratorIterators are the fast fail mechanism used (fail-fast),HashtabletheEnumeratorIs a security failure mechanism (fail-safe).

The fail-safe mechanism used in Hashtable makes it impossible to read data as up-to-date each time. Let me tain(key) check whether the key exists. A Hashtable does not allow null keys, whereas in a HashMap keys can be null. We use put() in Hashtable; Method throws a null pointer exception when the argument is null, but special handling is done in the HashMap as follows:

static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

`fail-fast`What is the

Fail -fast is a mechanism that, when iterating over a collection object, throws Concurrent Modification Exceptions if the contents of the collection object have been modified (add, delete, modify) during the iteration.

Because iterators access the contents of the collection directly during traversal, and use a modCount variable during traversal. If the contents of the collection change during traversal, the value of modCount is modified. Each iterator uses hashNext(); Or next (); The modCount==expectedmodeCount value is tested before the next element is iterated over. If the result is traversal; Otherwise, an exception is thrown and the traversal is terminated.

Note: Because the exception is detected by modCount! = expectedmodCount, if changes occur at this time so that the modCount value is equal to the set expectedmodCount value, the exception will not be thrown.

How is ConcurrentHashMap thread safe

In ConcurrentHashMap, the Java 7 and Java 8 versions are quite different, so let’s look at the Java 7 environment first. ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap ConcurrentHashMap Segment is an internal class of ConcurrentHashMap.

static final class Segment<K.V> extends ReentrantLock implements Serializable {

    private static final long serialVersionUID = 2249069246763182397L;

    // Just like HashEntry in HashMap, it is a bucket that actually holds data
    transient volatile HashEntry<K,V>[] table;

    transient int count;
    // Remember fail -- fast?
   transient int modCount;
   / / size
   transient int threshold;
   // Load factor
   final float loadFactor;

}
Copy the code

HashEntry is similar to HashMap, except that HashEntry uses volatile to modify its Value and the next node, next. Because we all know that volatile has the following three properties:

Visibility: To ensure that the variable is visible to different threads, i.e. when one thread changes the value of a variable, the new value is immediately visible to other threads.
Orderliness: Disallows instruction reordering (instruction reordering may cause code to perform sequential modifications).
atomic: Ensures atomicity of single read/write (i++This operation).

This allows ConcurrentHashMap to run in a multi-threaded environment.

In Java 8, if the array + linked list method is used, we need to iterate over the linked list when querying, which will lead to low efficiency, so we abandoned the original bad Segment locking, and adopted CAS+sychronized to ensure the security of concurrency. This is similar to HashMap, where HashEntry is changed to Node, but does not change. Volatile modiates the value and next to ensure that it is visible. In addition, a red-black tree is used, which converts when the list is greater than a certain value (8 by default). ReentrantLock was changed to synchronized because red-black trees were used to ensure query efficiency.

`ConcurrentHashMap`Why high concurrency

In a Java 7 environment, we can see from the Segment source code that it inherits from ReentrantLock, which uses the technology of segmental locking. Unlike Hashtable, where both put and GET operations need to be synchronized. Every time a thread holds a lock on one Segment, it does not affect the other segments. ConcurrentHashMap supports concurrent threads with CurrentLevel (number of segments). If the number of segments is 16, the concurrency is 16.

How does ConcurrentHashMap access elements?

In a Java 7 environment, let’s first look at the PUT operation. First look at the source code:

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // Locate the table in the current Segment to HashEntry using the hashcode of the key
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
       for (HashEntry<K,V> e = first;;) {
           if(e ! =null) {
               K k;
               // Iterate over the HashEntry
               // If it is not empty, the key passed in is equal to the current key traversed.
               // Equal overwrites the old value.
                 if ((k = e.key) == key || (e.hash == hash && key.equals(k))) {
                     oldValue = e.value;
                     if(! onlyIfAbsent) { e.value = value; ++modCount; }break;
                 }
                 e = e.next;
           }else {
               // Create a HashEntry and add it to the Segment.
               // The system determines whether to expand the capacity.
               if(node ! =null)
                   node.setNext(first);
               else
                   node = new HashEntry<K,V>(hash, key, value, first);
               int c = count + 1;
               if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                   rehash(node);
               else
                   setEntryAt(tab, index, node);
               ++modCount;
               count = c;
               oldValue = null;
               break; }}}finally {
       / / releases the lock
       unlock();
   }
   return oldValue;
}
Copy the code

As you can see from the source, the put operation has the following two steps:

Try to obtain the lock. If the lock operation fails, it indicates that there are multiple threads competingscanAndLockForPut()Spin to get the lock;
If the number of attempts to acquire the lock is reachedMAX_SCAN_RETRIESBlock lock acquisition to ensure that the lock can be successfully acquired (otherwise, the data will not be written).

Now let’s look at the GET operation

willkeyValue throughhashLocate the correspondingSegment;
Pass it againhashLocate the corresponding element.

We’ve seen that the value attribute in HashEntry is volatile because of its memory visibility, which ensures that we get the latest value every time.

In a Java 8 environment, the PUT operation is also analyzed first. The PUT operation is divided into the following steps:

Calculate it according to keyhashcode;
Determine whether initialization is required.
For the currentkeyLocate the correspondingNodeIf the value is null, write dataCASTry to write, if failed, success is guaranteed by spin;
If the current position ofhashcode == MOVEIf MOVE is -1, the system needs to expand the capacity.
If not, usesynchronizedThe lock writes data.
If the quantity is greater thanTREEFY_THRESHOLDYou need to convert to a red-black tree.

The get operation consists of the following three steps:

According to the calculationhashcodeFind the corresponding address, if on the bucket directly return the value;
If it’s a red-black tree, get the value as a tree;
If not, iterate through the list to obtain the value.

We mentioned CAS earlier when analyzing ConcurrentHashMap, so let’s briefly analyze it.

CAS is an optimistic lock and a lightweight lock.

CAS is the main process thread to lock in reading data, at the time of writing data back, compare original value and present value are consistent, if the same data may (note that this is a may be modified, specific reasons behind analysis) has not been modified, the data can be assumed to write back, if inconsistent data has been modified, to perform read operations. The specific process is as follows:

When we judge the data to be consistent, we have doubts about whether the data has been modified. Because there is a classic ABA problem. The details are as follows:

One thread changed data A to B; Then another thread comes along and changes the data to A; The thread at this point finds that its value has not changed, but its data has been modified.

What is the solution to this ABA problem?

In fact as long as a sign of a change, such as the version number, every change once data, modify the version number, or add a timestamp, timestamp every change is different, so we only need to contrast data first, in the case of data, in comparing the sign bit is consistent, only two are consistent data has not changed.

Now that you’ve learned about the HashMap data structure and why it was designed this way, what other containers have evolved based on HashMap? What are the differences in their use? What scenes to use what might be a pit? Focus on Java Core & Interview column. We will continue to share more.

The last

The article is original, original is not easy, thanks for digging gold platform, feel harvest, help three lianha, thank you

Wechat search public number: Orange pine Java technology nest, make a friend, into the Internet technology exchange group

All code, sequence diagrams and architecture diagrams involved in the article are shared and can be requested free of charge through the public account plus group

If the article has the error, welcome the comment message points out, also welcome to reprint, trouble to mark the source is good