1. Overview of HashMap:

HashMap is a hash-table-based implementation of the Map interface (it is similar to Hashtable, but Hashtable is thread-safe, i.e. synchronized, while HashMap is not). This implementation provides all of the optional map operations and permits null values and a single null key, but it makes no guarantee about iteration order.
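As a minimal sketch of these properties (class and variable names are my own, for illustration only), the following shows that one null key and any number of null values are accepted:

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapBasics {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value-for-null-key");      // a single null key is allowed
        map.put("a", null);                       // null values are allowed
        map.put("b", "B");

        System.out.println(map.get(null));        // value-for-null-key
        System.out.println(map.containsKey("a")); // true, even though its value is null
        System.out.println(map.size());           // 3
    }
}
```

Note that because HashMap is not synchronized, concurrent access must be wrapped (e.g. `Collections.synchronizedMap`) or replaced with ConcurrentHashMap.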

2. HashMap data structure and implementation principle:

The data structure of HashMap differs between JDK 1.7 and JDK 1.8; JDK 1.8 introduces some optimizations to resolve hash conflicts.

(1) JDK1.7 version

JDK 1.7 uses an array + linked list structure, i.e. separate chaining (the "zipper" method).

Values whose hashes conflict are added to the linked list at the corresponding array slot, using head insertion (the biggest drawback of this method is that, under concurrent resizing, it can link the list into a loop and cause an infinite traversal; we will explain this in detail later). When computing the hash value, JDK 1.7 applies nine perturbations (four bit operations plus five XOR operations) — since I am currently using JDK 1.8, I cannot show that source code here, please understand. The bucket index is then computed as hashCode -> perturbation function -> (h & (length - 1)). Resizing also differs: in JDK 1.8, an entry's position after expansion follows a simple rule (new position = original position, or original position + old capacity). Let's take a closer look at JDK 1.8.
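The JDK 1.8 relocation rule just mentioned can be verified with plain bit arithmetic: because the capacity is always a power of two, the single bit tested by `(hash & oldCap)` decides whether an entry stays at its old index or moves up by the old capacity. A small sketch (values chosen for illustration):

```java
public class ResizeRule {
    public static void main(String[] args) {
        int oldCap = 16;                 // old table length (always a power of two)
        int newCap = oldCap << 1;        // doubled to 32 on resize

        int[] hashes = {5, 21, 37, 53};  // all land in old bucket 5
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            // JDK 1.8 shortcut: test one bit instead of rehashing
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            // Sanity check: identical to recomputing against the new mask
            assert newIndex == (h & (newCap - 1));
            System.out.println(h + ": " + oldIndex + " -> " + newIndex);
        }
    }
}
```

For example, hashes 5 and 37 keep index 5, while 21 and 53 move to index 5 + 16 = 21.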

(2) JDK1.8 version

JDK 1.8 uses array + linked list + red-black tree, as shown below:

This structure greatly mitigates the hash conflict problem and reduces search time: when the number of nodes in a bucket reaches the treeify threshold (8), the linked list is converted into a red-black tree, and when a tree shrinks to 6 nodes or fewer it is converted back into a linked list (if you are not familiar with red-black trees, you can refer to my previous article). In addition, values are now inserted with tail insertion, which eliminates the loop problem.

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

The code above performs the put operation: it locates the bucket, checks the node, and adds the entry. It relies on several core methods; let's analyze them one by one:

resize():

As the name implies, this method handles expansion. In JDK 1.8 it is called in two situations: the first call initializes the array, and later calls expand the array when it becomes full — one near the beginning of putVal and one near the end. Let's look at what it does internally:

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        // Check whether the old array is empty
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double the threshold along with the capacity
        }
        else if (oldThr > 0) // the initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

As you can see, the expansion method makes detailed checks for array initialization and emptiness. Many readers at this point ask why the default load factor is 0.75; that question is addressed briefly in the typical interview questions below. Let's return to the put method above.

hash(key)

This method is also a focus. It exists to ensure that every position (index) in the array is used as evenly as possible, which greatly reduces hash conflicts. JDK 1.7 used nine perturbations to compute the hash value, whereas JDK 1.8 uses only two (one shift and one XOR).

  static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }


As the code shows, it XORs the high 16 bits of key.hashCode() with the low 16 bits, reducing the hash collision problem. Having analyzed the two core methods, let's describe the overall flow of the put method, in which we can clearly see how values are inserted and how the two core methods are used. (Here I quote a relatively complete flow chart that I find well drawn.)
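The effect of this spreading step can be seen with two hash codes that differ only in their high 16 bits: without the XOR they would collide in a 16-slot table, with it they land in different buckets. A small sketch (the `hash` helper mirrors the method shown above; the sample values are my own):

```java
public class HashSpread {
    // Same spreading step as java.util.HashMap.hash(Object)
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // default table length; bucket index = (n - 1) & hash
        // Two hash codes whose low 16 bits are identical (all zero):
        int h1 = 0x10000, h2 = 0x20000;

        // Without perturbation, both land in bucket 0
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2)); // 0 0

        // With the high/low XOR, the high bits reach the index
        System.out.println(((n - 1) & (h1 ^ (h1 >>> 16))) + " "
                         + ((n - 1) & (h2 ^ (h2 >>> 16))));       // 1 2
    }
}
```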

①. Check whether the key-value pair array table is empty or null; if so, perform resize() to initialize it;

②. Compute the array index i from the key's hash value; if table[i] == null, insert the new node directly and go to ⑥; otherwise go to ③;

③. Check whether the first element of table[i] has the same key as the key being inserted; if so, overwrite its value directly; otherwise go to ④;

④. Check whether table[i] is a TreeNode, i.e. whether table[i] is a red-black tree; if so, insert the key-value pair directly into the tree; otherwise go to ⑤;

⑤. Traverse the list at table[i]; if the list length becomes greater than 8, convert the list into a red-black tree and perform the insertion there; otherwise insert at the end of the list; if the key is found to already exist during the traversal, simply overwrite its value;

⑥. After a successful insertion, check whether the actual number of key-value pairs (size) exceeds the threshold; if so, expand the capacity. Now that our explanation is over, here are some interview questions encountered during the interview process.
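The steps above can also be observed from the caller's side: put returns the previous value (or null), which is exactly the "overwrite" branch of the flow. A short sketch (names are my own):

```java
import java.util.HashMap;
import java.util.Map;

public class PutFlowDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>(); // table is allocated lazily on first put (step ①)
        System.out.println(map.put("k", 1)); // null -- no previous mapping existed
        System.out.println(map.put("k", 2)); // 1    -- key existed, value overwritten
        System.out.println(map.get("k"));    // 2
    }
}
```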

Typical interview questions

1. Why does XORing the high and low 16 bits of hashCode reduce hash collisions? Can the hash function use the key's hashCode directly?

key.hashCode() calls the key's hash function and returns a hash of type int. The int range, -2147483648 to 2147483647, covers roughly 4 billion values, while our default array has only 16 slots, so the array cannot hold hash values used directly as indices. The value must be reduced to the array range with a modulus-like operation, and mixing in the high bits first keeps the distribution even and effectively avoids hash collisions.
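The reduction step HashMap actually uses is (n - 1) & hash rather than %. For a power-of-two length the two agree on non-negative hashes, but the mask is also safe for negative hash codes, where Java's % would produce a negative (unusable) index. A quick sketch:

```java
public class IndexMask {
    public static void main(String[] args) {
        int n = 16; // table length, always a power of two in HashMap

        // For non-negative hashes, (n - 1) & h equals h % n
        for (int h : new int[]{0, 7, 16, 12345}) {
            assert (h & (n - 1)) == (h % n);
        }

        // Unlike %, the mask also yields a valid index for negative hash codes
        int h = -123;
        System.out.println(h % n);       // -11 (unusable as an array index)
        System.out.println(h & (n - 1)); // 5
    }
}
```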

2. Why is the default HashMap load factor 0.75 chosen?

It is mainly a compromise between improving space utilization and reducing query cost, backed by the Poisson distribution: around 0.75, collisions are minimized. Two factors affect performance in HashMap: the initial capacity and the load factor. If the load factor is too small, space utilization is low and the number of rehashes increases; to minimize rehashing while keeping lookups fast, 0.75 is the standard compromise.
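The JDK 8 HashMap source comments note that with the default 0.75 load factor, bin sizes roughly follow a Poisson distribution with parameter about 0.5, which makes an 8-entry bin (the treeify threshold) vanishingly rare. A rough sketch of that calculation, under those assumptions:

```java
public class PoissonBins {
    public static void main(String[] args) {
        double lambda = 0.5; // average bin fill cited in the HashMap source comments
        double p = Math.exp(-lambda); // P(bin size = 0)
        for (int k = 0; k <= 8; k++) {
            System.out.printf("P(bin size = %d) = %.8f%n", k, p);
            p *= lambda / (k + 1); // Poisson recurrence: p_{k+1} = p_k * lambda / (k + 1)
        }
        // At k = 8 the probability is below one in ten million,
        // so treeification is a rare worst-case defense, not the common path.
    }
}
```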

3. What are the drawbacks of the way JDK 1.7 inserts data?

I mentioned the ring formation above. Why does it happen? Because during concurrent use, if one thread is inserting a node while another thread triggers a resize, head insertion reverses the order of the list being transferred; the two threads' interleaved pointer updates can then link the list into a ring, causing an infinite loop, as shown in the figure below:

Finally

Thank you for reading this far. If anything is still unclear, you can ask me in the comments section. If you found the article helpful, remember to give it a thumbs up. We share Java-related technical articles and industry news every day; you are welcome to follow and share!