In-depth analysis of HashMap high-frequency interview and the underlying implementation architecture! | Java development

December 8, 2023

by Jared Finley

No Comments

This article is participating in “Java Theme month – Java development practice”, see detailsActive link

HashMap interview questions

1. What is the relationship between Map interface and List interface?

2. What are the common implementation classes of Map?

How do you put a HashMap?

4. In what order is the data stored in the linked list?

5. How to implement Hash(key) method?

6. Why is HashMap always a multiple of 2?

7. How to resolve Hash conflicts?

8. How is HashMap expanded?

9. How are the elements stored after expansion?

Implementation comparison of HashMap between JDK1.7 and JDK1.8

Hello, I’m Grey Ape! A super bug writing program ape!

You’ve all heard of the Map interface. It is a common way to store key-value pairs in Java, and it is also familiar with HashMap. When it comes to HashMap, I think some people who know a little bit about it should say: ** This is the form of storing key-value pairs, which is an array plus a linked list. ** But do you have any idea how storage is really done and how the underlying architecture is implemented?

Many people should say that I only need to know how to use HashMap, not the underlying implementation, but on the contrary, it is not enough to know how to use HashMap, and it is common to ask and examine the underlying implementation of HashMap in Java development interviews. So today I would like to analyze the detailed use of the Map interface and the underlying implementation of HashMap.

Friends slowly read down, read will definitely let you harvest!

1. What is the relationship between Map interface and List interface?

For this problem, if there is any relationship between these two interfaces, ** there is only one, and they are both collections. For storing data. ** The Map interface and the List interface are not related to each other.

** The List interface is inherited from the Collection interface. It is a sub-interface of the Collection interface. It is only used for single-column storage of data. ** Inheritance relationship is shown as follows:

Map interface is a top-level interface, which contains many different implementation classes. It is used to store key:value pairs. The inheritance relationship is shown as follows:

So don’t confuse the Map interface with the List interface.

2. What are the common implementation classes of Map?

We’ve seen the Map inheritance structure above, and we’ve looked at many of the different implementation classes, many of which we’re familiar with, such as HashMap, TreeMap, and HashTable. What are the common implementation classes of the Map interface and what are their functions?

**HashMap:** The underlying implementation of a HashMap is an array + a linked list + a red-black tree. By default, the initial size of the array is 16, and the expansion factor is 0.75. ** That is, whenever we reach 75% of the storage capacity in the array, we need to double the array capacity.

**HashTable: The **HashTable interface is thread-safe, but has been in use for a long time and is now almost a legacy class and not recommended for development.

**ConcurrentHashMap:** This is one of the most commonly used thread-safe Map implementation classes today. Before 1.7, the piecewise locking mechanism was used to implement thread-safe. However, thread safety is achieved after 1.8 using the synchronized keyword.

Among them, investigations and questions about HashMap are the most frequent in the interview, which should also be deeply understood and mastered in daily development. So next, I will mainly analyze the implementation principle of HashMap and common questions in the interview.

How do you put a HashMap?

We know that HaahMap uses put to store data. There are two parameters in HaahMap, namely key and value. So let’s analyze it.

The implementation of HashMap is array + linked list. In the upper layer of HashMap, “same” keys are stored in array form, and corresponding keys and values are linked and stored in linked list form in the lower layer.

** ** ** ** ** ** ** ** ** ** *

A HashMap calculates the array index of the value to be stored according to the key. If the array index is at a location where there is no element, then the stored element is stored. If there is already an element at that location, then the linked list is used. Just store the data down in the linked list order. This is the simple procedure for putting, storing the result as follows:

However, sometimes we store a lot of data, so if we always use the form of linked list for data storage, or the length of our linked list is very large, so it is very troublesome to delete or insert operations, so what should we do in this case?

** here involves a linked list of data storage, “tree” and “chain” a process, ** then what is “tree” and “chain”?

When we are in the key/value pair for storage, if we are in the same array subscript stored data too much, can cause our list through long, lead to delete and insert more troublesome, so in Java, * * when list length is more than 8, we will “tree” on the list for operation, **** converts it to a red-black tree (a binary tree where the value of the left node is less than the root node and the value of the right node is greater than the root node), ** so that when we search for elements, it is similar to binary search, so that the search efficiency will be greatly increased.

But what happens when we delete some nodes and the length of the list is no longer greater than 8? Why rush to convert red-black trees into linked lists? No, we only convert a red-black tree back to a linked list if the length of the list is less than 6, a process called “linking”.

The process is illustrated as follows:

So why tree at length 8 and chain at length less than 6? Why not just “chain” at length less than 8?

** The main reasons are as follows: ** When an element is deleted and the length of the list is less than 8, it is directly “linked”, and when another element is added and the length is equal to 8, it is also “tree-like”, so the repeated “linked” and “tree-like” operations are particularly time-consuming and troublesome. Therefore, the program provides that only when the length of the list is greater than or equal to 8, the “tree”, and the length is less than 6, the “chain”, which about 8 tree, 6 chain these two thresholds I hope you keep in mind!

4. In what order is the data stored in the linked list?

We now know how elements are stored in a HashMap, but interviewers sometimes ask us, in a HashMap, are elements stored at the head or at the tail?

This is what we need to know about the storage of linked list elements in a HashMap.

In JDK1.7 and before, it is inserted at the header, and after JDK1.8 it is inserted at the tail.

5. How to implement Hash(key) method?

Now that we know how elements in a HashMap are stored, how do we evaluate array indices based on key values?

We know that the initial capacity of HashMap is 16 bits, so for the initial 16 data bits, if the data is calculated and stored according to the key value, the simplest method is to obtain an int value according to the key value. The method is as follows:

int hashCode = key.hashCode()

Then mod the obtained hashCode with 16,

hashCode % 16 = 0~15

That’s always going to be 0 minus 15. This is also the most primitive way to compute a hash.

** But in practice, the hash(key) computed by this method is not optimal, the elements stored in the array are not the most scattered, and it is very inconvenient to perform cooperations on the computer.

** Therefore, in order to calculate the result as discretized as possible, the most commonly used method to calculate the array subscript is: ** First calculates a hashCode according to the value of the key, performs xOR operation on the high and low 18 bits of the hashCode, and then subtracts the result with the current array length. We end up with an array index as follows:

int hashCode = key.hashCode()

Int hash = hash(key) = high 16 bits of hash(key) = key.hashCode() ^ low 16 bits &(n-1) where n is the current array length

And a word of caution here.

The hash(key) calculation is slightly different in JDK1.7 and JDK1.8

In JDK1.8, the hash(key) is perturbed twice

In JDK1.7, the hash(key) is perturbed nine times, namely four bits and five xOR operations

The disturbance may be understood as the number of operations

That’s how you implement the Hash(key) method.

6. Why is HashMap always a multiple of 2?

The HashMap is always a multiple of 2 because of the hash(key) algorithm described above.

The reason is that the resulting values are discrete only if the values of (n-1) participating in the hash(key) algorithm are as many as possible ones. If our current array length is 16, the binary representation is 10000, and n-1 is 01111, so that the value of n-1 is as much 1 as possible, as is the value of any multiple of 2 that is subtracted by 1.

Therefore, only when the length of the array capacity is a multiple of 2, the hash(key) value calculated can be relatively discrete.

7. How to resolve Hash conflicts?

What is a Hash conflict? That is, when I calculate an array index, the index has already stored elements, which is called Hash collision. Obviously, if we do not have a good algorithm to calculate the array index, it is easy to accumulate stored data on the same index, resulting in too many Hash collisions.

So how do you resolve hash conflicts?

In fact, the best solution is to make the array subscripts calculated by the stored key as discrete as possible, that is, to optimize the hash(key) as much as possible, and the array length is a multiple of 2. This is the main resolution of Hash conflicts.

See the underlying source code for key sections of HashMap below:

Underlying implementation of Hash(key)

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
Copy the code

The low-level implementation of the PUT (key,value) method

/** * Associates the specified value with the specified key in this map. * If the map previously contained a mapping for  the key, the old * value is replaced. * * @param key key with which the specified value is to be associated * @param value value to be associated with the specified key * @return the previous value associated with <tt>key</tt>, or * <tt>null</tt> if there was no mapping for <tt>key</tt>. * (A <tt>null</tt> return can also indicate that the map * previously associated <tt>null</tt> with <tt>key</tt>.) */ public V put(K key, V value) { if (key == null) return putForNullKey(value); int hash = hash(key.hashCode()); int i = indexFor(hash, table.length); for (Entry<K,V> e = table[i]; e ! = null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } modCount++; addEntry(hash, key, value, i); return null; }Copy the code

8. How is HashMap expanded?

We mentioned above that the initial size of a HashMap array is 16 bits, but obviously 16 bits is not enough. How to expand a HashMap?

There is a parameter called “expansion factor”, which in the HashMap has a size of 0.75,

** We also mentioned above that for an array with an initial length of 16, when the data stored in it is 16*0.75=12. The array element is expanded by twice the size of the original array. ** is currently 15 bytes and 32 bits.

9. How are the elements stored after expansion?

We know that the size of the HashMap array increases after the expansion, so the newly expanded portion will be empty. But should we leave the data bits blank at this point? This is obviously not possible, as it would be a huge waste of memory.

Therefore, after the array expansion of the HashMap, the data elements stored in the original HashMap array will be re-allocated and stored in the new array. To make full use of array space.

Implementation comparison of HashMap between JDK1.7 and JDK1.8

The implementation of HashMap is slightly different in JDK1.7 and JDK1.8. Finally, we compare the implementation of HashMap between JDK1.7 and JDK1.8.

(1) Different underlying data structures

In the put process of HashMap, there is no concept of red-black tree in JDK1.7, which is directly used for linked list storage. The concept of red-black tree was introduced after JDK1.8 to optimize storage and lookup.

(2) Different insertion methods of linked lists

When a HashMap inserts an element into a linked list, it is inserted at the head node in JDK1.7 and at the tail node after JDK1.8.

(3) Different Hash(key) calculation methods

In the Hash(key) calculation, JDK1.7 performs nine scramblings, namely four bits and five xor operations, and only two scramblings after JDK1.8.

(4) After capacity expansion, the calculation method of data storage location is different

In JDK1.7, the location of all data is scrambled, and the hash(key) is used to rearrange the data. In JDK1.8, the for loop is used twice for the original data index. The new subscript location can only be computed at the original subscript location or at the original subscript location plus the original capacity location.

Well, about the Map interface and the underlying implementation of HashMap process, as well as in the interview reference to the core questions and we analyze here!

If you have any questions, please leave them in the comments section! Feel good remember to like attention yo!

I’m grey Ape! See you next time!