Abstract: A detailed look at how the Map interface is used and how HashMap is implemented under the hood.
This article is shared from the Huawei Cloud community post "[Illustrated] In-depth analysis of high-frequency HashMap interview questions and the underlying implementation structure!", original author: Grey Xiaoape.
You have all heard of the Map interface: it is the standard way to store key-value pairs in Java, and HashMap is its most familiar implementation. When HashMap comes up, anyone who knows a little about it will say that it stores key-value pairs using an array plus linked lists. But do you know how the data is actually stored, and how the underlying structure is implemented?
Many people will say that they only need to know how to use HashMap, not how it is implemented. On the contrary, knowing how to use it is not enough: the underlying implementation of HashMap is one of the most commonly examined topics in Java development interviews. So today I want to analyze the Map interface in detail and walk through the underlying implementation of HashMap.
Read on; it will be worth your while.
1. What is the relationship between the Map interface and the List interface?
If there is any relationship between these two interfaces at all, it is only that both are collections used to store data. Beyond that, the Map interface and the List interface are not closely related. Why?
First, let's take a look at the List interface. As mentioned before, List extends the Collection interface, i.e. it is a sub-interface of Collection, and it is used only for single-column storage of data. **Its inheritance hierarchy is shown below:**
The Map interface, by contrast, is a top-level interface with many different implementation classes. It is used to store key-value pairs. Its inheritance hierarchy is shown below:
So don’t confuse the Map interface with the List interface.
2. What are the common implementation classes of Map?
We saw the Map inheritance structure above, with its many implementation classes, several of which are familiar: HashMap, TreeMap, and Hashtable. So what are the common implementation classes of the Map interface, and what is each one for?
HashMap: the underlying structure is an array plus linked lists plus red-black trees. By default the initial array size is 16 and the load factor is 0.75; that is, once the number of stored entries reaches 75% of the current capacity, the array is doubled in size.
Hashtable: Hashtable is thread-safe, but it dates back a long way and is now essentially a legacy class; it is not recommended for new development.
ConcurrentHashMap: this is the most widely used thread-safe Map implementation today. Up to JDK 1.7 it achieved thread safety with a segmented locking mechanism; from JDK 1.8 onward it relies on CAS operations together with the synchronized keyword, locking individual bucket heads rather than whole segments. (A short usage sketch follows this list.)
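For reference, here is a minimal sketch of how these implementations are typically instantiated (the class name MapChoices and the sample keys are invented for this example; the Map implementations themselves are the standard JDK ones):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapChoices {
    public static void main(String[] args) {
        // Plain HashMap: fast, but not safe for concurrent modification.
        Map<String, Integer> plain = new HashMap<>();

        // Hashtable: thread-safe via synchronized methods, but effectively a legacy class.
        Map<String, Integer> legacy = new Hashtable<>();

        // ConcurrentHashMap: the usual choice when a thread-safe map is needed.
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();

        plain.put("a", 1);
        legacy.put("b", 2);
        concurrent.put("c", 3);
        System.out.println(plain + " " + legacy + " " + concurrent);
    }
}
```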
Of these, HashMap is the one interviewers ask about most often, and it is also the one you should understand most deeply for daily development. So next I will focus on how HashMap is implemented and on the questions that commonly come up in interviews.
3. How does put work in a HashMap?
We know that HashMap stores data through the put method, which takes two parameters: the key and the value. So let's analyze how that works.
Note: the same key does not necessarily come with the same value, but entries with the same key do share certain characteristics (such as the bucket they land in), which is what we will see as we continue.
A HashMap computes an array index from the key of the entry being stored. If there is no element at that index yet, the entry is placed there directly. If an element is already present, a linked list is used: the new entry is simply chained onto the list at that index. This is the simple put procedure; the resulting storage layout is shown below.
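As a rough illustration in code, here is a heavily simplified, hypothetical sketch of the "compute an index, then chain" idea. It is not the real HashMap source: the class and field names are invented, and resizing, null keys, and treeification are all left out.

```java
// A heavily simplified illustration of "compute an index, then chain".
// NOT the real java.util.HashMap source.
class SimpleChainedMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // default capacity 16

    public V put(K key, V value) {
        int index = (key.hashCode() & 0x7fffffff) % table.length; // naive index from the key
        for (Node<K, V> e = table[index]; e != null; e = e.next) {
            if (e.key.equals(key)) {           // same key already present: replace the value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        Node<K, V> node = new Node<>(key, value); // bucket empty or occupied by other keys:
        node.next = table[index];                 // chain the new node in (head insertion here,
        table[index] = node;                      // JDK 1.7 style; JDK 1.8 appends at the tail)
        return null;
    }
}
```

The real implementation computes the index with a perturbed hash and a bitmask rather than a plain %, as discussed further below.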
However, we sometimes store a great deal of data. If we always keep it in linked lists, the lists can grow very long, which makes insertion and deletion expensive. What should be done in that case?
This is where the operations of "treeifying" and "untreeifying" (converting back to a chain) come in. So what do "treeify" and "untreeify" mean?
When we store key-value pairs and too many of them land on the same array index, the linked list at that index grows long, and deletion and insertion become troublesome. So Java specifies that when the list length reaches 8, the list is "treeified": it is converted into a red-black tree (a self-balancing binary search tree in which a node's left subtree holds smaller values and its right subtree holds larger ones), which lets us locate elements with a binary search and greatly improves lookup efficiency.
But what happens when we delete some nodes and the length is no longer 8 or more? Do we rush to convert the red-black tree back into a linked list? No: only when the number of nodes drops to 6 or fewer is the tree converted back into a linked list, a process called "untreeification".
The process is illustrated as follows:
So why treeify at length 8 and untreeify at length 6? Why not simply untreeify as soon as the length drops below 8?
**The main reason is as follows:** if an element were deleted and the list untreeified as soon as its length fell below 8, then adding one element back would immediately treeify it again. Repeatedly flipping between "untreeify" and "treeify" in this way is particularly time-consuming and troublesome. The implementation therefore treeifies only once the chain length reaches 8 and untreeifies only once it drops to 6, leaving a gap between the two thresholds. These two numbers, 8 for treeification and 6 for untreeification, are worth committing to memory.
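In the JDK 1.8+ source these two thresholds are defined as constants in java.util.HashMap; note that treeification additionally requires the table itself to hold at least 64 buckets, otherwise the map resizes instead:

```java
// Threshold constants from java.util.HashMap (JDK 1.8+)
static final int TREEIFY_THRESHOLD = 8;     // a chain of this length is converted to a red-black tree
static final int UNTREEIFY_THRESHOLD = 6;   // a tree shrunk to this size is converted back to a chain
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size before any chain is treeified
```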
4. In what order is the data stored in the linked list?
We now know how elements are stored in a HashMap, but interviewers sometimes ask: when a new element is added to a bucket's linked list, is it inserted at the head or at the tail?
The answer depends on the JDK version: in JDK 1.7 and earlier, new elements are inserted at the head of the list; from JDK 1.8 onward, they are appended at the tail.
5. How is the hash(key) method implemented?
Now that we know how elements in a HashMap are stored, how is the array index actually computed from the key?
We know that the initial capacity of a HashMap is 16 slots. To decide where among those 16 slots a key-value pair should be stored, the simplest approach is to first obtain an int value from the key:
int hashCode = key.hashCode();
and then take that hashCode modulo 16:
hashCode % 16
which always yields a value between 0 and 15. This is the most primitive way to compute the index.
In practice, however, the hash(key) computed this way is not ideal: the elements do not end up spread across the array as evenly as possible, and the modulo operation itself is relatively expensive for the computer.
Therefore, to make the result as well dispersed as possible, the usual way to compute the array index is: first compute a hashCode from the key, XOR the high 16 bits of that hashCode into its low 16 bits, and then bitwise AND the result with (array length - 1). The array index is obtained as follows:
```java
int hashCode = key.hashCode();
int hash = hashCode ^ (hashCode >>> 16); // XOR the high 16 bits into the low 16 bits
int index = hash & (n - 1);              // n is the current array length
```
A few points worth noting here:
The hash(key) calculation differs slightly between JDK 1.7 and JDK 1.8.
In JDK 1.8, hash(key) is perturbed twice.
In JDK 1.7, hash(key) is perturbed nine times: four shift operations and five XOR operations.
"Perturbation" here can simply be understood as the number of mixing operations applied to the hash.
That is how the hash(key) method is implemented.
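For reference, the JDK 1.8 source expresses this "two-perturbation" version (one unsigned right shift plus one XOR) very compactly; note that a null key always maps to hash 0 and therefore to index 0:

```java
// java.util.HashMap, JDK 1.8: mix the high 16 bits of hashCode into the low 16 bits.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```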
6. Why is the capacity of a HashMap always a power of 2?
The reason the HashMap capacity is always a power of 2 is tied to the hash(key) algorithm described above.
The computed indices are well dispersed only if the (n - 1) value used in the index calculation has as many 1 bits as possible. If the current array length is 16, its binary representation is 10000 and n - 1 is 01111, which is all 1s in its low bits; the same holds for any power of 2 minus 1.
Therefore, only when the array size is a power of 2 can the computed indices be spread out reasonably evenly.
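A small, self-contained demonstration of why the (n - 1) mask only behaves like a modulo when the capacity is a power of two (the hash value and the non-power-of-two capacity below are arbitrary examples):

```java
public class MaskDemo {
    public static void main(String[] args) {
        int hash = 0b11_0110_1011;           // an arbitrary example hash (875)

        int n = 16;                          // power of two: n - 1 = 0b1111
        System.out.println(hash & (n - 1));  // 11, the same as hash % 16 — all low bits are used

        int m = 12;                          // not a power of two: m - 1 = 0b1011
        System.out.println(hash & (m - 1));  // the zero bit in the mask means buckets
                                             // 4, 5, 6 and 7 can never be selected
    }
}
```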
7. How are hash conflicts resolved?
What is a hash conflict? When the computed array index already holds an element, that is called a hash collision. Obviously, if the algorithm for computing the index is poor, data easily piles up at the same index and collisions become frequent.
So how do you resolve hash conflicts?
In fact, the best mitigation is to make the indices computed from the stored keys as well dispersed as possible: optimize hash(key) as far as possible and keep the array length a power of 2. When collisions still occur, HashMap falls back on chaining (and treeification), as described above. That is the essence of how hash conflicts are handled.
See the underlying source code for key sections of HashMap below:
The underlying implementation of hash(key) (as it appears in pre-JDK 1.8 versions of the class):
```java
/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions. This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```
The underlying implementation of the put(key, value) method:
```java
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
```
8. How is HashMap expanded?
We mentioned above that the initial size of the HashMap array is 16 slots, but 16 is obviously not always enough. So how does a HashMap expand?
This is governed by a parameter called the load factor, which in HashMap defaults to 0.75.
As mentioned above, for an array with an initial length of 16, once the number of stored entries reaches 16 * 0.75 = 12, the array is expanded to twice its original size, i.e. from 16 to 32 slots.
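A quick sanity check of those numbers (the class name ResizeMath is just for this example):

```java
public class ResizeMath {
    public static void main(String[] args) {
        int capacity = 16;          // default initial capacity
        float loadFactor = 0.75f;   // default load factor

        int threshold = (int) (capacity * loadFactor);
        System.out.println("resize threshold = " + threshold);            // 12
        System.out.println("capacity after resize = " + (capacity << 1)); // 32 (doubled)
    }
}
```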
9. How are the elements stored after expansion?
We know that the array grows when a HashMap is resized, so the newly added slots start out empty. Should the existing entries simply stay where they are and leave the new half unused? Clearly not: that would waste the newly allocated capacity and leave the chains just as long as before, so the existing entries have to be redistributed across the whole new table.
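In JDK 1.8 this redistribution follows a simple rule (it comes up again in the version comparison at the end of this article): after the table doubles, each entry either stays at its old index or moves to old index + old capacity, depending on a single bit of its hash. Here is a minimal sketch of that decision, not the actual resize() source:

```java
// Sketch of the JDK 1.8 redistribution rule after the table doubles in size.
// oldCap is the table length before the resize (always a power of two).
static int newIndex(int hash, int oldIndex, int oldCap) {
    // The hash bit that corresponds to oldCap decides the destination bucket:
    //   0 -> the entry stays at its original index
    //   1 -> the entry moves to original index + oldCap
    return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
}
```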
Implementation comparison of HashMap between JDK1.7 and JDK1.8
The implementation of HashMap differs slightly between JDK 1.7 and JDK 1.8. To wrap up, let's compare the two.
(1) Different underlying data structures
In the put process, JDK 1.7 has no concept of a red-black tree and stores colliding entries directly in linked lists; the red-black tree was introduced in JDK 1.8 to optimize storage and lookup.
(2) Different insertion methods of linked lists
When a HashMap inserts an element into a bucket's linked list, JDK 1.7 inserts it at the head node, while JDK 1.8 and later append it at the tail.
(3) Different Hash(key) calculation methods
In the hash(key) calculation, JDK 1.7 performs nine perturbations, namely four shift operations and five XOR operations, while JDK 1.8 performs only two.
(4) After expansion, the new storage location of each entry is computed differently
In JDK 1.7, every entry is rehashed with hash(key) and placed anew in the expanded table, so all positions are effectively reshuffled. In JDK 1.8, the old table is walked bucket by bucket and each chain is split in place: an entry's new index can only be either its original index or its original index plus the old capacity (as sketched in section 9 above).
That concludes our analysis of the Map interface, the underlying implementation of HashMap, and the core questions that come up in interviews!