The previous article, “JDK source analysis - HashMap (1)”, analyzed the internal structure of HashMap and the implementation of its main methods. However, there are many other questions that come up frequently in interviews, and this article briefly analyzes some of the most common ones.
Here is the internal structure of HashMap (JDK 1.8):
FAQ:
Q1: Is HashMap thread-safe? Why or why not?
First, HashMap is not thread-safe. Many of you know this, and the HashMap source code says so explicitly. But why is it unsafe? Where does this show up? The following two cases give a brief analysis (they may not be comprehensive; they are for reference only).
Case 1:
Thread T1 performs a structural modification such as put/remove, while thread T2 performs a traversal; this case throws ConcurrentModificationException.
Example code (using put as the example):
private static void test() {
    Map<Integer, Integer> map = new HashMap<>();
    // T1 structurally modifies the map with put...
    Thread t1 = new Thread(() -> {
        for (int i = 0; i < 5000; i++) {
            map.put(i, i);
        }
    });
    // ...while T2 traverses it at the same time.
    Thread t2 = new Thread(() -> {
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            System.out.println(entry);
        }
    });
    t1.start();
    t2.start();
}
// Execution result:
// throws java.util.ConcurrentModificationException
Here’s why:
if (modCount != expectedModCount)
    throw new ConcurrentModificationException();
HashMap’s iterators (and its collection views) compare these two values to determine whether another thread is structurally modifying the HashMap during iteration, and throw the exception if so.
PS: If you read the HashMap source code carefully, you will find the following line in each structural modification method:
++modCount;
This field records the number of structural modifications.
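Note that this check is not limited to multi-threaded use. As a minimal sketch (the class name FailFastDemo is made up for illustration), structurally modifying the map in the middle of a for-each loop in a single thread trips the same check:

import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 1);
        map.put(2, 2);
        for (Integer key : map.keySet()) {
            // remove() increments modCount, so the iterator's next
            // access sees modCount != expectedModCount and throws
            // ConcurrentModificationException.
            map.remove(key);
        }
    }
}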
Case 2:
Threads T1 and T2 both perform structural modifications such as put/remove. Taking put as the example, elements get overwritten and updates are lost.
Sample code:
private static void test() throws InterruptedException {
    Map<Integer, Integer> map = new HashMap<>();
    // Two writers put disjoint key ranges concurrently.
    Thread t1 = new Thread(() -> {
        for (int i = 0; i < 5000; i++) {
            map.put(i, i);
        }
    });
    Thread t2 = new Thread(() -> {
        for (int i = 5000; i < 10000; i++) {
            map.put(i, i);
        }
    });
    t1.start();
    t2.start();
    // Crude wait for both writers to finish.
    TimeUnit.SECONDS.sleep(20);
    System.out.println(map);
    System.out.println("size: " + map.size());
}
// Output result:
// {8192=8192, 8193=8193, 8194=8194, 8195=8195,...
// size: 9666
// PS: This is the result of a single run. Results differ from run to run, but size essentially never reaches 10000.
The problem lies in the put method. The analysis of HashMap’s internal put implementation in the previous article explains why, so I won’t repeat it here.
HashMap is designed to be efficient in a single thread. The point of understanding its thread-unsafety is to know when not to use it; if thread safety is required, use ConcurrentHashMap instead, as sketched below.
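As a minimal sketch of the thread-safe alternative (the class name ConcurrentPutDemo and the use of join() instead of sleeping are my own choices for illustration), the same two-writer scenario with ConcurrentHashMap reliably ends with all 10000 entries:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 5000; i++) {
                map.put(i, i);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 5000; i < 10000; i++) {
                map.put(i, i);
            }
        });
        t1.start();
        t2.start();
        t1.join(); // wait for both writers instead of sleeping
        t2.join();
        System.out.println("size: " + map.size()); // always 10000
    }
}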
Q2: Why are the conversion thresholds for linked lists and red-black trees 8 and 6?
Let’s first look at why linked lists and red-black trees exist at all. Ideally, each bin in the HashMap holds only one node, so that query efficiency is at its best, O(1). Chaining colliding nodes into a linked list, and then converting that list into a red-black tree, is a response to heavy hash collisions; it keeps HashMap efficient even in extreme cases.
As for why 8, the Implementation Notes in the HashMap source read as follows:
/* Because TreeNodes are about twice the size of regular nodes, we
 * use them only when bins contain enough nodes to warrant use
 * (see TREEIFY_THRESHOLD). And when they become too small (due to
 * removal or resizing) they are converted back to plain bins. In
 * usages with well-distributed user hashCodes, tree bins are
 * rarely used. Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 */
Since a TreeNode is about twice the size of a regular Node, a bin is converted to a tree only when it contains enough nodes (TREEIFY_THRESHOLD). When the number of nodes in a bin drops again (through removal or resizing), the red-black tree is converted back to a linked list.
When hashCodes are well distributed, TreeNodes are rarely used. Ideally, under randomly distributed hashCodes, the number of nodes in a bin follows a Poisson distribution; the first few values are listed above. As you can see, the probability of a bin’s list reaching length 8 (0.00000006) is less than 1 in ten million, so the treeify threshold is set to 8.
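The figures above can be reproduced directly from the formula exp(-0.5) * pow(0.5, k) / factorial(k). A minimal sketch (the class name PoissonBins is made up):

public class PoissonBins {
    public static void main(String[] args) {
        // P(a bin holds exactly k nodes) under Poisson(0.5):
        // exp(-0.5) * 0.5^k / k!
        double factorial = 1.0;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) {
                factorial *= k;
            }
            double p = Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
            System.out.printf("%d: %.8f%n", k, p); // matches the table above
        }
    }
}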
The two conversion thresholds and their descriptions are as follows:
/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;
/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;
The threshold for converting a red-black tree back to a linked list is 6 rather than 8 in order to avoid frequent conversions. If both thresholds were 8, a HashMap whose insertions and deletions keep a bin’s node count hovering around 8 would convert back and forth between tree and list constantly, which is inefficient.
This explanation seems reasonable; feel free to explore it further.
Q3: Why is the load factor 0.75?
JDK 1.7:
/* As a general rule, the default load factor (.75) offers a good tradeoff
 * between time and space costs. Higher values decrease the space overhead
 * but increase the lookup cost (reflected in most of the operations of the
 * <tt>HashMap</tt> class, including <tt>get</tt> and <tt>put</tt>).
 */
The following article also analyzes this question:
Why is the loadFactor of HashMap 0.75?
https://www.jianshu.com/p/64f6de3ffcc1
There may be no need to study this problem too deeply. Simply put, it is a trade-off between time and space: for example, with the default initial capacity of 16, the map resizes once it holds more than 16 × 0.75 = 12 entries.
Q4: Why is the capacity a power of two?
When the capacity length is a power of two, the bit operation hash & (length - 1) yields the same bucket index as the mod operation hash % length (for non-negative hashes), and bit operations are much cheaper for a computer than division. So this is done for efficiency, as the sketch below verifies.
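A minimal sketch verifying the equivalence (the class name IndexDemo and the sample hash values are arbitrary):

public class IndexDemo {
    public static void main(String[] args) {
        int length = 16; // must be a power of two
        for (int hash : new int[]{1, 17, 33, 12345}) {
            // For a power-of-two length, length - 1 is an all-ones bit
            // mask (here 0b1111), so the AND keeps exactly the low bits,
            // which is the remainder of division by length.
            System.out.println((hash & (length - 1)) + " == " + (hash % length));
        }
    }
}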
Q5: What types of elements are commonly used as keys? Why is that?
Strings, integers, and similar types. They are immutable, so their hash codes cannot change while they sit in the map, and as immutable classes they are inherently thread-safe. They also properly override the hashCode and equals methods. A sketch of a well-behaved custom key follows.
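To illustrate what a well-behaved key looks like, here is a minimal sketch of a hypothetical immutable key class (Point is made up for illustration) that overrides hashCode and equals consistently:

import java.util.Objects;

// A hypothetical immutable key: final class, final fields, no setters,
// so its hash code can never change while it is stored in a HashMap.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Equal objects must produce equal hash codes, otherwise a
        // get after a put would look in the wrong bucket.
        return Objects.hash(x, y);
    }
}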
Q6: What makes a hash algorithm good? How does the String class implement hashCode?
Hash functions can be designed in many ways. The more uniform the hash, the better, but the primary goal of HashMap is speed, so the hash algorithm should also be as fast as possible. The hashCode method of the String class looks like this:
public int hashCode() {
    int h = hash; // cached value; 0 means "not computed yet"
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h; // cache the result for later calls
    }
    return h;
}
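As a quick worked example of the h = 31 * h + val[i] polynomial hash (the class name StringHashDemo is made up), recomputing a String’s hash by hand gives the same value as hashCode():

public class StringHashDemo {
    public static void main(String[] args) {
        String s = "abc";
        // "abc" hashes to 'a' * 31^2 + 'b' * 31 + 'c' = 96354
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        System.out.println(h + " == " + s.hashCode()); // 96354 == 96354
    }
}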
PS: The questions above are the result of my own searching, sorting, and thinking. They are for reference only and not necessarily completely accurate (stay skeptical). There are many more HashMap questions that I won’t list here.
Reference links:
https://www.jianshu.com/p/7af5bb1b57e2
Related reading:
JDK source code -HashMap(1)
Stay hungry, stay foolish.