Preface
HashMap is one of the most popular questions to ask a Java backend engineer, because it is a great way to test a candidate's Java fundamentals.
HashMap is also one of the longest-serving collections in Java. Its JDK source code is elegant and full of clever design ideas, which makes it a favorite interview topic.
The Opening
Interviewer: Why don’t you introduce yourself first?
Angela: I am Angela, one of the "bush trio," the strongest mid laner (Zhong Kui begs to differ)! Oh no, wrong script. I am **, currently working at — company on — system development.
Interviewer: According to your resume, you are familiar with Java collections and have used HashMap, right?
Angela: Yes. (That familiar flavor.)
Interviewer: Can you tell me about the internal data structure of HashMap?
Angela: I'm currently using JDK 1.8. Internally it uses an array plus linked lists and red-black trees.
Angela: If it's convenient, let me draw a diagram of the data structure for you:
Interviewer: Are you familiar with the data insertion principle of HashMap?
Angela: Well [thinking]. I think a picture will be clearer, as follows:
- Check whether the array is empty, and initialize it if so;
- If it is not empty, compute the hash of the key and use (n - 1) & hash to compute the array index where it should be stored;
- If table[index] holds no data, create a Node and store it at table[index];
- If there is already data there, a hash conflict has occurred (the two keys hash to the same bucket). Check whether the keys are equal; if so, replace the original value with the new value (when onlyIfAbsent is false);
- If the keys are not equal, check whether the current node is a tree node. If it is, create a tree node and insert it into the red-black tree;
- If it is not a tree node, create an ordinary Node and append it to the list, then check whether the list length is greater than 8; if it is, convert the list to a red-black tree;
- After the insertion is complete, check whether the current number of entries exceeds the threshold; if it does, expand the array to twice its original size. (A small collision demo follows this list.)
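To make the collision-handling steps concrete, here is a minimal runnable demo with a deliberately bad hashCode (the Key class is invented purely for illustration): an equal key replaces the value, while an unequal key with the same hash is chained into the same bucket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CollisionDemo {
    // Hypothetical key type that forces every instance into one bucket
    static class Key {
        final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return 42; } // deliberate collisions
        @Override public boolean equals(Object o) {
            return o instanceof Key && Objects.equals(name, ((Key) o).name);
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key("a"), "first");
        map.put(new Key("a"), "second"); // equal key: value replaced
        map.put(new Key("b"), "third");  // same hash, different key: chained
        System.out.println(map.size());            // 2
        System.out.println(map.get(new Key("a"))); // second
    }
}
```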
Interviewer: Just now you mentioned the initialization of a HashMap. How do you set the initial size of a HashMap?
Angela: [Is that even a question??] In general, if you call new HashMap() without arguments, the default capacity is 16 and the load factor is 0.75. If you pass an initial size k, the capacity becomes the smallest power of two greater than or equal to k. For example, if you pass 10, the capacity is 16. (Supplementary note: the implementation is as follows.)
```java
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
Supplementary note: the figure below shows the detailed process. The algorithm shifts the value right by 1, 2, 4, 8, and 16 bits in turn, ORing each result back in, so that every bit below the highest set bit becomes 1; adding 1 then yields a power of two. For example, for 50: n = 49 = 110001 in binary, which the shifts turn into 111111, and 111111 + 1 = 1000000 = 64, the smallest power of two greater than 50.
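A minimal sketch to verify the behavior, copying the JDK 8 routine into a standalone class (the class name and main method are just for illustration):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of JDK 8's HashMap.tableSizeFor, for demonstration
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(50)); // 64
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
    }
}
```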
Interviewer: You mentioned hash functions, do you know how the hash function of a HashMap is designed?
Angela: The hash function takes the key's hashCode(), which is a 32-bit int, and XORs its high 16 bits with its low 16 bits.
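For reference, the JDK 1.8 implementation is a single shift plus XOR:

```java
// JDK 1.8 HashMap.hash(): XOR the high 16 bits into the low 16 bits
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```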
Interviewer: Do you know why?
Angela: [You're asking that too?] This is also called a perturbation function, and it is designed for two reasons:
- To make hash collisions as rare as possible and spread values out as much as possible;
- To be as efficient as possible: since this is a high-frequency operation, it uses bit operations;
Interviewer: Why does XORing the high and low 16 bits of the hashcode reduce hash collisions? Can the hash function just use the key's hashcode directly?
[This question is a little tricky.] Angela almost froze on the spot, itching to fire off her biubiubiu 2-1-3 combo.
Angela: Because key.hashCode() calls the hash function built into the key's own type and returns an int. The value of an int ranges from **-2147483648 to 2147483647**, a mapping space of roughly four billion values. As long as the hash function maps values evenly and loosely, collisions are hard to come by in ordinary applications. But the problem is that an array four billion slots long won't fit in memory. If you think about it: if the HashMap array's initial size is 16, you have to take the hash modulo the array length to get an index you can actually use. (From Zhihu user "Fat Gentleman")
In the source code, the modulo is implemented as an AND of the hash value with (array length - 1), because bit operations are faster than %.
```java
bucketIndex = indexFor(hash, table.length);

static int indexFor(int h, int length) {
    return h & (length - 1);
}
```
This, by the way, also explains why a HashMap array's length is always a power of two: (array length - 1) then acts exactly like a "low-order mask". The AND operation zeroes all the high bits of the hash, keeping only the low bits as the array index. For example, with an initial length of 16, 16 - 1 = 15, which in binary is 00000000 00000000 00000000 00001111. ANDing with it keeps just the lowest four bits.
```
    10100101 11000100 00100101
&   00000000 00000000 00001111
------------------------------
    00000000 00000000 00000101   // all high bits zeroed; only the last four bits remain
```
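A quick check of this equivalence: for a non-negative hash and a power-of-two length, h & (length - 1) gives the same result as h % length (the class name is just for illustration):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int length = 16; // power of two
        int h = 0b10100101_11000100_00100101;
        // Both print 5: the mask keeps only the low four bits (0101)
        System.out.println(h & (length - 1));
        System.out.println(h % length);
    }
}
```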
But this is where the problem comes in: even if my hash values are loosely distributed, if I only ever take the last few bits, collisions will be frequent. Worse, if the hash function itself is poorly done and leaves gaps in the distribution, the low bits may repeat with some regularity, which is downright painful.
That's where the value of the perturbation function shows up; you probably see the idea already. Look at this diagram:
A right shift of 16 bits is exactly half of 32 bits, so XORing the high half with the low half mixes the high and low parts of the original hash code and increases the randomness of the low bits. Moreover, the mixed-in low bits now carry some characteristics of the high bits, so the high-order information is preserved in disguise.
Finally, look at an experiment from Peter Lawrey's column An Introduction to Optimising a Hashing Strategy: he took 352 strings whose hash values did not conflict at all, masked them down to the low bits, and used the results as array indices.
The results showed that with a HashMap array of length 512 (2^9), i.e., masking to the lower 9 bits, there were 103 collisions, nearly 30%, without the perturbation function. With the perturbation function there were only 92 collisions: a drop of almost 10%. So the perturbation function does work.
Incidentally, Java 1.8 adjusted this compared with 1.7: 1.7 did four shifts and four XORs, but Java 8 evidently decided that one round of perturbation is enough, and that doing it four times brings little marginal benefit, so it was reduced to one round for efficiency.
Here’s the hash code for 1.7:
```java
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```
Interviewer: It seems you've done your homework and have some material! Did you sneak a peek at Angela's blog? You just said 1.8 optimized the hash function. Are there any other optimizations in 1.8?
Angela: 1.8 has four major optimizations:
- Array + linked list changed to array + linked list or red-black tree;
- List insertion changed from head insertion to tail insertion. Simply put, when inserting into a bucket that already holds elements, 1.7 put the new element at the head, making the original node the successor of the new one; 1.8 traverses the list and places the new element at the tail;
- During expansion, 1.7 rehashed every element from the old array into the new one; 1.8 uses simpler logic: the index either stays unchanged or becomes index + old capacity;
- On insertion, 1.7 checked whether expansion was needed before inserting; 1.8 inserts first and then checks whether expansion is needed.
Interviewer: Can you tell me why each of these optimizations was made?
Angela: [Ahem, here comes the barrage]
- Because when hash conflicts make a list too long, lookup degrades; a red-black tree reduces the time complexity from O(n) to O(log n);
- Because 1.7's head insertion during expansion reverses the list, which can produce a ring in a multithreaded environment: thread A is inserting node B while thread B is also inserting; when capacity runs out, both start expanding, rehash, and place elements using head insertion; a node that has already been traversed gets put back at the head, forming a ring, as shown in the figure below:

```java
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];   // head insertion: the new node points at the old head
            newTable[i] = e;
            e = next;
        }
    }
}
```
- Why can 1.8 locate a node's position in the new array directly, without rehashing, during expansion? Because the mask used to compute the index simply gains one more 1 in the high position. Before expansion, with capacity n = 16, the n - 1 used in (n - 1) & hash is 0000 1111; after expansion it is 0001 1111. Since the operation is &, and ANDing with 1 leaves a bit unchanged, everything depends on the hash bit that lines up with the new mask bit: if that bit of the original hash is 0, the recomputed index is unchanged; if it is 1, the recomputed index is exactly 16 (the old capacity) larger than before. The two cases are shown in the figure below.
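A small sketch of this rule (names are illustrative; the real JDK 8 resize performs the equivalent check (e.hash & oldCap) == 0 while splitting each bucket):

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        // Two hashes sharing the low four bits; they differ only in bit 4
        int[] hashes = {0b0_0101, 0b1_0101};
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int newIndex = h & (oldCap * 2 - 1);
            // (h & oldCap) == 0 -> index unchanged; otherwise it moves by oldCap
            System.out.println(oldIndex + " -> " + newIndex
                    + " (hash & oldCap = " + (h & oldCap) + ")");
        }
    }
}
// prints: 5 -> 5 (hash & oldCap = 0)
//         5 -> 21 (hash & oldCap = 16)
```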
Interviewer: Is the HashMap thread safe?
Angela: No. In a multithreaded environment, 1.7 can suffer infinite loops, data loss, and data overwrites; 1.8 can still suffer data overwrites. Take 1.8 as an example: thread A checks that an index position is empty and is then suspended; thread B writes a node to that index; when thread A resumes and performs its assignment, it overwrites the node thread B just wrote. The ++size line is also not atomic, so multiple threads can lose updates or trigger expansion at the same time.
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null); // a thread suspended after the null check can be overwritten here
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold) // ++size is not atomic
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
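A quick, non-deterministic demo of the lost-update problem (the class name is invented for illustration; the printed size usually falls short of 20000, and on some runs the race can corrupt the map in other ways):

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();
        Thread a = new Thread(() -> { for (int i = 0; i < 10_000; i++) map.put(i, i); });
        Thread b = new Thread(() -> { for (int i = 10_000; i < 20_000; i++) map.put(i, i); });
        a.start(); b.start();
        a.join(); b.join();
        // Expected 20000, but racing puts and the non-atomic ++size lose updates
        System.out.println(map.size());
    }
}
```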
Interviewer: How do you usually solve the problem of thread insecurity?
Angela: In Java there are Hashtable, Collections.synchronizedMap, and ConcurrentHashMap for thread-safe Maps.
Hashtable puts the synchronized keyword directly on its operation methods, locking the whole table, so the granularity is coarse. Collections.synchronizedMap uses an inner class of the Collections utility: you pass in a Map and it is wrapped in a SynchronizedMap object that defines an internal object lock and synchronizes on it inside each method. ConcurrentHashMap reduces lock granularity (segment locks in 1.7, per-bucket locking in 1.8), which greatly improves concurrency.
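A minimal sketch of the three options (variable names are just for illustration):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMaps {
    public static void main(String[] args) {
        Map<String, String> m1 = new Hashtable<>();                            // method-level locks
        Map<String, String> m2 = Collections.synchronizedMap(new HashMap<>()); // wrapper with one object lock
        Map<String, String> m3 = new ConcurrentHashMap<>();                    // fine-grained locking
        m1.put("k", "v"); m2.put("k", "v"); m3.put("k", "v");
        System.out.println(m1 + " " + m2 + " " + m3);
    }
}
```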
Interviewer: Do you know how ConcurrentHashMap is implemented?
Angela: ConcurrentHashMap uses volatile variables to avoid instruction reordering and guarantee memory visibility, combined with CAS operations. When multiple threads operate on it, only the Node at the index currently being operated on is locked.
As shown in the figure below, thread A locks the linked list of node A, while thread B locks the linked list of node B, and the operations do not interfere with each other.
Interviewer: You mentioned earlier that a list turns into a red-black tree when its length reaches a threshold. What is the threshold?
Angela: The threshold is 8; the threshold for a red-black tree to revert to a linked list is 6.
Interviewer: Why 8, not 16, 32, or even 7? And why is the tree-to-list threshold 6 rather than 8?
Angela: Because that's how the author designed it. Oh, no: because by calculation, if the hash function is reasonably designed, the probability of eight hash collisions in one bucket is about 0.00000006 (six in a hundred million), so eight is more than enough. The reason it converts back at six rather than eight is that if the collision count hovered right around eight, the bucket would keep flipping between a linked list and a red-black tree; the gap prevents exactly that.
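For reference, these are the expected bucket-size probabilities listed in the implementation notes of the JDK 8 HashMap source (a Poisson distribution under the default 0.75 load factor):

```
0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
```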
Interviewer: Are the internal nodes of a HashMap ordered?
Angela: They're unordered; each entry's position is determined by its key's hash value.
Interviewer: Is there an ordered Map?
Angela: LinkedHashMap and TreeMap
Interviewer: Tell me how LinkedHashMap achieves its ordering.
Angela: LinkedHashMap internally maintains a doubly linked list with head and tail nodes. In addition to inheriting the Node properties of HashMap, LinkedHashMap's Entry has before and after references identifying the previous and next nodes, which makes it possible to iterate in insertion order or access order.
```java
/** The head (eldest) of the doubly linked list. */
transient LinkedHashMap.Entry<K,V> head;

/** The tail (youngest) of the doubly linked list. */
transient LinkedHashMap.Entry<K,V> tail;

// link the new node at the end of the list
private void linkNodeLast(LinkedHashMap.Entry<K,V> p) {
    LinkedHashMap.Entry<K,V> last = tail;
    tail = p;
    if (last == null)
        head = p;
    else {
        p.before = last;
        last.after = p;
    }
}

static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next);
    }
}
```
Example code:
```java
public static void main(String[] args) {
    Map<String, String> map = new LinkedHashMap<String, String>();
    map.put("1", "Angela");
    map.put("2", "");
    map.put("3", "blog");
    for (Map.Entry<String, String> item : map.entrySet()) {
        System.out.println(item.getKey() + ":" + item.getValue());
    }
}
// console output:
// 1:Angela
// 2:
// 3:blog
```
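Insertion order is the default; iteration in access order can be enabled through the three-argument constructor, as in this small sketch (the class name is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
    public static void main(String[] args) {
        // accessOrder = true: iteration follows access order (useful for LRU caches)
        Map<String, String> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("1", "a");
        map.put("2", "b");
        map.put("3", "c");
        map.get("1"); // accessing "1" moves it to the tail
        System.out.println(map.keySet()); // [2, 3, 1]
    }
}
```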
Interviewer: Tell me how TreeMap achieves its ordering.
Angela: TreeMap sorts by the natural ordering of its keys or by a Comparator, using a red-black tree internally. So either the key's class implements the Comparable interface, or you define a custom comparator implementing the Comparator interface and pass it to the TreeMap constructor to be used for key comparison.
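A minimal sketch with a custom Comparator (names are illustrative):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        // Sort keys in reverse natural order via a custom Comparator
        Map<String, String> map = new TreeMap<>(Comparator.reverseOrder());
        map.put("1", "a");
        map.put("3", "c");
        map.put("2", "b");
        System.out.println(map); // {3=c, 2=b, 1=a}
    }
}
```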
Interviewer: As mentioned earlier, lock granularity is reduced through the combination of CAS and synchronized. Can you talk about how CAS is implemented and the principle behind synchronized?
Angela: Let’s reschedule for the next episode, OK?
Interviewer: Ok, go back and wait for the announcement.
Due to space constraints, the interview questions above have been organized into documents and corresponding study notes. Friends who need them can follow the public account "Kirin to change the bug" to discuss more Java interview techniques and question analyses.