Understand HashMap in depth
What is a HashMap
Map is a Map structure that allocates storage based on hash values. A Map is a collection class used to store key-value pairs
Why do we need to understand the underlying principles of HashMap
- HashMap is a must-have interview question
- Hashmaps are also one of the most commonly used collections in your work, so it’s important to understand the underlying principles so that you can quickly locate problems when they occur
- There are a lot of knowledge points involved in HashMap, which can be used to comprehensively investigate the basic skills of the interviewer. It is an unpassable hurdle to get a good offer. Next, I will use the most easy-to-understand language to reveal the mystery of HashMap
What are the commonly used Map sets
HashMap, TreeMap, LinkedMap, etc., are commonly used. Here we will focus on HashMap, because HashMap is the most commonly used, and interview is also the most frequently asked
HashMap underlying data structure
Jdk8 implements a HashMap optimization that converts the list to a red-black tree when the array length and list length reach thresholds. This further optimizes the query time
HashMap uses an array + linked list + red-black tree structure
HashMap source code analysis
Where should we start to understand HashMap?
Start with the class header of the HashMap
AbstractMap AbstractMap implements some basic methods of Map
// Implement Map interface, some methods need to subclass according to their own characteristics to implement
// Implement Cloneable, Serializable, can implement clone and serialization
public class HashMap<K.V> extends AbstractMap<K.V>
implements Map<K.V>, Cloneable.Serializable {
// Serialize the version number
private static final long serialVersionUID = 362498820763181265L;
// Initialize the size by default
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Maximum initial capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75 f;
// Set the threshold for converting lists to red-black trees (list length reaches 8)
static final int TREEIFY_THRESHOLD = 8;
// Red black tree degenerates to the list threshold (number of nodes is 6)
static final int UNTREEIFY_THRESHOLD = 6;
// Minimizes the size of the array converted to a red-black tree
static final int MIN_TREEIFY_CAPACITY = 64;
// Store the list array
transient Node<K,V>[] table;
// Can be used to obtain key and value
transient Set<Map.Entry<K,V>> entrySet;
// The number of key-value mappings
transient int size;
// The number of times the hashMap has been modified. This parameter is used to determine quick failures
transient int modCount;
// Expand the key logarithm
int threshold;
// Load factor
final float loadFactor;
}
Copy the code
In fact, observing the member attributes above, it answers several interview questions that are particularly frequently asked
-
What are the conditions for a linked list in a HashMap to become a red-black tree?
The length of the list is 8 and the length of the array in the HashMap is 64 before the red-black tree conversion takes place, otherwise only the array expansion takes place
-
Member attribute table by
transient
The modifier does not participate in serialization but it can actually be serialized. Why do you do that?The designers of the HashMap did not want the entire array to be serialized, but instead serialized its contents as they were serialized
Next, let’s examine the constructor of the HashMap
// There are many overloaded constructors for HashMap, but the most complete and basic is this one
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
// If initialCapacity passed in is not an integer multiple of 2, convert to an integer multiple of 2, as explained later
this.threshold = tableSizeFor(initialCapacity);
}
Copy the code
The put() method is now unmasked!
// The most commonly used put method, first we need to know that a hashMap uses the hash value of the key to find a certain position in the array
public V put(K key, V value) {
return putVal(hash(key), key, value, false.true);
}
// If the key is not empty, get the result by xor the key's hashcode() value 16 bits higher and 16 bits lower
static final int hash(Object key) {
int h;
return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
// The implementation of the put method
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
// The table array is not initialized yet
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
// Use the hash value of the previous step and the array length -1 to get the location of the array
// If the array position is null, build it into a Node.
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
// If there is a value in the array location
else {
Node<K,V> e; K k;
// If the first nodes in the array have the same key value, overwrite
if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
e = p;
// If the node is not the same as the head node, then check whether the node is a red black tree, if it is a red black tree to insert
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
// Otherwise, it is a list node
else {
// Iterate over each node in the list in turn
for (int binCount = 0; ; ++binCount) {
// If the current node's next null, the new data is inserted directly to the tail
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
// If a node key is the same as the key to be inserted during traversal, break the node
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
break; p = e; }}// The array is overwritten
if(e ! =null) { // existing mapping for key
V oldValue = e.value;
if(! onlyIfAbsent || oldValue ==null)
e.value = value;
afterNodeAccess(e);
returnoldValue; }}// Number of changes +1
++modCount;
// If the current key-value log is greater than the capacity expansion threshold, the capacity expansion is performed
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
Copy the code
Let’s look at the array expansion process
// Capacity expansion method
final Node<K,V>[] resize() {
/ / the old array
Node<K,V>[] oldTab = table;
// The capacity of the old array
int oldCap = (oldTab == null)?0 : oldTab.length;
// The expansion threshold of the old array
int oldThr = threshold;
// Used to store the capacity of the expanded array and its expansion threshold
int newCap, newThr = 0;
// If the size of the old array is greater than 0, it is not an array initialization operation, but an array expansion triggered by the logarithm of the key value
if (oldCap > 0) {
// If the size of the old array is already larger than the maximum size, the expansion threshold is directly assigned to the maximum Integer value and then returned
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
// If the size of the old array is moved to the left by 1 bit, the size of the old array is moved to the left by 1 bit
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
// The expansion threshold is also moved 1 bit to the left
newThr = oldThr << 1; // double threshold
}
// If the capacity of the old array is 0, we are initializing the array. The new capacity is the value of the old expansion threshold, which is the value we passed in
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
// If no value is specified in the constructor, the default 16 is used
else { // zero initial threshold signifies using defaults
// The new array size is 16 by default
newCap = DEFAULT_INITIAL_CAPACITY;
// The capacity expansion threshold is load factor * default capacity
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// If the array size is customized, but the expansion threshold is not specified, the new expansion threshold is constructed according to the above
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
// Create an array based on the size of the new array
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
// If the array is not null
if(oldTab ! =null) {
// Loop through each position of the old array
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
// The head node is not null
if((e = oldTab[j]) ! =null) {
oldTab[j] = null;
// If the next node of the head node is null
if (e.next == null)
// Rehash the node to select the position and assign the value
newTab[e.hash & (newCap - 1)] = e;
// If the current node is a red-black tree node
else if (e instanceof TreeNode)
// Split the red-black tree into upper and lower trees
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
// Nodes are linked list nodes
else { // preserve order
// Build a high and low chain
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
hiHead = e;
elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
// Put the low chain in the current position
if(loTail ! =null) {
loTail.next = null;
newTab[j] = loHead;
}
// Put the high chain in the current position + expanded value
if(hiTail ! =null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
Copy the code
Here is an answer to the above analysis of the source code did not specifically expand the question
-
The hash value is xOR obtained by the key’s hashcode() value 16 bits higher and 16 bits lower
In order to reduce collisions and further reduce the probability of hash collisions, moving 16 bits to the right can retain the feature that the higher 16 bits are in the lower 16 bits. If it is ampersand or |, certain features will be lost, which can reduce collisions more
-
Why does an integer multiple that is not 2 automatically become a multiple of 2 when passed in
Because when the length of the array is a multiple of 2, when calculating the position of the key in the array (p = TAB [I = (n-1) & hash) where n-1 is represented in binary its mantissa is 1
For example, if the array length is 16 and n-1 is 15, the binary representation is 0000 1111
The hash value is determined by the hash value when ampersand is applied to the hash value
For example, if there are two key-value pairs to be inserted, the hash value of A is 0000, 1001 the hash value of B is: 0000 1101, so that when the hash value is set to n-1&, their results are different, but if the mantissa of n-1 is not 1, such as 0000 1001, then the ampersand of A and B to n-1 will produce the same result, and the collision will occur
Why is & adopted? If * is used to find the location in key and use % in capacity expansion, is it ok?
Yes, of course, but in order to improve performance, so the use of bit operations, which can bring performance improvement
-
Why did the expansion move to the left instead of *2
Because bit operations are faster, it improves performance
disadvantages
The basic principles of HashMap are described above. Of course, we all know that HashMap is not thread safe, so what should we do when there is a concurrent scenario? Next time we will reveal the ConcurrentHashMap!
conclusion
So far we have understood the underlying structure of HashMap, the specific implementation of the PUT method and the author’s optimization points in some implementations. We can also learn the author’s code ideas in some places
I actually know the source will be very boring, very boring, but I feel or should stick to it, so can do much, know the why, the ability to see the source code will help improve your ability, when you do develop involuntary think of some ideas of the author, so as to enhance the capacity of your code I hope every developer can stick, together come on!
Finally, I would like to end this article with a sentence, work hard to refueling, after many years, you will thank yourself for having worked so hard!
I am he who loves writing code. See you next time!