Preface
I believe you have noticed it too: go interview at ten companies, and eight of them will ask about HashMap. That alone shows how important HashMap is, so today let me talk with you about all things HashMap.
HashMap in JDK 8
Before we get to HashMap, let's first look at how common data structures perform on basic operations:
- Array: an array is a collection of data stored in contiguous storage units. Looking up by index takes O(1) time; searching for a given value requires traversing the array and comparing element by element, which takes O(N). For a sorted array, binary search improves lookup to O(logN). Ordinary insertions and deletions involve shifting array elements, so on average they also take O(N).
- Linked list: insertions and deletions only require updating references between nodes, so once the position is known they take O(1). Searching requires traversing the whole list and comparing data one by one, which takes O(N).
- Binary tree: for a balanced, ordered binary tree with N elements, inserting, finding, and deleting a node each take about log(N) steps. Its time complexity is the best of the structures above (array and linked list).
- Hash table (the structure behind HashMap): compared with the data structures above, insertion, deletion, and lookup all perform better. Ideally a single hash-based positioning finds the slot, so the time complexity is O(1) on average.
From this comparison we can clearly see that HashMap performs best. Why can it achieve such performance? Let's look at the source code to see how HashMap is implemented.
The basic elements of HashMap
In the source code of HashMap, we can see several basic elements, as shown below:
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

    private static final long serialVersionUID = 362498820763181265L;

    /** The default initial capacity - MUST be a power of two. */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** The load factor used when none specified in constructor. */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin. Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /* ... N lines of code omitted ... */
}
Here is an introduction to the important ones:
DEFAULT_INITIAL_CAPACITY and MAXIMUM_CAPACITY: when we create a HashMap with the no-argument constructor, its capacity defaults to DEFAULT_INITIAL_CAPACITY, i.e. 16. We can also specify our own initial capacity. Note that the capacity must be 2^N (N a positive integer) and at most MAXIMUM_CAPACITY, which is 2^30. It is important to remember that the capacity of a HashMap is always an integer power of two. The initial capacity is 16, and each expansion doubles the capacity: for example, a capacity of 16 becomes 32 after one expansion and 64 after another. If the initial capacity we pass to the HashMap constructor is not a power of 2, the constructor calls tableSizeFor() internally to round it up to the next power of 2: passing 3 gives 4, and passing 13 gives 16.
/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
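To see the rounding in action, here is a minimal standalone sketch (the wrapper class TableSizeForDemo and its main method are ours, purely for illustration; the tableSizeFor body is copied from the JDK 8 source above):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Standalone copy of the JDK 8 logic, for experimentation
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(3));   // 4
        System.out.println(tableSizeFor(13));  // 16
        System.out.println(tableSizeFor(16));  // 16: powers of two are unchanged
    }
}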
DEFAULT_LOAD_FACTOR: the default load factor. We can also specify the load factor through the parameterized HashMap constructor; if we don't, the default of 0.75 is used. The load factor determines the resize threshold: threshold = capacity * loadFactor.
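With the defaults this means threshold = 16 * 0.75 = 12, so the table grows when the 13th element is inserted. A minimal sketch of the two-argument constructor (the class name and the chosen values 32 and 0.5f are ours, purely illustrative):

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Capacity 32 with load factor 0.5f gives a resize threshold of
        // 32 * 0.5 = 16, so the 17th insertion would trigger an expansion to 64.
        Map<String, Integer> map = new HashMap<>(32, 0.5f);
        map.put("k", 1);
        System.out.println(map); // {k=1}
    }
}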
TREEIFY_THRESHOLD: the treeification threshold. As we all know, the underlying structure of HashMap stores nodes as an array plus, per slot, a linked list or a red-black tree. The array comes first, and each position in it can hold multiple elements; we can think of each position as a bucket. To maximize efficiency, the designers of HashMap made the bucket structure adaptive: a bucket can be a linked list or a red-black tree. At first, while a bucket holds few elements, they are kept in a linked list. As the HashMap accumulates elements and a bucket fills up, once the number of nodes in the bucket reaches 8 (TREEIFY_THRESHOLD) and the array length has reached 64 (MIN_TREEIFY_CAPACITY), the list is converted to a red-black tree (treeified). Conversely, if the bucket is a red-black tree and deletions shrink it to 6 nodes or fewer (UNTREEIFY_THRESHOLD), the tree degrades back to a linked list.
Why convert to a red-black tree at all, and why not use a balanced binary tree instead? If a list grows too long, finding a node in it becomes inefficient, which is why a long list is turned into a red-black tree. As for the choice of tree: the HashMap authors picked the red-black tree precisely because it does not pursue "perfect balance". When an insertion or deletion unbalances a red-black tree, the imbalance is resolved within at most three rotations, whereas a balanced binary tree, in its pursuit of perfect balance, may need an unbounded number of rotations after an insertion or deletion. That extra rotation work makes it less efficient, which is why red-black trees were used instead of balanced binary trees.
For more information about red-black trees and balanced binary trees, you can search Baidu or read "Big Smart teaches you Java — a brief analysis of red-black trees".
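Returning to the thresholds: here is a hedged sketch of how a bucket can be driven toward treeification. The CollidingKey class is invented for illustration; its constant hashCode forces every entry into the same bucket, and the treeification itself happens internally and is not directly observable from this code:

import java.util.HashMap;
import java.util.Map;

public class TreeifyDemo {
    // Hypothetical key type with a constant hashCode, so every
    // instance lands in the same bucket and the bin keeps growing.
    static final class CollidingKey {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // force collisions
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        // Capacity 64 satisfies MIN_TREEIFY_CAPACITY, so once a 9th entry is
        // added to a bin that already holds 8 nodes, the bin is treeified.
        Map<CollidingKey, Integer> map = new HashMap<>(64);
        for (int i = 0; i < 9; i++) {
            map.put(new CollidingKey(i), i);
        }
        // Lookups still work; internally the bin is now a red-black tree.
        System.out.println(map.get(new CollidingKey(5))); // 5
    }
}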
Common methods of HashMap
put() and get() are the two most commonly used methods of HashMap. Let's look at the source of each in turn.
The put() method
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    // tab[] is the array; p is the bucket head where the element will go
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If tab is empty, create the array by calling resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // i = (n - 1) & hash computes the index; if there is no collision, store the node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // If a collision occurs, store the node in a linked list or a tree
    else {
        Node<K,V> e; K k;
        // If the hash is the same and the key is the same, e points to that node, to be overwritten later
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Check whether the bucket is a red-black tree; if it is a tree node, insert via the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise it is a linked list; iterate over its nodes
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Check the threshold; if exceeded, convert the list to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If the key is the same, e already points at the node to replace; exit the loop
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // If e is non-null, overwrite the existing mapping and return the old value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Expand the array when the number of nodes reaches the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
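Before dissecting the internals, a minimal sketch of what put()'s return value looks like from the caller's side:

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1)); // null: there was no previous mapping
        System.out.println(map.put("a", 2)); // 1: the old value is returned, then replaced
        System.out.println(map.get("a"));    // 2
    }
}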
Two details in putVal() deserve a closer look. First, resize() is called to initialize (and later expand) the table. The table mentioned here is the Node<K,V>[] table field in the source. Node is an inner class of HashMap and its basic building block: it is both the element type of the underlying array and the element type of each singly linked list. It holds the key and value required by the array element, as well as the reference to the next node required by the linked list. The source is as follows:
/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

/**
 * Basic hash bin node, used for most entries. (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
You may wonder why the table is initialized lazily, inside putVal() via resize(), rather than eagerly when the HashMap is created. In fact, I am not quite clear about the specific reason. Personally, I think lazy initialization saves resources until data is actually inserted, much like a shopping app: after opening the app, not all goods are displayed at once; product data is loaded as you browse.
Second, the index calculation. The line p = tab[i = (n - 1) & hash] computes the bucket index as index = (n - 1) & hash, where n is the length of the table array; if that slot is null, the node is stored there directly. So where does hash come from? In the source we find this method:
/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
First h = key.hashCode() is computed, then the high 16 bits of that hash are XORed into the low 16 bits: (h = key.hashCode()) ^ (h >>> 16). This is an optimization introduced in JDK 8, trading off speed, utility, and quality of bit-spreading: it lets both the high and the low 16 bits of hashCode() participate in the index calculation without adding much overhead. This matters because the index mask (n - 1) would otherwise discard the high bits entirely.
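A minimal sketch of the spreading and indexing (the hash method here is a standalone copy of the JDK 8 code above; the key and table length are arbitrary examples):

public class HashDemo {
    // Standalone copy of HashMap.hash() for illustration
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        String key = "hello";
        int n = 16;                  // table length, always a power of two
        int h = hash(key);
        int index = (n - 1) & h;     // bucket index; equals h % n for non-negative h
        System.out.println("hashCode    = " + key.hashCode());
        System.out.println("spread hash = " + h);
        System.out.println("index       = " + index);
    }
}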
Back in putVal(), the else branch contains a for loop that iterates through the linked list. It goes through the following steps:
- e = p.next and the later p = e together walk the list from front to back. At the start, p is the head element of the bucket; e = p.next makes e point at the node after p.
- Each pass checks two exit conditions: if ((e = p.next) == null) and if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))). The former means e, i.e. p.next, is null, so the current p is already the last element of the list; the new element is then appended by tail insertion, p.next = newNode(hash, key, value, null), i.e. p's next reference points directly at the newly added element. If, after the new element is added, binCount has reached TREEIFY_THRESHOLD - 1 (the bucket already held 8 nodes), treeifyBin(tab, hash) is called to convert the list to a red-black tree before breaking out. The latter condition means a duplicate key was found, so the loop breaks with e pointing at the node whose value will be replaced.
- Finally p = e moves p one node further along, and the process repeats until the end of the list is reached or a break exits the loop.
Finally, if the number of elements exceeds the threshold after insertion, a capacity expansion is performed. Expansion deserves a few more words; first look at the source:
/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
// Create an oldTab array to hold the previous array
Node<K,V>[] oldTab = table;
// Get the length of the original array
int oldCap = (oldTab == null) ? 0 : oldTab.length;
// Threshold for array expansion
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
// If the original array length is greater than the maximum (2^30)
if (oldCap >= MAXIMUM_CAPACITY) {
// The capacity expansion threshold is increased to + infinity
threshold = Integer.MAX_VALUE;
// Return the original array
return oldTab;
}
// else if: (newCap = old array length doubled) < maximum (2^30)
//          && old array length >= initial length (2^4)
// Double the capacity; likewise newThr below doubles the old threshold
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0)
// The initial capacity was stored in threshold by the constructor; use it as the new capacity
newCap = oldThr;
// Otherwise oldThr == 0, zero initial threshold means the default value is used
else {
// The initial size of the new array is set to the default value
newCap = DEFAULT_INITIAL_CAPACITY;
// Calculate the threshold for the default capacity
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// newThr is still 0 when the branch above set newCap = oldThr; recompute the threshold from newCap
if (newThr == 0) {
//ft is a temporary variable used to determine the validity of the threshold
float ft = (float)newCap * loadFactor;
// Calculate the new threshold
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
// Change the threshold value to the new threshold
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
// Change the global variable of table to newTable after expansion
table = newTab;
if (oldTab != null) {
// Move the old array (or bucket) to the new array (new bucket)
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
// Use a Node<K,V> e to traverse each slot of the old array
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
// Place e (oldTab[j]) at index e.hash & (newCap - 1) in newTab
if (e.next == null)
// This is effectively a modulo operation
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else {
// List rearrangement is one of the most difficult optimizations in JDK8
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
Now for the linked-list rearrangement. The two pairs of pointers, loHead/loTail and hiHead/hiTail, build two separate lists depending on the result of (e.hash & oldCap). If (e.hash & oldCap) == 0, the node goes into the "lo" list and ends up at newTab[j], i.e. the same index as in the old table. Conversely, if (e.hash & oldCap) != 0, the node goes into the "hi" list and ends up at newTab[j + oldCap], i.e. the bucket at position [j] moves to position [j + oldCap].
Here is a partial explanation adapted from the Meituan-Dianping technical team's article "Java 8 Series: Rethinking HashMap":
First, recall the bit trick used throughout: a % (2^n) is equivalent to a & (2^n - 1). The former is a modulo operation and the latter a bit operation, and the bit operation is more efficient; this is exactly why HashMap requires the array length to be 2^n. As a reminder, the index at which HashMap places an element in the array is index = hash & (n - 1), where n is the array length (an integer power of 2).
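A minimal sketch of that equivalence (the values 37 and 16 are arbitrary):

public class ModuloDemo {
    public static void main(String[] args) {
        int a = 37;
        int n = 16;                       // must be a power of two
        System.out.println(a % n);        // 5: modulo operation
        System.out.println(a & (n - 1));  // 5: same result via a cheaper bit operation
    }
}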
For example, take an array of length n = 16, so n - 1 = 15, which is 1111 in binary. AND it with two different hash codes (a result bit is 1 only when both corresponding bits are 1) whose low four bits are identical and whose main difference is the fifth bit: key1 has 0 there, key2 has 1. Both keys land in the same bucket. Now expand: n goes from 16 to 32, so n - 1 = 31, which is 11111 in binary. ANDing with key1 and key2 again, the differing fifth bit now produces two different results:
As you can see, the results correspond exactly to the two cases above: the new index is either the old one or the old one plus oldCap. This is an improvement to the expansion algorithm in JDK 8. Since the newly significant bit of the hash can be considered random (0 or 1 with roughly equal probability), the resize process evenly distributes the nodes that previously collided into the new buckets.
Finally, why does the split check read if ((e.hash & oldCap) == 0), using oldCap rather than oldCap - 1? Because ANDing with oldCap isolates exactly the newly significant bit. Without subtracting 1, 16 is 10000 in binary: ANDing it with key1 (whose fifth bit is 0) yields 0, while ANDing it with key2 (whose fifth bit is 1) yields a non-zero value, which matches the two cases above. The same holds after each further expansion.
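A minimal sketch of the split decision, using two hypothetical hashes that differ only in the fifth bit:

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;                     // 10000 in binary
        int key1 = 0b0_0101;                 // fifth bit is 0 (hypothetical hash)
        int key2 = 0b1_0101;                 // fifth bit is 1 (hypothetical hash)

        for (int h : new int[] { key1, key2 }) {
            int oldIndex = h & (oldCap - 1);   // index before expansion
            int newIndex = ((h & oldCap) == 0) // the split check from resize()
                    ? oldIndex                 // stays at the same index
                    : oldIndex + oldCap;       // moves to index + oldCap
            System.out.println(Integer.toBinaryString(h)
                    + " : old index " + oldIndex + ", new index " + newIndex);
        }
    }
}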
The get() method
The get() method of HashMap is simpler: it locates the bucket at the computed index, then searches within that bucket by following next (or walking the tree). Source code:
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key==null ? k==null :
* key.equals(k))}, then this method returns {@code v}; otherwise
* it returns {@code null}. (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
/**
 * Implements Map.get and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
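A minimal usage sketch showing why the javadoc above warns that a null return is ambiguous, and how containsKey() resolves it:

import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("present", null);
        System.out.println(map.get("present"));         // null: key maps to null
        System.out.println(map.get("absent"));          // null: no mapping at all
        System.out.println(map.containsKey("present")); // true: distinguishes the two
        System.out.println(map.containsKey("absent"));  // false
    }
}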
Summary
Finally, a quick summary of HashMap:
- The initial capacity of a HashMap is 16
- The capacity of a HashMap is always an integer power of 2. Capacity after expansion = capacity before expansion *2
- When the number of elements in the HashMap exceeds capacity * loadFactor (loadFactor defaults to 0.75), the table is expanded
- When the number of elements in a bucket >= 8 (TREEIFY_THRESHOLD) and the capacity of the HashMap >= 64 (MIN_TREEIFY_CAPACITY), the list becomes a red-black tree
- When the number of elements in a red-black tree <= 6 (UNTREEIFY_THRESHOLD), the red-black tree is converted to a linked list
- HashMap is thread-unsafe
My experience is limited, and some points may not be covered particularly well. If questions come to mind while reading, feel free to leave a message in the comments section, and we'll discuss them one by one.
Please leave a thumbs up and a follow for this article ~ thanks!
If there are mistakes in the article, corrections in the comments are welcome; if you have a better or more distinctive understanding, you're also welcome to leave your valuable thoughts in the comments area.
Love what you love, do what you do, listen to your heart and ask nothing