HashMap implementation principle and source detailed analysis
The original link
Ps: This blog is based on Jdk1.8
Learning Points:
- 1. Know the data structure of HashMap
- 2. Know the hash algorithm in HashMap
- 3. Know the code implementation of PUT, remove and GET in HashMap
- 4. What are the hash conflicts of HashMap? How did you handle it?
- 5. Know the expansion mechanism of HashMap
1. What is HashMap?
HashMap is implemented as a Map interface based on a hash table in the form of key-value storage. The implementation of HashMap is not synchronous, which means it is not thread-safe. Both key and value can be null, and the mapping in a HashMap is not ordered.
2. Features of HashMap
- Hash stores unordered
- Both keys and values can store NULL values, but keys can only store a unique NULL value
- The data structure before JDk8 is array + list, after JDk8 is array + list + red-black tree
- If the threshold is greater than 8 and the array length is greater than 64, it becomes a red-black tree
3. Data structure of HashMap
In JDK7 case, array + link, hash conflict, convert to linked list:
< span style = “max-width: 100%; clear: both; min-height: 1pt;
HashMap node information:
static class Node<K.V> implements Map.Entry<K.V> {
// hashCode
final int hash;
final K key;
V value;
// The next pointer to the list is not null
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
// ...
}
Copy the code
4. Initialize the HashMap
4.1. Member Variables
public class HashMap<K.V> extends AbstractMap<K.V>
implements Map<K.V>, Cloneable.Serializable {
/** * Serial number version */
private static final long serialVersionUID = 362498820763181265L;
/** * Initializes the capacity, 16=2 to the fourth power */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/** * Maximum capacity, 2 to the 30th power */
static final int MAXIMUM_CAPACITY = 1 << 30;
/** * Default load factor. The default value is 0.75 */
static final float DEFAULT_LOAD_FACTOR = 0.75 f;
/** * The list node tree becomes a red-black tree when it exceeds 8
static final int TREEIFY_THRESHOLD = 8;
/** * if the number of nodes in the red-black tree is less than 6, the list is converted back to */
static final int UNTREEIFY_THRESHOLD = 6;
/** * the minimum length of the array in the bucket for converting the structure into a red-black tree */
static final int MIN_TREEIFY_CAPACITY = 64;
// ...
/** * HashMap stores an array of elements */
transient Node<K,V>[] table;
/** * is used to store cache */
transient Set<Map.Entry<K,V>> entrySet;
/** * The number of elements in the HashMap */
transient int size;
/** * is used to record the number of HashMap changes */
transient int modCount;
/** * Used to resize the value of the next capacity (capacity * load factor) */
int threshold;
/** * Load factor of the Hash table */
final float loadFactor;
}
Copy the code
4.2. Construction method
public HashMap(int initialCapacity, float loadFactor) {
The initial capacity cannot be less than 0. If the initial capacity is less than 0, IllegalArgumentException is thrown
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
// If the initial capacity is greater than the maximum capacity, the maximum capacity is used as the initial capacity
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
// The load factor cannot be less than 0, and if it is a numeric type, isNan:true, it is non-numeric
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
// Assigns the specified load factor to the global variable
this.loadFactor = loadFactor;
// threshold = (capacity) * (load factor)
this.threshold = tableSizeFor(initialCapacity);
}
public HashMap(int initialCapacity) {
// Initialize the capacity and default load factor
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
public HashMap(a) {
// The default load factor is 0.75
this.loadFactor = DEFAULT_LOAD_FACTOR;
}
Copy the code
Then, we know that the default size of HashMap is 16, and where is it assigned? This. threshold = tableSizeFor(initialCapacity);
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code
This involves the basic knowledge of the computer, right shift and or operations, the following legend: through a more difficult calculation, n is 16Scroll through the code and find the following constructorpublic HashMap(Map<? extends K, ? extends V> m)
This constructor is used to construct a new HashMap that has the same mapping as the specified Map:
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}
Copy the code
Look at the putMapEntries method:
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
// The length of the collection passed in
int s = m.size();
if (s > 0) {
// Check whether the table has been initialized
if (table == null) { // pre-size is not initialized
// The purpose of adding 1.0f is to round up the decimal to ensure maximum capacity and reduce the number of calls to resize
float ft = ((float)s / loadFactor) + 1.0 F;
int t = ((ft < (float)MAXIMUM_CAPACITY) ?
(int)ft : MAXIMUM_CAPACITY);
// If t is greater than the threshold of HashMap, tableSizeFor
if (t > threshold)
threshold = tableSizeFor(t);
}
else if (s > threshold) // If it is already initialized, expand resize
resize();
// Iterate to add all elements of the map to the hashMap
for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
K key = e.getKey();
V value = e.getValue();
putVal(hash(key), key, value, false, evict); }}}Copy the code
5, Jdk8 HashMap algorithm
5.1 Hash algorithm in HashMap
In the java.util.hashmap #hash method, there is a specific method used to calculate the hash value: what does this method do? Hash (n-1) hash (n-1) hash (n-1) hash (n-1) hash (n-1) hash (n-1) hash (n-1) hash (n-1) hash (n-1) hash (n-1) hash
static final int hash(Object key) {
int h;
return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code
It looks like the code is only two lines long, but it actually contains the idea of a hash algorithm. Here is a brief analysis:
static final int hash(Object key) {
// The key passed in is null, which returns the default value 0
if (key == null) return 0;
// Compute the hash code
int h = key.hashCode();
// Shift the calculated hashCode 16 bits to the right, equivalent to multiplying by (1/2) to the 16th power
int t = h >>> 16;
// Perform xor on both values and return
return h ^ t;
}
Copy the code
So what you’re going to do is you’re going to compute the hashCode, and then you’re going to shift the hashCode 16 bits to the right, and then you’re going to do xor to those two numbers. That seems to be the case. Then what is the author’s intention? First of all, since it’s a hash algorithm, the purpose of a hash algorithm is to evenly distribute the data
Can be seen from the figure, the use of exclusive or operation, in the probability of 0 s and 1 s are equal, so that’s why use exclusive or operation, the essence of the hash algorithm to make uniform distribution data, using the hash value of an exclusive or operation because of relatively uniform hash distribution, so a hash conflict probability is small lot
Supplement:
- And operation: the corresponding digits of two numbers are 1, after the operation is 1, the other cases are 0;
- Or operation: if either of the corresponding bits of two numbers is 1, or if the operation is followed by 1, otherwise, 0;
- Xor operation: when two numbers have the same corresponding bits, the result is 1, otherwise 0;
And then why do I move 16 to the right? We know that the largest value of an int is 2 to the 32nd, and then we can divide it into 16 bits higher and 16 bits lower. If we move it 16 bits right, the value becomes smaller
5.2. What is hash conflict in HashMap?
Hash collision is also known as hash collision. In theory, hash collision refers to the same calculated hash value, but in the HashMap, hash collision refers to (n-1)&hash, which is the index of the array in the HashMap. Before Jdk8, we used a linked list. Whenever a hash conflict occurs, we add a node to the end of the list. After jdk8, the method is to use the linked list + red black tree method, the first hash conflict, also use the linked list, then the list reaches 8 nodes, the array length exceeds 64 cases, into red black tree, this can be found in the source code
The inside of the double source, HashMap# putVal, logic, check calculated first, and array subscript TAB, I = (n – 1) & hash whether conflict, new node without conflict, conflict, list or a red-black tree
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
Copy the code
Put operation of HashMap in Jdk8
- The core flow of the PUT method
- Calculates the index of the array according to HashCode
- If the subscript array is empty, a new node is added
- Otherwise, it’s a hash conflict, so if the bucket uses a list node, it goes to the end of the list, and if it uses a red-black tree it goes to the red-black tree
The above is the core process. If duplicate keys are ignored, a new value value will be replaced for the key. If size is greater than threshold, capacity expansion will be carried out
Ok, or put source code:
public V put(K key, V value) {
return putVal(hash(key), key, value, false.true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
// The initial data is resize
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
// Determine whether a hash conflict has occurred
if ((p = tab[i = (n - 1) & hash]) == null)
// Hash does not conflict, new node is added
tab[i] = newNode(hash, key, value, null);
else { // Hash conflicts are handled using linked lists or red-black trees
Node<K,V> e; K k;
// If there are duplicate keys, both key and hash are equal
if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
// Assign the old node object to the new e
e = p;
else if (p instanceof TreeNode) // Use a red-black tree node
// Put the node in the red-black tree
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else { // List
// Infinite loop
for (int binCount = 0; ; ++binCount) {
// Go all the way to the last node
if ((e = p.next) == null) {
// Add the new node to the tail
p.next = newNode(hash, key, value, null);
// If the number of nodes is greater than 8, it becomes a red-black tree
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
// Break the loop
break;
}
// Also to avoid the hashCode and key situation
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
break;
// reassign for list traversalp = e; }}// If a duplicate key is found, replace the old value with the new value
if(e ! =null) { // existing mapping for key
// Record e
V oldValue = e.value;
if(! onlyIfAbsent || oldValue ==null)
// Replace the old value with the new value
e.value = value;
// Callback after access
afterNodeAccess(e);
// Return the old value
returnoldValue; }}// Record the number of changes
++modCount;
// size is larger than threshole
if (++size > threshold)
resize();
// Callback method
afterNodeInsertion(evict);
return null;
}
Copy the code
And then how do you convert to a red black tree? Red black tree knowledge is relatively complex, so more detailed can refer to my previous blog Java handwriting implementation of red black tree:
final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;
// The MIN_TREEIFY_CAPACITY value is 64, which means that an array size smaller than 64 will not be converted to a red-black tree
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
// Capacity expansion method
resize();
// Go to red black tree
else if ((e = tab[index = (n - 1) & hash]) ! =null) {
// The head and tail of the red-black tree are hd and T1
TreeNode<K,V> hd = null, tl = null;
do {
// Build the tree node
TreeNode<K,V> p = replacementTreeNode(e, null);
if (tl == null)
// Assign the new node p to the head node of the red-black tree
hd = p;
else {
// The front node of the new node is the old tail node T1
p.prev = tl;
// The next node of t1 is the new node
tl.next = p;
}
tl = p;
} while((e = e.next) ! =null);
// make the node of the array execute the new TreeNode, which then becomes a TreeNode
if((tab[index] = hd) ! =null) hd.treeify(tab); }}Copy the code
7. Capacity expansion mechanism of HashMap
This is one of the key points in HashMap, and one of the more difficult questions
7.1 When is capacity expansion required?
When the number of elements in a hashMap exceeds threshold, threshold is the array length multiplied by loadFactor, which is 0.75 F by default
7.2. What is HashMap expansion?
The resize method is a HashMap expansion method, which is time-consuming. When HashMap is expanded, it doubles the capacity of 16 to 32. HashMap (n-1) + hash (n-1) + hash (n-1) + hash (n-1) + hash (n-1) After the expansion, the nodes will either be left in their original locations, which may sound confusing, so take a closer look at the following analysis:
For example, if the capacity is expanded from 16 to 32, the diagram shows:3. To expand or expand twice as much as before:At this point, subscript(n-1) & hash
, data after capacity expansion10101
And the original00101
It’s actually 1bit more than that,10101
It’s 21 in decimal, and21 = 5 + 16
, it is"Original location + Old capacity"
And there’s another case where it stays zero, where it doesn’t change position
A table is given below, and the data is shown in the figure:The capacity is 16There are two low Pointers loHead and lloTail, and two high Pointers hiHead and hiTailAfter expanding to 32, add two more lists to their respective locations. There are two cases, the sum of the original position"Original location + Old capacity"
This positionTherefore, in the process of expansion, the corresponding node position change is as follows:
7.3, resize source code implementation
After the above detailed analysis, this implementation logic can be found in the code corresponding, OK, with the corresponding source code:
final Node<K,V>[] resize() {
// Get the current node array
Node<K,V>[] oldTab = table;
// The length of the array
int oldCap = (oldTab == null)?0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
// Calculate the size after expansion
if (oldCap > 0) {
if (oldCap >= MAXIMUM_CAPACITY) { // The maximum capacity is 1 <<< 30
// Set the threshold to the maximum capacity
threshold = Integer.MAX_VALUE;
return oldTab;
}
// If it is not exceeded, it is doubled
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
// Assign the old threshold to the new array length
newCap = oldThr;
else { // zero initial threshold signifies using defaults
// Use the default value 16
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// Recalculate the threshold and assign it to threshold
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
// The new threshold, which was 12 by default, is now 24
threshold = newThr;
// Create a new node, newCap is the new array length, 32
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if(oldTab ! =null) {
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if((e = oldTab[j]) ! =null) {
oldTab[j] = null;
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
// is a red-black tree node, call split method
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // Preserve Order is the case with linked lists
// Define the relevant pointer
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
// No need to move position
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
else { // Need to move the position to "old position + old capacity" position
if (hiTail == null)
hiHead = e;
else
// hiTail points to the node e to movehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
if(loTail ! =null) {
loTail.next = null;
// The position is unchanged
newTab[j] = loHead;
}
if(hiTail ! =null) {
// hiTail points to null
hiTail.next = null;
// oldCap is old capacity, move to "old + old capacity" position
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
Copy the code
Remove HashMap in Jdk8
Remove method, the idea here is to find the location of the element, if it’s a list, you can just go through the list of remove elements, if it’s a red-black tree, you can go through the red-black tree and find the node, and then remove the node, if the number of nodes in the tree is less than 6, in this case you have to go through the list
@Override
public boolean remove(Object key, Object value) {
return removeNode(hash(key), key, value, true.true) != null;
}
final Node<K,V> removeNode(int hash, Object key, Object value,
boolean matchValue, boolean movable) {
Node<K,V>[] tab; Node<K,V> p; int n, index;
// Array subscripts are (n-1)&hash, where the elements can be found
if((tab = table) ! =null && (n = tab.length) > 0 &&
(p = tab[index = (n - 1) & hash]) ! =null) {
Node<K,V> node = null, e; K k; V v;
// The node on the bucket is the key
if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
// Point Node to the Node
node = p;
else if((e = p.next) ! =null) { // Lists or red-black tree nodes
if (p instanceof TreeNode)
// Find the red-black tree node
node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
else { // List
// Walk through the list to find the desired node
do {
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k)))) {
node = e;
break;
}
p = e;
} while((e = e.next) ! =null); }}// After the node is found
if(node ! =null&& (! matchValue || (v = node.value) == value || (value ! =null && value.equals(v)))) {
if (node instanceof TreeNode)
// Red black tree remove node
((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
else if (node == p)
// list remove by changing the pointer
tab[index] = node.next;
else
p.next = node.next;
// Record the number of changes
++modCount;
// The number of changes
--size;
afterNodeRemoval(node);
returnnode; }}return null;
}
Copy the code
Get operation for HashMap in Jdk8
Get method: Find value by key. This method is easier to understand
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
// If the hash table is not empty and the bucket corresponding to the key is not empty
if((tab = table) ! =null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) ! =null) {
// Check the first node against the position of the index
if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
return first;
if((e = first.next) ! =null) { // If it is not the first node, it could be a linked list or a red-black tree node
if (first instanceof TreeNode)
// Obtain the red-black tree node according to getTreeNode
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
// In the case of linked lists, only the list can be traversed
do {
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
return e;
} while((e = e.next) ! =null); }}return null;
}
Copy the code
10, HashMap related interview questions
- What is the data structure of a HashMap?
Before jdK8, it is array + list + red-black tree
- How is the hash function implemented in HashMap?
Get the hashCode using the JDK’s hashCode() method, move it 16 bits right, and then xor the two numbers
- What are hash collisions in a HashMap?
Hash collision, also known as hash collision, is usually the same as the calculated hash value. In HashMap, it is based on the calculated hash value, and then the array table subscript (n-1) is the same as the hash value
- How does HashMap handle hash collisions?
Before jdK8, the linked list method was used. After JDK8, the linked list + red-black tree method was used
- Is HashMap thread safe?
HashMap is not thread-safe because there are no synchronization locks or other thread-safe operations in the source code
- HashMap is not thread-safe, and then what are the methods?
ConcurrentHashMap can be used
- How does ConcurrentHashMap keep threads safe?
ConcurrentHashMap in Jdk8 uses CAS plus synchronized locks to ensure thread safety