This post was originally published on my blog
Preface
A HashMap stores key-value pairs. Its underlying implementation uses an array, chained (separate-chaining) hashing, and red-black trees. The HashMap source in this article is from JDK 1.8.
Source code analysis
Fields
The "buckets" mentioned frequently below are simply the slots of the table array.
Don't worry if the meaning of a field isn't clear yet; you will see how each one is used in the method walkthroughs later.
// Table initial capacity 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor (0.75f), used when the constructor does not provide one
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Bucket node-count threshold for converting a linked list to a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Restore the tree to a linked list when <=6 nodes in the bucket
static final int UNTREEIFY_THRESHOLD = 6;
// The table is treeified only when its length is at least 64; otherwise resize() is called to expand the table, even if a bucket has 8 or more nodes
static final int MIN_TREEIFY_CAPACITY = 64;
// The underlying array
transient Node<K,V>[] table;
// Cached entrySet view
transient Set<Map.Entry<K,V>> entrySet;
// The number of key-value pairs
transient int size;
// Number of structural modifications (used for fail-fast iterators)
transient int modCount;
// Resize when the number of entries exceeds this value; threshold = capacity * loadFactor
int threshold;
// Load factor
final float loadFactor;
Constructors
There are four constructors in total
//1. No-argument constructor; all fields use default values
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
//2. Custom initial capacity; delegates to constructor 3 with the default load factor
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
//3. Customize the initial capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
// The threshold temporarily holds the table capacity; resize() recomputes it later
this.threshold = tableSizeFor(initialCapacity);
}
//4. Build a HashMap from the key-value pairs of another map
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}
The tableSizeFor(int cap) method
In constructor 3, threshold is computed by tableSizeFor. Let's take a look at this method:
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
The purpose of this method is to find the smallest k such that 2^k >= cap. In other words, whatever initialCapacity is passed to the constructor, tableSizeFor rounds it up to a power of two. Why the capacity must be a power of two is explained later.
Here is a quick explanation of why tableSizeFor works. If cap = 6, then n = 6 - 1 = 5, which is 101 in binary. An unsigned right shift by one gives 010, and 101 | 010 = 111. The new n is 111, and since every subsequent step ORs n with a shifted copy of itself, n stays 111. The method finally returns n + 1, which is 8 in decimal, i.e. 2^3.
As you can see, each shift-and-OR step doubles the length of the run of consecutive ones extending down from n's highest set bit. Since constructor 3 caps initialCapacity at MAXIMUM_CAPACITY = 1 << 30, in the most extreme case n ends up as 30 consecutive ones, and adding 1 yields exactly 2^30.
So why is n set to cap - 1 in the first step? If cap is already a power of two, say 2^k, then without the subtraction the method would return 2^(k+1), which is obviously not what we want. By subtracting 1 first, n becomes k consecutive ones, which the shift-and-OR steps leave unchanged, and the final n + 1 restores 2^k.
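The rounding behavior can be checked with a small, self-contained sketch. The class name TableSizeForDemo is mine; the method body mirrors the JDK source quoted above:

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing logic as JDK 8's HashMap.tableSizeFor
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(6));  // 8  (smallest power of two >= 6)
        System.out.println(tableSizeFor(8));  // 8  (already a power of two, unchanged)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

Note that an input that is already a power of two is returned unchanged, which is exactly what the initial `cap - 1` guarantees.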
The put(K key, V value) method
This method stores a key-value pair; the actual work is done by the putVal method:
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
The hash(Object key) method
This computes the hash of the key. Since key.hashCode() returns an int, (h = key.hashCode()) ^ (h >>> 16) XORs the high 16 bits into the low 16 bits, making the distribution of bucket indexes more uniform. For the detailed reasoning, see the highly upvoted answer on Zhihu.
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
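To see why the high bits matter, consider two hash codes that differ only above bit 16. Without the XOR spreading step they land in the same bucket of a 16-slot table; with it they do not. A minimal sketch (the class and method names are mine):

```java
public class HashSpreadDemo {
    // Same spreading step as JDK 8's HashMap.hash (for a non-null key)
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length
        int h1 = 0x10000, h2 = 0x20000; // differ only in the high 16 bits

        // Without spreading, both collide in bucket 0
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2)); // 0 0

        // With spreading, the high bits influence the index
        System.out.println(((n - 1) & spread(h1)) + " " +
                           ((n - 1) & spread(h2)));               // 1 2
    }
}
```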
The putVal method
Before diving into putVal, look at the diagram below to get a feel for how HashMap stores data. When the number of nodes in a table bucket exceeds the threshold, the original linked list is converted into a red-black tree. (Another prerequisite for this conversion is a table length of at least 64, which is omitted from the figure for simplicity.)
Method flow
1. If the table is empty, call the resize method to initialize it.
2. If the target bucket is empty, insert the new node directly.
3. If the bucket already has elements and its first element has the same hash and key as the one being inserted, go to step 5; otherwise go to step 4.
4. If the first node is a tree node, the collision is handled by red-black tree logic. If it is an ordinary node, the linked list is traversed looking for a node with the same key as the one being inserted; if found, break out of the loop and go to step 5, otherwise go to step 6.
5. Decide, based on onlyIfAbsent, whether to overwrite oldValue; the method ends by returning oldValue.
6. Insert the new node at the tail of the linked list, and treeify the bucket if the node count has reached the threshold. Go to step 7.
7. Increment the total node count and resize if necessary.
Let’s look at the code
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table has not been initialized, call resize to initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket at the hashed position is empty, just put the key-value pair there
    // (n - 1) & hash is equivalent to hash % n; the reason is explained later
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The position already has elements, so resolve the collision by chaining
    else {
        Node<K,V> e; K k;
        // If the key of the first node p equals the key to be inserted,
        // save p in e; its value is updated further down
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If p is a tree node, insert according to red-black tree logic
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise traverse the linked list at this position to find the insertion point
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list without finding a node with the same key
                if ((e = p.next) == null) {
                    // Append the new node at the tail
                    p.next = newNode(hash, key, value, null);
                    // If the list length exceeds the threshold, treeify it
                    // binCount does not count the head node, hence the -1
                    // Note that binCount does not count newNode either,
                    // so the bucket actually holds 9 nodes when it is treeified
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node in the list with the same key as the one being inserted
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // If e is not null, the list already contains a node with the same key
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // onlyIfAbsent defaults to false, meaning the old value is overwritten
            // when the key already exists; otherwise the existing mapping is kept
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Empty in HashMap; used as a callback by LinkedHashMap
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    // The map did not contain the key; increment the modification count
    ++modCount;
    // Increment the node count and resize if it exceeds the threshold
    if (++size > threshold)
        resize();
    // Also a LinkedHashMap callback
    afterNodeInsertion(evict);
    return null;
}
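The return value and the onlyIfAbsent flag traced above are observable through the public API: put calls putVal with onlyIfAbsent = false, while putIfAbsent passes true. A small demonstration:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1));         // null: no previous mapping
        System.out.println(map.put("k", 2));         // 1: old value returned, value replaced
        System.out.println(map.putIfAbsent("k", 3)); // 2: onlyIfAbsent=true, value kept
        System.out.println(map.get("k"));            // 2
    }
}
```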
Now let's explain the points flagged in the comments above.
(n - 1) & hash
First, a hash is an int, whose range in Java is -2147483648 to 2147483647. If the hash were used directly as a table index, the table would need a length of over 4 billion, which is clearly impractical. Instead, hash % n can be used as the index, which keeps the range much smaller.
On a computer, the bitwise AND (&) is faster than %. Moreover, (n - 1) & hash equals hash % n only when the table length n is a power of two. Why? When n = 2^k, n - 1 is k ones in binary, so (n - 1) & hash keeps only the low k bits of the hash. The bits above the low k bits represent a multiple of n and contribute nothing to the remainder, so the low k bits are exactly hash % n. This also explains why the table length has to be a power of two.
The following is a schematic diagram, taking n=8 and hash=75 as an example
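The same example can be checked in code. For a non-negative hash and a power-of-two n, the two expressions agree (for a negative hash, % can return a negative value, which is one more reason the mask is used):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 8;     // table length, a power of two
        int hash = 75; // 75 = 0b1001011
        // (n - 1) keeps only the low 3 bits: 7 = 0b111
        System.out.println((n - 1) & hash); // 3
        System.out.println(hash % n);       // 3, same result for non-negative hash
    }
}
```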
The resize() method
Used to initialize the table or double its capacity.
Method flow
1. If the old capacity is greater than 0, check whether doubling it would exceed the upper limit. If not, double both the capacity and the threshold. If the doubled threshold overflows, go to step 4.
2. If the old capacity is 0 and the old threshold is greater than 0, set the new capacity to the old threshold and go to step 4.
3. If both the old capacity and the old threshold are 0, initialize with the default values.
4. Compute the new threshold as newCap * loadFactor.
5. Create a new array with the new capacity.
6. Map the elements of the old array into the new array: the linked list in each bucket is split into two lists; the low list stays at the original index j, while the high list moves to index j + oldCap, where oldCap is the old capacity.
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    // Old capacity
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Old threshold
    int oldThr = threshold;
    int newCap, newThr = 0;
    // If the old capacity is greater than 0, the table has already been initialized
    if (oldCap > 0) {
        // If the old capacity already reached the maximum, set the threshold to
        // Integer.MAX_VALUE and do not expand
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // If doubling stays within the maximum capacity and the old capacity is
        // at least 16, double both the capacity and the threshold
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1;
    }
    else if (oldThr > 0)
        // If a constructor with arguments was used, the first resize() enters
        // here with oldCap == 0; the new capacity becomes the old threshold,
        // and the new threshold is computed below
        newCap = oldThr;
    else {
        // If the no-argument HashMap() constructor was used, oldCap and oldThr
        // are both 0; initialize with the default values
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // newThr == 0 can happen in two ways:
    // 1. newThr = oldThr << 1 above overflowed (or was never set because oldCap < 16)
    // 2. the code entered the oldThr > 0 branch, which does not compute newThr
    if (newThr == 0) {
        // Compute newThr from the formula newCap * loadFactor
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Update the threshold
    threshold = newThr;
    // Create a new array based on the expanded capacity
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Iterate over the old array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // If there is only one node in the bucket, map it directly to
                // its position in the new array
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // If it is a tree node, split the tree and remap it
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // Otherwise the bucket holds a linked list of several nodes;
                // split it into two lists
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Suppose oldCap = 2^k; (e.hash & oldCap) tests bit k
                        // of the hash. Nodes whose bit k is 0 join the low list
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Nodes whose bit k is 1 join the high list
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The low list stays at the same index as in the old array
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The high list moves up by oldCap relative to the old array
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
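The lo/hi split condition can be sketched in isolation. Assuming oldCap = 8, every node in bucket j = 3 has a hash whose low three bits are 011; the single bit tested by (hash & oldCap) decides whether the node stays at j or moves to j + oldCap (the values here are illustrative):

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 8, j = 3;          // old table length and a bucket index
        int[] hashes = {3, 11, 19, 27}; // all map to bucket 3 when the length is 8
        for (int h : hashes) {
            // (h & oldCap) tests the bit that becomes significant after doubling
            int newIndex = ((h & oldCap) == 0) ? j : j + oldCap;
            System.out.println(h + " -> " + newIndex);
        }
        // prints: 3 -> 3, 11 -> 11, 19 -> 3, 27 -> 11
    }
}
```

Note that each computed index equals h & (newCap - 1) with newCap = 16, so the split produces exactly the positions a full rehash would.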
The get(Object key) method
Returns the value mapped to the given key.
Method flow
1. Locate the bucket in the table using the hash of the key. If the bucket is not empty, go to step 2; otherwise return null.
2. Check whether the first node in the bucket is the one wanted; if so, return it, otherwise go to step 3.
3. If the bucket holds more than one node, a tree node is looked up with balanced-binary-tree logic, while ordinary nodes are found by traversing the linked list.
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Enter the branch only if the table and the hashed bucket are both non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check whether the first node in the bucket is the one wanted
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // The first node is not the target and the bucket has more than one node
        if ((e = first.next) != null) {
            // For a tree node, use the balanced-binary-tree lookup
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise traverse the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
The getTreeNode(int h, Object k) method
getTreeNode is a method of the TreeNode class. Here is a simplified version of TreeNode:
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;

    final TreeNode<K,V> getTreeNode(int h, Object k) {
        // If this node has a parent, start the search from the root of the tree
        return ((parent != null) ? root() : this).find(h, k, null);
    }

    // Balanced binary tree lookup
    final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
        TreeNode<K,V> p = this;
        do {
            int ph, dir; K pk;
            TreeNode<K,V> pl = p.left, pr = p.right, q;
            // The current node's hash is larger than h, so go to the left subtree
            if ((ph = p.hash) > h)
                p = pl;
            // Otherwise go to the right subtree
            else if (ph < h)
                p = pr;
            // The key has been found
            else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                return p;
            // Equal hashes: if the left subtree is empty, search the right one
            else if (pl == null)
                p = pr;
            // If the right subtree is empty, search the left one
            else if (pr == null)
                p = pl;
            // Compare using the key's Comparable ordering if one is available
            else if ((kc != null ||
                      (kc = comparableClassFor(k)) != null) &&
                     (dir = compareComparables(kc, k, pk)) != 0)
                p = (dir < 0) ? pl : pr;
            // Otherwise search the right subtree recursively
            else if ((q = pr.find(h, k, kc)) != null)
                return q;
            // Finally, fall back to the left subtree
            else
                p = pl;
        } while (p != null);
        return null;
    }
}
The remove(Object key) method
Deletes an entry by key.
Method flow
1. Locate the bucket using the hash of the key.
2. If the bucket is empty, return null.
3. If the bucket is not empty and its first node is the one to be deleted, save it in node; otherwise go to step 4.
4. Examine the subsequent nodes: a tree node is located with red-black tree logic, while ordinary nodes are found by traversing the linked list.
5. If the node to be deleted was found, delete it; otherwise return null.
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // The first node in the bucket is the one to be deleted
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // The first node is not the target, so examine the subsequent nodes
        else if ((e = p.next) != null) {
            // If it is a tree, locate the node with red-black tree logic
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // Otherwise traverse the linked list
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // The node to delete has been found
        // If matchValue is false, delete it unconditionally
        // If matchValue is true, delete it only when the values also match
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // Remove a tree node
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // If the first node of the bucket is being removed,
            // point the bucket head at the second node
            else if (node == p)
                tab[index] = node.next;
            // Remove a node from the middle of an ordinary list
            else
                p.next = node.next;
            ++modCount;
            // Decrement the element count
            --size;
            // Empty in HashMap; used as a callback by LinkedHashMap
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
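The matchValue flag traced above is reachable through the public API: the one-argument remove passes matchValue = false, while the two-argument remove(key, value) passes true. A quick demonstration:

```java
import java.util.HashMap;

public class RemoveDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("k", 1);
        // Two-argument remove reaches removeNode with matchValue=true:
        // the entry is removed only if the current value matches
        System.out.println(map.remove("k", 2)); // false: value mismatch, nothing removed
        System.out.println(map.remove("k", 1)); // true: removed
        System.out.println(map.remove("k"));    // null: key no longer present
    }
}
```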
Conclusion
This article analyzed HashMap's most commonly used methods, put, get and remove, along with the helper methods they call.
The insertion and deletion logic of the red-black tree was not covered in detail here. I plan to write a dedicated article on red-black trees later and add that part then.
References
juejin.cn/post/684490…
segmentfault.com/a/119000001…