This is the 23rd day of my participation in the August More Text Challenge
Preface

HashMap stores and retrieves values by key; its core algorithm is hashing.
```java
import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("1", "yyds");
        map.put("2", "tql");
        map.put("3", "xswl");
        map.get("1");
        map.get("2");
        map.get("3");
    }
}
```
The thinking behind its data structure:

- Array: hash the `key`, and the resulting hash maps directly to a position in the array.
- Linked list: what happens when two keys hash to the same position? Hang a linked list off that slot and chain the values with the same hash there. Query performance degrades to O(n).
- Red-black tree: linked-list performance gets worse as more data accumulates, so convert the list to a red-black tree, bringing query performance down to O(log n).
The three basic storage concepts of hash-based collections are shown below:
Data structure: array + linked list + red-black tree
JDK 1.8 added red-black trees because an overly long list seriously hurts HashMap's performance; a red-black tree keeps insert, delete, update, and lookup fast, which effectively solves the slow-operation problem of long lists.
Let's look at the key points of HashMap in JDK 1.8.
1) Member variable resolution
```java
// Initial capacity, default 16: 1 shifted left 4 bits, i.e. 2^4
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Maximum capacity: 2^30
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor: 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Treeify threshold: a list converts to a tree at this length
static final int TREEIFY_THRESHOLD = 8;
// Untreeify threshold: a tree converts back to a list at this length
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity before treeifying is allowed
static final int MIN_TREEIFY_CAPACITY = 64;

// Inner class: linked-list node
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
```
Parameters:

- Default load factor: 0.75
- Default capacity: 16

An analogy: a meeting of 12 people requires 12 / 0.75 = 16 chairs. Taking the index modulo 16 collides less often than modulo 12, without wasting much space. But once the number of people exceeds (number of chairs × load factor), the map expands: resize is called and the capacity doubles.
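As a small sketch of the rule above (`ThresholdDemo` and `threshold` are illustrative names, not HashMap internals): the resize threshold is capacity × load factor, and the put that pushes the size past it triggers a doubling.

```java
// Sketch of the resize-threshold rule: threshold = capacity * loadFactor.
// ThresholdDemo is a hypothetical helper, not part of HashMap's API.
public class ThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default HashMap: 16 * 0.75 = 12, so the 13th entry triggers
        // a resize to capacity 32 (and a new threshold of 24)
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```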
2) Hash algorithm

The probability of a hash collision depends on how hashCode is computed and on the size of the table. After a collision, chaining (the "zipper" method) is used.

The hash algorithm's source:
```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// For example, for some key's hash value:
// Original:          1111 1111 1111 1111 1111 1010 0111 1100
// Shifted right 16:  0000 0000 0000 0000 1111 1111 1111 1111
// XOR:               1111 1111 1111 1111 0000 0101 1000 0011
```
We shift right by 16 bits and then XOR. Why do that?

The goal is to reduce hash collisions among keys that would otherwise land at the same array position: after the XOR, the low 16 bits carry the characteristics of both the low 16 and the high 16 bits.

Then, why not use all 32 bits directly?

- In most cases, the high 16 bits differ little between keys.
- The table index is computed from the low bits anyway, so folding the high bits down captures their differences cheaply.
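The spreading step can be reproduced outside the JDK (`SpreadDemo` is an illustrative class name; the body mirrors the `hash` method shown above):

```java
// Sketch of JDK 1.8's hash spreading: XOR the high 16 bits into the low 16,
// so table indexing (which only uses low bits) still "sees" the high bits.
public class SpreadDemo {
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // 0xFFFFFA7C is the example value from the text
        // (an Integer's hashCode is the value itself)
        int spread = hash(0xFFFFFA7C);
        System.out.println(Integer.toBinaryString(spread));
        // prints 11111111111111110000010110000011, i.e. 0xFFFF0583
    }
}
```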
3) Addressing algorithm
Let's go straight to `put(K key, V value)`:
```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Table empty: initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Is the slot for this hash still empty?
    if ((p = tab[i = (n - 1) & hash]) == null)
        // No entry yet: create one
        tab[i] = newNode(hash, key, value, null);
    else {
        // ...
    }
    // ...
}
```
How is the index computed? Answer: `(n - 1) & hash`, where `n` is the table length (default 16) and `hash` is `hash(key)`, i.e. the key after the hash computation above.

```java
// For example:
// hash:       1111 1111 1111 1111 0000 0101 1000 0011
// n - 1 = 15: 0000 0000 0000 0000 0000 0000 0000 1111
// 15 & hash:  0000 0000 0000 0000 0000 0000 0000 0011
// Converted to base 10 that is 3, so i = 3: the array index
// produced by the addressing algorithm for this hash value
```
To summarize the addressing algorithm:

`(n - 1) & hash` determines a position in the array. A modulo operation would work too, but its performance is relatively poor, so the addressing is optimized: `(n - 1) & hash` has exactly the same effect as taking `hash` modulo `n`, yet performs much better.

Why? It's a bit of math: the array length is always kept at a power of two. As long as the length stays a power of two, `hash` modulo `n` and `(n - 1) & hash` give the same result, and the latter is faster.
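The equivalence is easy to check (`IndexDemo` is an illustrative name; `Math.floorMod` is used for the comparison because Java's `%` can return negatives for negative hashes, while `(n - 1) & hash` is always non-negative):

```java
// When n is a power of two, (n - 1) & hash equals the non-negative
// remainder of hash modulo n; the AND avoids a division instruction.
public class IndexDemo {
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int[] hashes = {3, 17, -5, 0xFFFF0583};
        for (int h : hashes) {
            // true for every hash, including negative ones
            System.out.println(index(h, 16) == Math.floorMod(h, 16));
        }
    }
}
```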
4) Hash-conflict handling with linked lists

What if hashes conflict? How is that handled? Take a look at the source:
```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Table empty: initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Is the slot for this hash still empty?
    if ((p = tab[i = (n - 1) & hash]) == null)
        // No entry yet: create one
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Same hash and same key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // ...
        // The same key overwrites the old value
        if (e != null) { // existing mapping for key
            // oldValue: the previous value
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // value: the new value
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // ...
}
```
Here's an example:

```java
// Two puts with the same key:
map.put("1", "yyds");
map.put("1", "xswl");
// then "yyds" is oldValue
// and "xswl" is value
V oldValue = e.value;
// The new value overwrites the old one
```
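The overwrite behavior can be observed directly, since `put` returns the previous value for the key (or null if there was none); this is HashMap's documented behavior:

```java
import java.util.HashMap;
import java.util.Map;

// Putting the same key twice: the second put overwrites the value
// and returns the old one.
public class OverwriteDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        String first = map.put("1", "yyds"); // no previous mapping -> null
        String old = map.put("1", "xswl");   // overwrites -> returns "yyds"
        System.out.println(first);        // null
        System.out.println(old);          // yyds
        System.out.println(map.get("1")); // xswl
    }
}
```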
If a lot of hash collisions leave a long list hanging off one slot, traversing it becomes particularly painful.
So JDK 1.8 optimized this: once a list reaches length 8, it is converted to a red-black tree, with O(log n) lookups.
The corresponding source is as follows:
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // ...
    else {
        // ...
        // The bin is already a red-black tree
        else if (p instanceof TreeNode)
            // A red-black tree is a balanced binary search tree;
            // inserting a node involves some rebalancing work
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If binCount >= 7, i.e. the list length reached 8
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // Convert to a red-black tree
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // ...
    }
    // ...
}
```
The source of `treeifyBin(tab, hash)` is as follows:

- The singly linked list is first converted into a doubly linked list
- The doubly linked list is then transformed into a red-black tree
```java
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // After the do-while, the singly linked list has become
        // a doubly linked list of TreeNodes;
        // treeify then turns it into a red-black tree
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
```
Summary: a linked list is used first; once its length reaches 8 (and the table capacity is at least MIN_TREEIFY_CAPACITY, 64), the list is converted into a red-black tree.
5) Principle of array-based capacity expansion

The underlying data structure of HashMap is: array + linked list + red-black tree.

HashMap expansion means doubling the capacity plus a rehash: after expansion, previously hashed keys must be readdressed, and each key-value pair finds its location in the new array based on the hash of its key.

The capacity is always expanded to a power of 2; the source is as follows:
```java
static final int tableSizeFor(int cap) {
    // Subtract 1 first so that a cap that is already a power of two
    // stays unchanged. E.g. cap = 8 (binary 1000): without the -1 the
    // result would be 10000 = 16, which is clearly wrong; with
    // 8 - 1 = 7 (binary 111), the result is 1000 = 8, as desired.
    int n = cap - 1;
    // Suppose n is 01xx...xxx
    // n >>> 1 is 001x...xxx; OR-ing gives 011x...xxx
    // n >>> 2 is 00011...xxx; OR-ing gives 01111...xxx
    // There are now four leading 1s; shifting right by 4 and OR-ing
    // gives eight 1s, and so on until all low bits are 1
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
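The rounding-up behavior can be verified with a standalone copy of the bit trick (`TableSizeDemo` is an illustrative wrapper; the method body matches the JDK source):

```java
// Standalone copy of tableSizeFor so the power-of-two rounding
// can be checked on a few inputs.
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(8));  // 8  (already a power of two)
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```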
Keys that previously collided may be distributed to different positions in the new array after expansion:
6) JDK 1.8's high-performance rehash algorithm

Since JDK 1.8, to improve the performance of the rehash process, the key's hash is no longer simply taken modulo the new array length; bitwise operations are used instead. For example:
Case diagrams:

- When the array length is 16:

```java
// Array length n = 16
// hash1 = hash(key1)
// n - 1:    0000 0000 0000 0000 0000 0000 0000 1111
// hash1:    1111 1111 1111 1111 0000 1111 0000 0101
// & result: 0000 0000 0000 0000 0000 0000 0000 0101 = 5
//
// hash2 = hash(key2)
// n - 1:    0000 0000 0000 0000 0000 0000 0000 1111
// hash2:    1111 1111 1111 1111 0000 1111 0001 0101
// & result: 0000 0000 0000 0000 0000 0000 0000 0101 = 5
```
You can see that the two hashes in case 1 collide; the collision can be resolved with a linked list or a red-black tree.
- When the array length is 32 after expansion:

```java
// Array length n = 32
// hash1 = hash(key1)
// n - 1:    0000 0000 0000 0000 0000 0000 0001 1111
// hash1:    1111 1111 1111 1111 0000 1111 0000 0101
// & result: 0000 0000 0000 0000 0000 0000 0000 0101 = 5
//
// hash2 = hash(key2)
// n - 1:    0000 0000 0000 0000 0000 0000 0001 1111
// hash2:    1111 1111 1111 1111 0000 1111 0001 0101
// & result: 0000 0000 0000 0000 0000 0000 0001 0101 = 21
```
In case 2, hash2 moves from index = 5 to index = 21. After each expansion, a key's position changes in one of two ways:

- It stays in the same position (same `index`), or
- It moves to `new index = old index + old array length (oldCap)`

For example: index(21) = index(5) + oldCap(16).
To summarize the expansion mechanism:

- The array length is doubled
- Readdressing (rehash): `hash & (n - 1)` is recomputed, and the question is whether the binary result gains an extra 1 bit
  - If not, the entry keeps its original `index`
  - If so, it moves to `index + oldCap`

This avoids taking each hash modulo the new array length during the rehash; modulo performance is poor, while bitwise performance is high.
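The split rule can be sketched with the hash2 value from the case diagrams (`RehashDemo` and `index` are illustrative names): whether `(hash & oldCap)` is zero decides if the entry stays or moves to `oldIndex + oldCap`.

```java
// Sketch of the JDK 1.8 split rule: during resize, (hash & oldCap) == 0
// decides whether an entry stays at its old index or moves to
// oldIndex + oldCap. Checked here against direct re-indexing.
public class RehashDemo {
    static int index(int hash, int cap) {
        return (cap - 1) & hash;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = 32;
        int hash2 = 0xFFFF0F15; // the hash2 example: low bits ...0001 0101
        int oldIndex = index(hash2, oldCap); // 5
        int newIndex = index(hash2, newCap); // 21
        boolean moves = (hash2 & oldCap) != 0; // the oldCap bit is set
        System.out.println(oldIndex); // 5
        System.out.println(newIndex); // 21
        System.out.println(moves && newIndex == oldIndex + oldCap); // true
    }
}
```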