Abstract
HashMap is a data structure that is used frequently in daily development. With each JDK release, the internals of HashMap have changed considerably. In JDK 1.8, HashMap was optimized by adding a red-black tree on top of the array-plus-linked-list structure. Before analyzing the JDK 1.8 HashMap, this article analyzes the JDK 1.7 HashMap.
HashMap UML class diagram
Implementation principle
HashMap uses an array as the hash bucket and linked lists to record conflicting elements; in other words, separate chaining is used to resolve hash conflicts.
Member variables
// The default capacity of HashMap, 1 << 4 = 2^4 = 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// The maximum capacity of a HashMap is 2^30, which is a little less than Integer.MAX_VALUE
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Empty hash bucket
static final Entry<?,?>[] EMPTY_TABLE = {};
// Hash bucket that stores the elements; by default it points to the empty hash bucket
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
// Records the number of elements stored in the HashMap
transient int size;
// Threshold at which the next capacity expansion is triggered
int threshold;
// Load factor
final float loadFactor;
// The number of structural modifications to the HashMap
transient int modCount;
// Hash seed, used to perturb the hash of the key so that hash values are more random
transient int hashSeed = 0;
Since HashMap uses an array plus linked lists to implement a hash table, the following problems need to be solved:
- How to design a good hash function
- How to resolve hash conflicts
- How to determine the size of a hash table
- What is a load factor and how is it determined
So, let’s go through the problems one by one.
- What is a load factor and how is it determined?
Load factor = threshold / hash table length. The load factor is a measure of the space-time trade-off of a HashMap. From the source code above, the default capacity of a HashMap is 16 and the default load factor is 0.75.
- With the hash table length fixed, a larger load factor means a higher expansion threshold and better space utilization, but the probability of hash collisions increases, so the linked lists hold more conflicting elements and lookup time suffers.
- With the hash table length fixed, a smaller load factor means a lower expansion threshold and lower space utilization, which leads to frequent expansion and significantly hurts performance.
The choice of 0.75 is a compromise between improving space utilization and reducing query cost, so customizing the load factor is generally not recommended unless there is a special requirement; a small worked example follows below.
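As a quick worked example (illustrative code, not JDK source): with the default capacity of 16 and load factor of 0.75, the expansion threshold is 12, so the 13th distinct key inserted can trigger a resize.
public class LoadFactorDemo {
    public static void main(String[] args) {
        int capacity = 16;          // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f;   // DEFAULT_LOAD_FACTOR
        // threshold = capacity * loadFactor, the element count at which expansion is triggered
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // prints 12
    }
}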
- How to determine the size of a hash table?
The larger the hash table, the more space it uses; if it stores few elements, most of that space is wasted. The smaller the hash table, the less space it uses, but the probability of hash collisions rises, which hurts performance to some extent.
- How to design a good hash function?
A good hash function has two characteristics: a low collision probability and uniform distribution. The hashSeed in the source code above is used to perturb the hash value of the key so that the computed hash is more random, the collision probability is lower, and the distribution is more uniform.
- How to resolve hash conflicts?
HashMap uses separate chaining to resolve hash conflicts, and the linked list is implemented with the Entry structure. The source code is as follows:
static class Entry<K,V> implements Map.Entry<K,V> {
    // the key
    final K key;
    // the value
    V value;
    // pointer to the next node in the collision chain
    Entry<K,V> next;
    // the hash value of the key
    int hash;
}
From the above source code, you can see that the linked list is maintained through next.
Function implementation
The constructor
The core constructor of HashMap is as follows:
/**
 * @param initialCapacity the initial capacity
 * @param loadFactor      the load factor
 */
public HashMap(int initialCapacity, float loadFactor) {
    // 1. Validate the initial capacity
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // 2. Validate the load factor
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // 3. Set the load factor and take the initial capacity as the threshold
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}
The constructor processing logic is as follows:
- Boundary check of the initial capacity: the value must lie in the range [0, 2^30].
- Boundary check of the load factor: it must be greater than 0 and must not be NaN.
- Set the load factor.
- Take the initial capacity as the threshold.
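Note that the constructor does not allocate the hash bucket: the initial capacity is parked in threshold, and the table is only created lazily on the first put, when inflateTable is called (see below). A small illustrative usage, assuming the JDK 1.7 behavior described in this article (demo code, not JDK source):
public class LazyInitDemo {
    public static void main(String[] args) {
        // after construction, threshold temporarily holds the requested capacity (10)
        java.util.HashMap<String, Integer> map = new java.util.HashMap<>(10, 0.75f);
        // the first put inflates the table; 10 is rounded up to the next power of two (16)
        map.put("a", 1);
        System.out.println(map); // prints {a=1}
    }
}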
Create a hash bucket
The source code is as follows:
/**
 * Create the hash bucket.
 */
private void inflateTable(int toSize) {
    // Round toSize up to the nearest power of two and use it as the array size
    int capacity = roundUpToPowerOf2(toSize);
    // Compute the threshold as capacity * load factor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Create the hash bucket
    table = new Entry[capacity];
    // Initialize the hashSeed if needed
    initHashSeedAsNeeded(capacity);
}
The above source code ensures that the capacity of the HashMap is a power of two.
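For reference, here is a minimal sketch of how a requested size can be rounded up to the next power of two using Integer.highestOneBit; the JDK 1.7 roundUpToPowerOf2 follows the same idea, but treat this as an approximation rather than the exact source:
// Sketch: round a requested size up to a power of two, clamped at 2^30 (MAXIMUM_CAPACITY)
static final int MAXIMUM_CAPACITY = 1 << 30;

static int roundUpToPowerOf2(int number) {
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}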
Storing a null key
The source code is as follows:
private V putForNullKey(V value) {
    // Start from the bucket at index 0 and traverse its chain
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
From the source code above, an entry with a null key is always stored in the bucket at index 0. The chain at index 0 is traversed first; if an entry with a null key already exists, its value is replaced and the old value is returned.
If the traversal does not find an entry with a null key, a new entry is added to the bucket at index 0.
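A small illustrative example of this behavior (demo code, not JDK source):
public class NullKeyDemo {
    public static void main(String[] args) {
        java.util.HashMap<String, String> map = new java.util.HashMap<>();
        map.put(null, "first");                  // stored in the bucket at index 0
        String old = map.put(null, "second");    // replaces the existing null-key entry
        System.out.println(old);                 // prints "first"
        System.out.println(map.get(null));       // prints "second"
    }
}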
Add elements
The source code is as follows:
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Expand when the size has reached the threshold and the target bucket is not empty
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        // Recompute the hash and index against the expanded table
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}
Its main job is to create a new Entry and add it to the hash bucket. Before adding the element, it checks whether the number of stored elements has reached the expansion threshold and whether the bucket at the current index already holds an element; if both conditions hold, expansion and rehashing are triggered (expansion is analyzed below).
The actual insertion is delegated to createEntry. The source code is as follows:
void createEntry(int hash, K key, V value, int bucketIndex) {
    // Head insertion: the new entry points to the old head of the chain
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
Essentially, a new Entry is created whose next pointer references the previous head of the chain, and it becomes the new head stored in table[bucketIndex] (head insertion).
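The head-insertion idea can be illustrated with a minimal singly linked node (a sketch with a hypothetical Node type, not JDK source):
// Minimal sketch of head insertion on a singly linked chain (hypothetical Node type)
class Node {
    final String key;
    String value;
    Node next;
    Node(String key, String value, Node next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

class HeadInsertDemo {
    static Node head;                       // plays the role of table[bucketIndex]
    static void insert(String key, String value) {
        head = new Node(key, value, head);  // the new node points at the old head
    }
    public static void main(String[] args) {
        insert("a", "1");
        insert("b", "2");
        System.out.println(head.key);       // prints "b": the most recently added node is first
    }
}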
Expansion processing
The source code is as follows:
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
From the addEntry method, we can see this line of code:
resize(2 * table.length);
This line calls resize with twice the current number of hash buckets.
The logic of the resize method is relatively straightforward, and the process is as follows:
- Use the oldTable variable to cache the hash bucket before expansion.
- Check whether the current hash bucket capacity has already reached 2^30. If so, set the threshold to Integer.MAX_VALUE and return; otherwise proceed to the next step.
- Create a new hash bucket based on the new capacity, with no data in the hash bucket.
- The transfer method is called to transfer the pre-expansion hash bucket to the new hash bucket.
- Recalculate the threshold for next capacity expansion.
The second step returns immediately because the capacity has already reached the maximum allowed value, so no further expansion is possible; setting the threshold to Integer.MAX_VALUE ensures that expansion will not be triggered again.
The transfer method handles element migration, and the process is as follows:
- Iterate over the buckets of the pre-expansion hash table to obtain each chain's head node.
- Walk the collision chain of that node; for each traversed node e, recompute its hash if rehashing is required, then compute its index in the new hash bucket from e's hash and the new capacity.
- Insert the traversed node at the head of its chain in the new hash bucket.
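Because the capacity is always doubled, when the hash itself is not recomputed (rehash == false) a node either stays at its old index or moves to old index + old capacity. A quick illustrative check (demo code, not JDK source):
public class DoubleCapacityIndexDemo {
    // same index computation as indexFor: h & (length - 1)
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }
    public static void main(String[] args) {
        int hash = 21;                                    // binary 10101
        int oldIndex = indexFor(hash, 16);                // 21 & 15 = 5
        int newIndex = indexFor(hash, 32);                // 21 & 31 = 21 = 5 + 16
        System.out.println(oldIndex + " -> " + newIndex); // prints "5 -> 21"
    }
}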
The transfer method above calls indexFor to obtain the index of a node within the hash bucket. The source code is as follows:
static int indexFor(int h, int length) {
// assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
return h & (length-1);
}
In the source above, the index could be computed with the modulo operation, i.e. h % length, but % is relatively slow, so a bitwise AND is used instead: h % length == h & (length - 1). This equivalence only holds when length = 2^n and h is non-negative, which is one reason the capacity is always kept at a power of two.
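A quick check of the equivalence for non-negative hashes (demo code, not JDK source):
public class IndexForDemo {
    public static void main(String[] args) {
        int length = 16;                       // must be a power of two
        for (int h = 0; h < 100; h++) {
            // for non-negative h and power-of-two length, % and & agree
            if (h % length != (h & (length - 1))) {
                throw new AssertionError("mismatch at h=" + h);
            }
        }
        System.out.println("h % length == (h & (length - 1)) for all tested h");
    }
}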
Calculate the hash
The source code is as follows:
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
The hash function is processed as follows:
- Get the hashSeed, which is used to perturb the hash value.
- If the hashSeed is non-zero and the key is a String, use sun.misc.Hashing.stringHash32 to obtain the hash value of the String.
- For other key types, XOR the hashSeed with the key's hashCode, then perturb the result several times with shifts and XORs to obtain the final hash value.
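To see why the perturbation matters, consider two hashCodes that differ only in their high bits: without the shifts they land in the same bucket of a 16-slot table, while the perturbed hashes no longer collide here (demo code, not JDK source; the perturb method below copies the shift/XOR steps shown above, assuming hashSeed is 0):
public class PerturbDemo {
    // the shift/XOR perturbation from the JDK 1.7 hash method, with hashSeed assumed to be 0
    static int perturb(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }
    public static void main(String[] args) {
        int h1 = 0x10000005;   // these two values differ only in their high bits
        int h2 = 0x20000005;
        // without perturbation both map to bucket 5 of a 16-slot table
        System.out.println(indexFor(h1, 16) + " " + indexFor(h2, 16));                   // prints "5 5"
        // with perturbation they spread out
        System.out.println(indexFor(perturb(h1), 16) + " " + indexFor(perturb(h2), 16)); // prints "7 1"
    }
}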
Adding elements to the hash bucket
The source code is as follows:
public V put(K key, V value) {
if (table == EMPTY_TABLE) {
inflateTable(threshold);
}
if (key == null)
return putForNullKey(value);
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
All of the methods called inside put have already been analyzed above, so the put method itself should now be fairly straightforward. It works like this:
- Check whether the hash bucket is still empty; if so, create it (see the hash bucket creation analysis above).
- Check whether the key is null; if so, store the null key (see the null-key analysis above).
- Compute the hash value of the key (see the hash computation analysis above).
- Compute the index from the hash and the current hash bucket capacity (see the indexFor analysis at the end of the expansion section above).
- Using the index from the previous step, get the head node of the bucket and traverse its linked list; if an entry with the same key already exists, replace its value and return the old value.
- If no matching key was found in the previous step, call addEntry to add the element to the hash bucket (head insertion).
- Update the number of elements and record the structural modification count (modCount).
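A short usage example of this behavior (demo code, not JDK source):
public class PutDemo {
    public static void main(String[] args) {
        java.util.HashMap<String, Integer> map = new java.util.HashMap<>();
        System.out.println(map.put("a", 1));  // prints null: no previous mapping for "a"
        System.out.println(map.put("a", 2));  // prints 1: the old value is replaced and returned
        System.out.println(map.get("a"));     // prints 2
    }
}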
Thread safety
For the thread-safety issues (such as the infinite loop that can occur during concurrent resizing), please refer to the Meituan technical team article Rediscovering HashMap.
Conclusion
After reading the source code of HashMap, the following conclusions are summarized:
- HashMap uses the hashSeed together with multiple shift and XOR perturbations of the key's hashCode to make the hash value more random.
- HashMap replaces the % operation with a bitwise AND; to keep the two equivalent, the hash bucket capacity is always a power of two.
- A HashMap does not expand when the hash bucket is completely full, but when the number of elements reaches three quarters of the capacity (the default load factor).
- New elements are inserted at the head of the chain, and a linked list records the conflicting elements.
- Entries with a null key are stored in the bucket at index 0.
- Concurrent expansion of a HashMap shared across threads can cause an infinite-loop bug.
- Each expansion doubles the size of the underlying array.
Reference
Meituan technical team – Rediscovering HashMap