The data structure

The important parameters

  • ** Capacity: The length of the array in the HashMap
    • Capacity range: must be a power of 2 & < maximum capacity (2 to the 30th power)
    • Initial capacity = the capacity of the hash table when it was created
    • Default capacity =16 =1 <<4 = 00001 1 moved 4 bits to the left = 10000 = decimal 2^4=16 static final int DEFAULT_INITIAL_CAPACITY =1 <<4;
    • Static final int maximum capacity = 1 << 30; static final int MAXIMUM_CAPACITY = 1 << 30;
  • The load factorLoad factor: A measure of how full a HashMap can be before its capacity automatically increases
    • Default loading factor = 0.75 static final float DEFAULT_LOAD_FACTOR = 0.75f;
  • Threshold: When the size of the hash table is greater than or equal to the threshold, the hash table will be expanded (that is, expand the capacity of the HashMap).

Source code analysis

The constructor

Put method parsing

  1. If the hash table is not initialized (that is, the table is empty), the array table is initialized using the threshold set at constructor time (that is, the initial capacity)
  2. Check whether key is null.
    • If key == null, insert the hash value into table[0].
    • If key is not null, calculate the position in array table (subscript, index)
    • Determine whether the value of the key already exists (by traversing the list of elements that start with the array)
      • If the key already exists (key-value already exists), the new value replaces the old value
      • If the key does not exist, add key-value to the table
  • The key of a HashMap can be null.
  • The key of a HashMap can be null and only one key, but the value can be null and multiple values

Initialize the hash table

Convert the passed capacity size to: > The minimum power of 2 of the passed capacity size, the capacity exceeds the maximum, initialize the capacity set to the maximum; Otherwise, set to: > The smallest power of 2 of the passed capacity size

How do I calculate an index in a hashTable

  • Calculation of hashCode
  • Secondary perturbations: bit operations and xor operations handle hashCode => h
  • index = h & (length-1)

Why not just use hashCode () as the subscript to store the array table?

Conclusion: It is easy to mismatch the hash code and the array size range, that is, the calculated hash code may not be in the array size range, resulting in the failure to match the storage location

Why do we need to do a second hash before we can evaluate the array subscript: perturbation?

Increase the randomness of the Hash code low position to make the distribution more uniform, so as to improve the randomness and uniformity of the index position of the corresponding array storage, and ultimately reduce Hash conflicts.

Why is the length of an array required to be 2 to the NTH power

The result of reducing the array length by 1 is binary 011111… If the first digit of the form is 0 and the second digit is 1, the parity of the result index = h&(n-1) is determined by the parity of h itself. If the last digit is 0, the memory bit must be even, which increases the probability of conflict

The add/replace process for key-value pairs

  • Add uses the array header method, where the newly added node is always placed at the head of the array
  • If the key is equal, just find the node corresponding to the key and overwrite the value

Expansion mechanism

When using the PUT method, you check that the actual size of the array is greater than the threshold, save the old array, create an array with twice the current capacity, iterate over each data in the old array, re-calculate based on the hash function, transfer each data in the old array to the new array one by one, recalculate the threshold of the new array, and the expansion is complete.

Difference between 1.7 and 1.8

How is the hash function implemented in HashMap? What other hash functions can be implemented?

A: The hashCode of the key value of the hashMap is hash based on the length of the array. The operation is to calculate the stored index with or without sign shift, xor by bit, and by bit and. Hash functions also have middle squares, pseudorandom, and remainder methods. All three are relatively inefficient.

What happens when the Hashcodes of two objects are equal?

A: Hash collisions are generated. If the length of the linked list exceeds the threshold of 8 and the array exceeds the threshold, expansion will be performed first. If the array length exceeds 64 and the linked list length exceeds 8, storage will be converted to red-black tree.

Expansion mechanism of HashMap

The default load factor is 16 x 0.75 = 12. The default load factor is 16 x 0.75 = 12. The capacity expansion consists of two steps. Hash all the data in the original array into the new array. Note: If the array Length changes, the hash value will change.

Why is the load factor 0.75

Static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // Aka 16 To reduce the probability of conflicts, when the hashMap array length reaches a critical value, expansion will be triggered, and all elements will be rehash into the expanded container, which is a very time-consuming operation.

The threshold is determined by the load factor and the capacity of the current container: DEFAULT_INITIAL_CAPACITY*DEFAULT_LOAD_FACTOR, which is 16×0.75=12 by default, triggers the expansion operation. Ideally, using random hash codes, the frequency of node occurrence follows poisson distribution in the hash bucket, and the comparison table of the number and probability of elements in the bucket is given. As you can see from the table above, by the time the number of elements in the bucket reaches 8, the probability becomes very small, meaning that with 0.75 as the loading factor, it is almost impossible to have more than 8 linked lists per collision location.

Why does the underlying array length of a hashMap have to be 2 to the NTH power

This is related to the formula of the hash algorithm. In order to ensure that the results of the hash algorithm are evenly distributed, if it is 2 to the n power, the binary of Length-1 is all 1, and the result of index is equivalent to the last few bits of hashcode. As long as you make sure that the hashCode is evenly distributed then the index will be evenly distributed.

Why do we need to override hashCode when we override equals

If overridden, equal (object) is the memory address of the two objects to be compared. Equals (hashMap) finds the specific key in the list and overwrites hashCode to ensure that different objects return different hash values. The same object returns the same hash value. Before overriding equals, we inherited the equals method of object, which compares the memory addresses of two objects. Obviously, we new the memory addresses of two objects. == compares the values of two objects. For reference objects, == compares the addresses of two objects. Remember that a HashMap looks for an index using the hashCode of a key

How to resolve HashMap thread insecurity

Thread-safe methods include HashTable, ConcurrentHashMap, and SynchronizedMap. The best method is ConcurrentHashMap, which is generally used. The Hashtable container uses synchronized to ensure thread-safety, but Hashtable is very inefficient in the context of competitive threads. Because when one thread accesses the synchronization method of Hashtable, other threads may enter a blocking or polling state while accessing the synchronization method of Hashtable. For example, thread 1 uses PUT to add elements, thread 2 cannot use put to add elements, nor can it use GET to get elements. Therefore, the more fierce the competition, the lower the efficiency.

ConcurrentHashMap Indicates the concurrency mechanism

Each lock is used to lock part of the data in the array, so that it can solve the problem that multi-threading access to different data segments in the container does not need to compete for locks, effectively improving the concurrency efficiency. CAS+synchronized is used in 1.8 to ensure thread safety, and the data structure is as follows: Array + linked list + red-black tree. CAS(atomic operation to compare and swap) V O N Memory value Old value New value Takes the value in the memory address and compares it with the old value. If the value is the same, no threads are occupied. Write N (new value) into it. 】

What is the principle of red-black trees?

A red-black tree is a balanced binary tree with red-black nodes. Each node is either red or black, the root node and the leaf node are black, and the two children of each red node must be black. It is necessary to turn the list into a red-black tree by left-rotation, right-rotation and coloring, which is mainly used to improve the efficiency of the search in the list at the expense of the efficiency of insertion and deletion. Each insertion and deletion needs to be adjusted again to restore the red-black tree. Distinguished from balanced binary trees, the longest possible path from root to leaf is no more than twice as long as the shortest possible path ———————————————— This article is originally published BY CSDN blogger “doubleStrongWu” under CC 4.0 BY-SA copyright agreement. Please attach the original source link and this statement. Original link: blog.csdn.net/qq_37126175…