HashMap principle technical knowledge collation

Storage structure

1. HashMap = array (O(1)) + unidirectional list (O(n))

2. After JDK1.8 HashMap = array (O(1)) + unidirectional list (O(n)) + red-black tree (O(log n))

1. The default array size is 16.

2. Array expansion is exactly a power of 2.

The default load factor is 0.75.

4. When the length of the list exceeds 8, the list is converted to a red-black tree structure. 5. Red-black tree degenerates into a linked list when the number of nodes decreases to 6.

The above several number relations, and why are the above several numbers are analyzed one by one.

Two. Operating principle

1. Put Storage process

(1) Calculate the bucket position, and calculate the hash value based on the hashcode of the key. Index = hash%length

(2) Determine whether the capacity expansion condition is met. If threshold=DEFAULT_INITIAL_CAPACITY * loadFactor (16*0.75=12) is greater than the threshold value, capacity expansion is required. Otherwise, the next step is taken.

3 Check whether the bucket is empty. If yes, insert data into the bucket. If not empty, go to the next step.

(4) Determine whether it is a linked list or a red-black tree, whether the linked list reaches the transformed red-black tree, the current number of linked list nodes <=8, insert the node; If it is a red-black tree insert node, otherwise next step.

⑤ The linked list is transformed into a red-black tree, and nodes are inserted.

⑥ After inserting the node, calculate whether the current size needs to be expanded. If it is larger than the valve value, expand resize.

/** * Implements Map.put and related methods * * @param hash hash for key * @param key the key * @param value the value to put * @param onlyIfAbsent if true, don't change existing value * @param evict if false, the table is in creation mode. * @return previous value, or null if none */ final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null); else { Node<K,V> e; K k; if (p.hash == hash && ((k = p.key) == key || (key ! = null && key.equals(k)))) e = p; else if (p instanceof TreeNode) e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); else { for (int binCount = 0; ; ++binCount) { if ((e = p.next) == null) { p.next = newNode(hash, key, value, null); if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st treeifyBin(tab, hash); break; } if (e.hash == hash && ((k = e.key) == key || (key ! = null && key.equals(k)))) break; p = e; } } if (e ! = null) { // existing mapping for key V oldValue = e.value; if (! onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e); return oldValue; } } ++modCount; if (++size > threshold) resize(); afterNodeInsertion(evict); return null; }Copy the code

JDK1.8 HashMap get call key method source code.

2. Get Obtain process

(1) Calculate the bucket position, and calculate the hash value based on the hashcode of the key. Index = hash%length

② Whether it is a value, a linked list, or a red-black tree, the for loop determines whether the hash values are equal or not, and returns the corresponding value if they are equal.

/** * Implements Map.get and related methods * * @param hash hash for key * @param key the key * @return the node, or null if none */ final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; if ((tab = table) ! = null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) ! = null) { if (first.hash == hash && // always check first node ((k = first.key) == key || (key ! = null && key.equals(k)))) return first; if ((e = first.next) ! = null) { if (first instanceof TreeNode) return ((TreeNode<K,V>)first).getTreeNode(hash, key); do { if (e.hash == hash && ((k = e.key) == key || (key ! = null && key.equals(k)))) return e; } while ((e = e.next) ! = null); } } return null; }Copy the code

JDK1.8 HashMap put call key method source code.

3. Data structure and algorithm thinking

1. Why choose arrays and linked lists?

① Array memory continuous block allocation, efficiency reflects faster query. The position used in a HashMap to find an array bucket, modulo the length of the array using the hash value of the element’s key.

② Linked list efficiency reflects the increase and deletion. Linked lists in a HashMap are used to resolve hash conflicts and balance space consumption.

Extension: Why not ArrayList instead of using Node

[] TAB? Because ArrayList expands by 1.5, HashMap expands by 2.
,v>

2. Why is the hash value calculated according to the hashcode of the key?

① Compute the hash value of the key

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

The code means hash = hashcode 16 bits high and 16 bits low, not directly hashcode.

② Calculate the location code of the bucket

index = (n - 1) & hash
Copy the code

Thought:

First, hash%length is used to reduce hash conflicts, and modulus calculation ensures that the result must be within the range of 0-length.

Second, in order to improve the speed of operation, modular operation is inferior to bit operation. Hash %length == (n-1) &hash is only satisfied when n is a power of 2.

Make sure that (n-1) in the formula conforms to the optimal equation, and the rest considers the optimal hash value. The hash value factor is considered to affect the result as far as possible without conflict.

Because the calculation speed is reflected in the bit operation, the condition n is the power of 2, then the conversion of n-1 to binary is preceded by continuous zeros, followed by continuous ones. For example, if n=16, n-1=15, binary 1111 of 15. Hash & 1111 = Just perform the & operation on the last four bits of the hash binary of interest.

Hashcodes of two keys are equal, but not necessarily equals. The equals, hashcode of the two keys must be equal.

3. Why is the loading factor 0.75 and the linked list length greater than 8 turned into a red-black tree?

Thought:

The above problems are not two independent problems but related to each other, aiming to minimize conflicts while improving space utilization and reducing query costs.

The loading factor determines the threshold value of HashMap expansion. If the bucket is 16, then the expansion value of 16* 0.75=12, that is, 12, should be considered for expansion. There are still 4 unused Spaces to sacrifice. If the load factor is 1, the space utilization is high, but the query speed is slow.

Principle:

The tradeoff is based on the poisson distribution, using 0.75 as the loading factor, and the probability that the length of the list for each collision location exceeds 8 is very low, less than 1 in 10 million.

Source code note:

* Because TreeNodes are about twice the size of regular nodes, we * use them only when bins contain enough nodes to warrant use * (see TREEIFY_THRESHOLD). And when they become too small (due to * removal or resizing) they are converted back to plain bins. In * usages with well-distributed user hashCodes, tree bins are * rarely used. Ideally, under random hashCodes, the frequency of * nodes in bins follows a Poisson distribution * (http://en.wikipedia.org/wiki/Poisson_distribution) With a * parameter of about 0.5 on average for the default resizing * threshold of 0.75, although with a large variance because of * resizing granularity. Ignoring variance, The expected * occurrences of list size k are (exp(-0.5) * POw (0.5, k) / * factorial(k)). The first values are: * * 0: 0.60653066 * 1: 0.30326533 * 2: 0.07581633 * 3: 0.01263606 * 4: 0.00157952 * 5: 0.00015795 * 6: 0.00001316 * 7: 0.00000094 * 8: 0.00000006 * More: less than 1 in ten millionCopy the code

Extension:

Why not start with a red-black tree?

Red-black tree is almost a balanced binary tree, and its structure is suitable for evenly distributing nodes, reducing the tree’s depth like the length of a linked list. The main reason is that in terms of insertion efficiency, adding nodes to a red-black tree is likely to require left-rotation, right-rotation and coloring operations, which are not as efficient as the linked list form.

4. Select the key of the HashMap

1) Select immutable objects, such as strings or ints.

2) If you want to use a custom entity class as key:

① Add final modifier to class to ensure that class is not inherited.

② Make sure all member variables are private and final.

③ Do not provide methods to change member variables, including setters.

(4) Initialize all members through the constructor to carry out deep copy.

5. The hashCode calculation in the String class

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
    char val[] = value;
        for (int i = 0; i < value.length; i++) {
              h = 31 * h + val[i];
          }
          hash = h;
      }
      return h;
}
Copy the code

Hashing formula: s[0]31^(n-1) + S [1]31^(n-2) +… + s[n-1]

Four. Horizontal expansion

1. Thread problems occur in HashMap

① Multi-threaded capacity expansion, caused by the problem of dead loop (JDK1.8, dead loop problem has been solved).

② Multithreading put may cause element loss.

③ After putting a non-null element, get is null.

2. Use thread-safe Map

(1) a HashMap is not thread-safe, thread-safe Collections can be used. The synchronizedMap (m) to obtain a thread-safe HashMap.

②CurrentHashMap and HashTable are thread-safe. CurrentHashMap uses the segment lock technique, which acquires the segment lock before modifying the node.

3.Android advocates the use of ArrayMap

An ArrayMap data structure consists of two arrays, one containing hash values and the other containing keys and values.

② Use binary lookup to find index in hash array based on key hash value.

③ Search for the corresponding position in the key-value array according to the index. If the value is not equal, the value is considered to be in conflict, and the value is searched one by one based on the key.

Advantages, memory savings compared to HashMap for small data volumes (less than 1000). Disadvantages: Deletion and insertion are less efficient than HashMap.