Knowledge points involved

Hashmap is a Hashmap that is used to communicate with students. It is a Hashmap that is used to communicate with students. It is a Hashmap that is used to communicate with students.

  • First of all, from the perspective of production, HashMap is one of the most commonly used collections in the production process. If we do not fully understand its principle, it is difficult to give full play to its advantages and even cause online bugs
  • From the perspective of data structure and algorithm, HashMap involves array, linked list, red-black tree, and the conversion process of the three, as well as bit operation and other rich data structure and algorithm content.

As a result, HashMap has become a frequent topic for interviews. These processes are clear and show that the foundation of data structures and algorithms is not too bad.

HashMap basics

Data structure is array + linked list + red black tree

Among them, when the number of array elements exceeds 64 and the number of linked list elements stored in a single position of the array exceeds 8, red-black tree will be evolved. Red-black tree was introduced to speed up the search efficiency

Similarly, an array degrades to a linked list when the number of linked list elements stored in a single location is less than 6

Important class attributes of the HashMap

In fact, the JDK source code has very detailed comments, but still translate it, as follows:

  • The initial capacity is 16
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
Copy the code
  • Maximum capacity is 2^30
/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;
Copy the code
  • The default load factor is 0.75

The load factor is selected based on the result of time and space balance, which means neither wasting too much space nor frequent expansion

/** * The load factor used when none specified in constructor. */ static final float DEFAULT_LOAD_FACTOR = 0.6f;Copy the code
  • The threshold for the number of list elements to evolve into a red-black tree is 8. If the number of array elements is greater than 64, the list elements that exceed 8 will evolve into a red-black tree
/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;
Copy the code

As for why 8 was chosen, there is a long note that explains why 8 was chosen.

* Because TreeNodes are about twice the size of regular nodes, we * use them only when bins contain enough nodes to warrant use * (see TREEIFY_THRESHOLD). And when they become too small (due to * removal or resizing) they are converted back to plain bins. In * usages with well-distributed user hashCodes, tree bins are * rarely used. Ideally, under random hashCodes, the frequency of * nodes in bins follows a Poisson distribution * (http://en.wikipedia.org/wiki/Poisson_distribution) With a * parameter of about 0.5 on average for the default resizing * threshold of 0.75, although with a large variance because of * resizing granularity. Ignoring variance, The expected * occurrences of list size k are (exp(-0.5) * POw (0.5, k) / * factorial(k)). The first values are: * * 0: 0.60653066 * 1: 0.30326533 * 2: 0.07581633 * 3: 0.01263606 * 4: 0.00157952 * 5: 0.00015795 * 6: 0.00001316 * 7: 0.00000094 * 8: 0.00000006 * More: less than 1 in ten millionCopy the code

If hashCode is well distributed, red-black trees are rarely used because the values are evenly distributed and the list is rarely long. In the ideal case, the length of the list conforms to poisson distribution, and the hit probability of each length decreases in turn. The notes show us the specific hit probability of 1-8 length. When the length is 8, the probability is only 0.00000006, such a small probability that red-black tree conversion of HashMap will hardly occur

  • Red-black trees degenerate into linked lists with fewer than six elements

Why is it six and not seven because you don’t have to go from tree to list, if it’s seven, if you add one element you have to evolve into a red-black tree, and if you delete one element you have to degenerate into a linked list

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;
Copy the code
  • The minimum array length that a list tree can evolve into a tree. If the array length reaches this value, the list can evolve into a tree
/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;
Copy the code

Important inner classes

The linked list Node < K, V >

Definition of the linked list Node class, is next familiar with, ** * Basic hash bin node, used for most entries. (See below for * TreeNode subclass, and in LinkedHashMap for its Entry subclass.) */ static class Node<K,V> implements Map.Entry<K,V> { final int hash; final K key; V value; Node<K,V> next; Node(int hash, K key, V value, Node<K,V> next) { this.hash = hash; this.key = key; this.value = value; this.next = next; } public final K getKey() { return key; } public final V getValue() { return value; } public final String toString() { return key + "=" + value; } public final int hashCode() { return Objects.hashCode(key) ^ Objects.hashCode(value); } public final V setValue(V newValue) { V oldValue = value; value = newValue; return oldValue; } public final boolean equals(Object o) { if (o == this) return true; if (o instanceof Map.Entry) { Map.Entry<? ,? > e = (Map.Entry<? ,? >)o; if (Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue())) return true; } return false; }}Copy the code

Red and black tree TreeNode > < K, V

/** * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn * extends Node) so can be used as extension of either regular or * linked node. */ static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> { TreeNode<K,V> parent; // red-black tree links TreeNode<K,V> left; TreeNode<K,V> right; TreeNode<K,V> prev; // needed to unlink next upon deletion boolean red; TreeNode(int hash, K key, V val, Node<K,V> next) { super(hash, key, val, next); } /** * Returns root of tree containing this node. */ final TreeNode<K,V> root() { for (TreeNode<K,V> r = this, p;;) { if ((p = r.parent) == null) return r; r = p; }}Copy the code

Put method of HashMap

You’re actually calling the putVal method

/** * Associates the specified value with the specified key in this map. * If the map previously contained a mapping for  the key, the old * value is replaced. * * @param key key with which the specified value is to be associated * @param value value to be associated with the specified key * @return the previous value associated with <tt>key</tt>, or * <tt>null</tt> if there was no mapping for <tt>key</tt>. * (A <tt>null</tt> return can also indicate that the map * previously associated <tt>null</tt> with <tt>key</tt>.) */ public V put(K key, V value) { return putVal(hash(key), key, value, false, true); }Copy the code

So let’s look at the putVal method

/** * Implements Map.put and related methods * * @param hash hash for key * @param key the key * @param value the value to put * @param onlyIfAbsent if true, don't change existing value * @param evict if false, the table is in creation mode. * @return previous value, or null if none */ final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null); else { Node<K,V> e; K k; if (p.hash == hash && ((k = p.key) == key || (key ! = null && key.equals(k)))) e = p; else if (p instanceof TreeNode) e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); else { for (int binCount = 0; ; ++binCount) { if ((e = p.next) == null) { p.next = newNode(hash, key, value, null); if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st treeifyBin(tab, hash); break; } if (e.hash == hash && ((k = e.key) == key || (key ! = null && key.equals(k)))) break; p = e; } } if (e ! = null) { // existing mapping for key V oldValue = e.value; if (! onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e); return oldValue; } } ++modCount; if (++size > threshold) resize(); afterNodeInsertion(evict); return null; }Copy the code

PutVal’s process is as follows:

Resize for HashMap (important)

Initialize or double the original capacity. If it is empty, the original capacity is allocated. Otherwise, double the capacity, leaving the existing elements in their original positions or moving the size of the original array back in the new array

/** * Initializes or doubles table size. If null, allocates in * accord with initial capacity target held in field threshold. * Otherwise, because we are using power-of-two expansion, the * elements from each bin must either stay at same index, or move * with a power of two offset in the new table. * * @return the table */ final Node<K,V>[] resize() { Node<K,V>[]  oldTab = table; int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0; If (oldCap >= MAXIMUM_CAPACITY) {threshold = integer.max_value; if (oldCap >= MAXIMUM_CAPACITY) {oldCap = integer.max_value; return oldTab; } // It does not exceed the maximum value, Else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1; // double threshold} // Else if (oldThr > 0) // Initial capacity was placed in threshold newCap = oldThr; // Zero initial threshold using defaults newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } // Calculate the new resize upper limit if (newThr == 0) {float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr; // New array @SuppressWarnings({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; if (oldTab ! = null) { for (int j = 0; j < oldCap; ++j) { Node<K,V> e; if ((e = oldTab[j]) ! = null) { oldTab[j] = null; if (e.next == null) newTab[e.hash & (newCap - 1)] = e; else if (e instanceof TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); else { // preserve order Node<K,V> loHead = null, loTail = null; Node<K,V> hiHead = null, hiTail = null; Node<K,V> next; do { next = e.next; if ((e.hash & oldCap) == 0) { if (loTail == null) loHead = e; else loTail.next = e; loTail = e; } else { if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) ! = null); if (loTail ! = null) { loTail.next = null; newTab[j] = loHead; } if (hiTail ! = null) { hiTail.next = null; newTab[j + oldCap] = hiHead; } } } } } return newTab; }Copy the code

The capacity expansion process is as follows:

The process of rearranging linked lists

It is possible to split the original list into two lists and determine whether the current list element needs to be moved to the new list by calculating (e.hash & oldCap) == 0? False: Elements need to be moved to a new list. Elements need to be moved to a new list. Elements need to be moved to a new list. Take the expansion of the array from 16 to 32 as an example, after the expansion of the linked list with subscript 1, as shown in the figure: part of the list is still kept at subscript 1, and the other part is moved to the location with subscript 1+16.

When a list is reorganized, the relative order of the elements in the same list does not change

The rearrangement of red-black trees

TreeNdoe is also a subclass of Node. The rearrangement process of a red-black tree is similar to that of a linked list, because it also maintains a linked list structure. Splitting a red-black tree into two linked lists, or one red-black tree into one linked list, To determine whether the current list element needs to be moved to a new list, calculate (e.hash & oldCap) == 0? True, don’t need to move, still in the original position, we use the head node for loHead skewer to these elements in the list, if the number of elements is not less than 6, continued to maintain a red-black tree, or for the chain table false, elements need to be moved to another new location, we use the head node for hiHead will list these elements together, If the number of elements is not less than 6, continue to maintain as a red-black tree, otherwise as a linked list

HashMap thread security

Thread safety issues in HashMap:

1. When a HashMap reads the Hash slot header, it reads the object referenced in the working memory. In concurrent cases, the value modified by other threads cannot be read in time.

2. When a HashMap inserts a new element, it makes two judgments:

2.1 For the first time, the hash slot is used based on the hash of the key. If the hash slot is not used, the inserted object is inserted. In the concurrent case, if thread A determines that the slot is not occupied, the time slice runs out when it performs write operations. Thread B also retrieves the hash(it happens to hash an object from thread A) to determine whether the slot is occupied and inserts the object directly. Thread A is then rescheduled by the CPU to continue writing, overwriting thread B’s data. (Note: Visibility issues here, too)

2.2 The second time is in the same hash slot. Since HashMap is designed to keep the key value unique, it determines whether the current key to be inserted exists and overwrites it if it does. Similar to the above, in the case of concurrency, if thread A determines that the last node still does not find the duplicate key, the node constructed with the current object will be hung in the linked list or red-black tree. If thread B conducts judgment and write operations between A’s judgment operation and write operation, data overwriting will also occur.

Similar concurrency problems occur with capacity expansion. There is also the problem of size. I feel that the accuracy of size can compromise the performance under high concurrency.

Refer to the article

【 1 】 tech.meituan.com/2016/06/24/… 【 2 】 blog.csdn.net/weixin_3143…

This article has participated in the activity of “New person creation Ceremony”, and started the road of digging gold creation together.