Preface

Hello, everyone. Happy May Day! Does going back to work after the holiday leave you feeling a bit dazed, still half on vacation? Today I'm going to introduce HashMap, one of the most frequently used Java data structures and one that interviewers love to ask about. Whether you have never looked at HashMap before or have already read its source code, I hope this article helps. If there is a mistake in the text, please point it out.

I. Introduction to HashMap

Before learning something new, we should first understand what it is useful for in our work. This article systematically introduces the principles of HashMap.

So what is a HashMap?

A HashMap is a Java data structure that stores data as key-value pairs. Given a key, the corresponding value can be found, much like a dictionary, where the pinyin leads you to the corresponding Chinese characters.
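To make this concrete, here is a minimal usage sketch (the class name and the pinyin-to-character mappings are just illustrative):

```java
import java.util.HashMap;

public class DictionaryDemo {
    public static void main(String[] args) {
        // The key plays the role of the pinyin, the value the characters it maps to
        HashMap<String, String> dictionary = new HashMap<>();
        dictionary.put("ni", "你");
        dictionary.put("hao", "好");

        System.out.println(dictionary.get("ni"));  // 你
        System.out.println(dictionary.get("hao")); // 好
        System.out.println(dictionary.get("hei")); // null — no mapping for this key
    }
}
```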

II. Introduction of key concepts

HashMap involves a lot of concepts. For readers who are new to Java or not very familiar with data structures, here is a quick introduction to the concepts involved.

1. Array: a contiguous block of storage units used to store data. Looking up an element by index takes O(1) time; searching for a given value requires traversing the array and comparing it with each element one by one, which takes O(n). For a sorted array, binary search, interpolation search, Fibonacci search and similar methods can improve the search complexity to O(log n). Ordinary insertions and deletions involve moving elements in the array, so their average complexity is also O(n).


2. Linked list: adding or deleting a node (once the target position has been found) only requires adjusting the references between nodes, so it takes O(1) time. Searching, however, requires traversing the list and comparing node by node, which takes O(n).


3. Binary tree: on a reasonably balanced ordered binary tree, insertion, search, deletion and similar operations have an average complexity of O(log n).


4. Hash table: compared with the structures above, a hash table offers extremely good performance for adding, deleting and searching. Ignoring hash conflicts (discussed below), an element can be located in a single step, with time complexity O(1). Next let's look at how a hash table achieves this amazing constant-time O(1) behavior.


As we know, there are only two physical storage structures for data: sequential storage and chained storage (structures such as stacks, queues, trees and graphs are logical abstractions that are ultimately mapped into memory using one of these two physical forms). As mentioned above, locating an element in an array by index takes a single step, and a hash table exploits exactly this property: the backbone of a hash table is an array.


For example, if we want to add or look up an element, we can map the element's key to a position in the array through a function, and then locate it with a single array index operation. This function can be described simply as: storage location = f(key). The function f is generally called the hash function, and its design directly affects the quality of the hash table. To insert into a hash table, we compute the storage address with the hash function and write the element at that index.

Lookup works the same way: the actual address is calculated with the hash function, and the element at that address in the array is fetched.

5. Hash conflicts

But nothing is perfect. What if two different elements hash to the same location? That is, when we compute the storage address for an element and try to insert it, the slot is already occupied by another element. This is what is called a hash conflict, or hash collision. As mentioned above, the design of the hash function is crucial: a good hash function keeps the computation simple and distributes hash addresses uniformly. However, an array is a fixed-length, contiguous block of memory, and even an excellent hash function cannot guarantee that no two keys ever map to the same address. So how are hash conflicts resolved? There are several approaches: open addressing (when a conflict occurs, keep probing for the next free slot), rehashing, and chaining. HashMap uses chaining (array + linked list).

Source: blog.csdn.net/woshimaxiao…
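To make the "array + linked list" idea concrete, here is a heavily simplified sketch of a chained hash table. It is for illustration only and is not how HashMap itself is written; the class and method names are made up.

```java
// A toy chained hash table: an array of buckets, where each bucket is the head of a
// singly linked list of entries whose keys hashed to the same index.
class ToyHashTable<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    ToyHashTable(int capacity) {
        table = new Node[capacity];
    }

    // The "storage location = f(key)" function: map the key's hash code to an array index
    private int indexFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {        // same key already present: overwrite
                n.value = value;
                return;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // hash conflict: chain onto the list
    }

    V get(Object key) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }
}
```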

Now that we know the key underlying concepts, let's dig into what HashMap really is.

III. Key member variables

3.1. Description of member variables

```java
// Default initial capacity: 16. The capacity is always a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Maximum capacity: 2^30.
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor: resizing is triggered when the number of elements in the
// HashMap reaches 75% of the current capacity.
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// A bucket's linked list is converted to a red-black tree when its length reaches 8.
static final int TREEIFY_THRESHOLD = 8;
// A red-black tree is converted back to a linked list when it shrinks to 6 nodes.
static final int UNTREEIFY_THRESHOLD = 6;
// The second condition for treeification, working together with TREEIFY_THRESHOLD:
// the minimum array size before a bucket may be converted to a red-black tree.
static final int MIN_TREEIFY_CAPACITY = 64;
// The number of key-value mappings currently in the HashMap (not the same as the capacity).
transient int size;
// Records the number of structural modifications made to the HashMap, for example
// removing a key-value mapping. It is what makes the fail-fast mechanism possible:
// structurally modifying the map during iteration leads to a ConcurrentModificationException;
// the correct way to remove while iterating is through the iterator.
transient int modCount;
// The next size at which to resize. If an initial capacity of 11 is passed to the
// constructor, the initial table size becomes 16 — see HashMap#tableSizeFor(int cap).
int threshold;
// The load factor of this HashMap. Defaults to DEFAULT_LOAD_FACTOR.
final float loadFactor;
```
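The comment on threshold mentions HashMap#tableSizeFor. For reference, in JDK 8 it rounds the requested capacity up to the next power of two, roughly as follows (quoted from memory, so treat it as a sketch rather than the authoritative source):

```java
static final int MAXIMUM_CAPACITY = 1 << 30;

// Returns the smallest power of two >= cap, which is why new HashMap<>(11)
// ends up with a table of size 16.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```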

3.1.1. Description of modCount variable

modCount records the number of structural modifications made to a HashMap. Let's see what modCount does while a HashMap is being traversed.

Incorrect usage

```java
public static void main(String[] args) {
    HashMap<Object, Object> map = new HashMap<>(15);
    for (int i = 0; i < 100; i++) {
        map.put(i, i);
    }
    for (Map.Entry child : map.entrySet()) {
        if (Objects.equals(child.getKey(), 5)) {
            map.remove(child.getKey());
        }
    }
    System.out.println(map.size());
}
```

Console output:

```
Exception in thread "main" java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
	at java.util.HashMap$EntryIterator.next(HashMap.java:1479)
	at java.util.HashMap$EntryIterator.next(HashMap.java:1477)
	at com.examp.util.HashmapDemo.main(HashmapDemo.java:19)
```

Correct usage

```java
public static void main(String[] args) {
    HashMap<Object, Object> map = new HashMap<>(15);
    for (int i = 0; i < 3; i++) {
        map.put(i, i);
    }
    Iterator<Map.Entry<Object, Object>> iterator = map.entrySet().iterator();
    while (iterator.hasNext()) {
        Map.Entry<Object, Object> entry = iterator.next();
        if (entry.getValue().equals(1)) {
            // Remove through the iterator, not through the map itself
            iterator.remove();
        }
        // Printing on every pass shows the element disappears as soon as
        // the iterator removes it
        System.out.println(map.toString());
    }
    System.out.println(map.size());
}
```

Console output:

```
{0=0, 1=1, 2=2}
{0=0, 2=2}
{0=0, 2=2}
2
```
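As a side note, on Java 8 and later the entry set's removeIf can do the same thing; it drives the collection's own iterator internally, so it is also safe. A minimal sketch:

```java
HashMap<Object, Object> map = new HashMap<>();
for (int i = 0; i < 3; i++) {
    map.put(i, i);
}
// removeIf iterates with the collection's own iterator and removes matching entries,
// so no ConcurrentModificationException is thrown
map.entrySet().removeIf(entry -> entry.getValue().equals(1));
System.out.println(map); // {0=0, 2=2}
```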

Analysis

Click into the remove call from the first (incorrect) demonstration; it delegates to removeNode:

```java
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            // modCount is incremented here — a structural modification
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
```

Notice that removeNode increments modCount. Then, when the for loop fetches the next element:

```java
final Node<K,V> nextNode() {
    Node<K,V>[] t;
    Node<K,V> e = next;
    // The modification count no longer matches the expected count, so fail fast
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    if (e == null)
        throw new NoSuchElementException();
    if ((next = (current = e).next) == null && (t = table) != null) {
        do {} while (index < t.length && (next = t[index++]) == null);
    }
    return e;
}
```

During element traversal, the structure of the HashMap changed, causing an error.

When you remove through the iterator instead:

```java
public final void remove() {
    Node<K,V> p = current;
    if (p == null)
        throw new IllegalStateException();
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    current = null;
    K key = p.key;
    removeNode(hash(key), key, null, false, false);
    // Resync the expected count so the next call to nextNode() does not throw
    expectedModCount = modCount;
}
```

When the iterator itself modifies the HashMap, expectedModCount is reassigned afterwards, so no exception is thrown.


So why did the developers of HashMap adopt this mechanism?

Looking back at the iterator code: the reason the iterator reassigns expectedModCount is that HashMap is not thread-safe, so there is no guarantee that another thread will not modify the map's structure. By comparing expectedModCount with modCount, any modification made outside the iterator is detected immediately and the iteration fails fast.

Ps: Personally I find this mechanism a bit unfriendly — under concurrent access it simply throws an exception at me, even though I may just want my updates applied in order and may not even be touching the same key. If you need to update a map correctly under multi-threaded concurrency, use ConcurrentHashMap instead; I will introduce it in a later article.
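For comparison, here is a minimal single-threaded sketch showing that ConcurrentHashMap does not fail fast: its iterators are weakly consistent, so modifying the map while iterating does not throw. (This alone does not demonstrate its thread-safety, of course.)

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(i, i);
        }
        // Removing directly from the map during iteration is fine here:
        // ConcurrentHashMap's iterators never throw ConcurrentModificationException
        for (Integer key : map.keySet()) {
            if (key == 5) {
                map.remove(key);
            }
        }
        System.out.println(map.size()); // 99
    }
}
```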

IV. Key method analysis

The key methods in HashMap are put and get, which cover well over 95% of day-to-day usage. The following sections start from these two methods to explore how HashMap works.

4.1. The put method

As usual, source code first.

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
```
```java
static final int hash(Object key) {
    int h;
    // Spread the high bits of the hash code into the low bits
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
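Why does hash() XOR the hash code with its own high 16 bits? Because the index is later computed as (n - 1) & hash, and with small tables only the low bits take part. A minimal worked sketch (illustrative values, not JDK code):

```java
int n = 16;                       // table length
int h1 = 0x00010000;              // two hash codes that differ only in their high bits
int h2 = 0x00020000;

System.out.println((n - 1) & h1); // 0
System.out.println((n - 1) & h2); // 0  -> without the spread they collide

System.out.println((n - 1) & (h1 ^ (h1 >>> 16))); // 1
System.out.println((n - 1) & (h2 ^ (h2 >>> 16))); // 2  -> the high bits now influence the index
```

Next, the core of the put flow: putVal.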
```java
/**
 * @param hash         hash of the key
 * @param key          the key
 * @param value        the value to put
 * @param onlyIfAbsent if true, do not overwrite an existing value
 * @param evict        if false, the table is in creation mode (used by LinkedHashMap)
 * @return the previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty or uninitialized, initialize it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the target bucket is empty, place the new node there directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node in the bucket has the same key: its value will be replaced
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket holds a red-black tree: insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise walk the linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches TREEIFY_THRESHOLD, try to treeify the bucket
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with the same key in the list: its value will be replaced
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```

4.1.1. Treeification: treeifyBin

```java
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // Key point: if the table length is below MIN_TREEIFY_CAPACITY (64 by default),
    // the table is resized instead of converting the bucket to a red-black tree
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
```

There is one key point in the treeification logic:

If the array length is less than the minimum treeify capacity (64 by default), the array is resized first instead of converting the bucket to a red-black tree.

This is a point interviewers love to ask about — this rookie stepped into the same pit in an early interview, back when I was only reciting answers without having read the source.

4.1.2. Expansion: resize

```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity already hit the maximum, stop growing
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity and double the threshold
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move every bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order: split the list into a "low" and a "high" half
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Bit is 0: the node keeps its original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Bit is 1: the node moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The "low" list stays at the original bucket index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The "high" list moves to index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

During a resize, the array length is doubled, and the nodes in the array and its linked lists are rehashed into the new table. JDK 1.8 uses tail insertion here, which avoids the circular linked list that could appear during resize in the JDK 1.7 HashMap; if you are interested, search for that issue.
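The (e.hash & oldCap) trick in resize means every node either stays at its old index or moves to old index + oldCap. A minimal worked sketch (illustrative values, not JDK code):

```java
int oldCap = 16, newCap = 32;
int h1 = 0b0_0101;   // bit 4 (the oldCap bit) is 0
int h2 = 0b1_0101;   // bit 4 (the oldCap bit) is 1

System.out.println((oldCap - 1) & h1); // 5  -> old index
System.out.println((oldCap - 1) & h2); // 5  -> same bucket before the resize

System.out.println((newCap - 1) & h1); // 5            -> stays at the same index
System.out.println((newCap - 1) & h2); // 21 = 5 + 16  -> moves to old index + oldCap
```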

4.2. The get method

```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
```
```java
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Make sure the table is non-empty and the bucket the hash maps to is non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Always check the first node: if its key matches, return it directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If the bucket holds a red-black tree, search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```

4.3. A small question

Here is a question I like to ask in interviews: what happens if the class used as a HashMap key overrides hashCode but not equals, or vice versa?

4.3.1. Overriding hashCode but not equals

If hashCode is overridden so that different objects can produce the same hash value, but equals is not overridden, those objects still count as different keys (equals falls back to reference comparison). They pile up in the same bucket, so there are more hash collisions in the HashMap and lookups need more equals comparisons.

4.3.2. Overriding equals but not hashCode

Suppose a User object has 10 fields, and by comparing the ID-number field alone I decide that two objects represent the same person, even though, say, their age fields differ. With equals overridden this way, the two objects are equal, but their hashCode values (inherited from Object) are usually not. In a HashMap, two such keys that I intended to be interchangeable may land in two different buckets (each bucket being a linked list or red-black tree hanging off the array), so a value stored under one key cannot be found with the other.
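A minimal sketch of this situation, with a made-up User class (only the ID number participates in equals, and hashCode is deliberately left alone):

```java
import java.util.HashMap;
import java.util.Objects;

class User {
    final String idNumber;
    final int age;

    User(String idNumber, int age) {
        this.idNumber = idNumber;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        // Two users are "the same person" if their ID numbers match
        return o instanceof User && Objects.equals(((User) o).idNumber, idNumber);
    }
    // hashCode() is NOT overridden, so it falls back to Object's identity hash
}

public class EqualsWithoutHashCodeDemo {
    public static void main(String[] args) {
        HashMap<User, String> map = new HashMap<>();
        User a = new User("110101199001010011", 30);
        User b = new User("110101199001010011", 31); // equal to a according to equals()
        map.put(a, "Alice");
        // Almost certainly prints null: a and b land in different buckets
        // because their identity hash codes differ
        System.out.println(map.get(b));
    }
}
```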

V. Summary

This article has introduced the aspects of HashMap most commonly encountered in daily work. A brief summary:

**1. Load factor: ** The load factor defaults to 0.75, a value the HashMap authors arrived at with the help of the Poisson distribution. If the load factor is too small, resizing happens too often and space is wasted; if it is too large, hash conflicts become frequent, lists and trees are converted back and forth, and locating a node takes longer.

**2. Map traversal: ** HashMap maintains modCount to record the number of structural modifications, so that modification during traversal is detected instead of silently corrupting the data; this is the fail-fast mechanism. When multiple threads share a map, ConcurrentHashMap is recommended.

**3. Treeification: ** When a bucket's list is converted to a red-black tree, an extra check is performed: the conversion only happens when the array length is at least 64 and the number of nodes in the list reaches 8. When the number of nodes in a tree drops to 6 or below, it is converted back to a linked list.

**4. Capacity expansion: ** JDK 1.8 changed the resize mechanism from head insertion (JDK 1.7) to tail insertion, which avoids the circular-linked-list problem that could arise during expansion. Each expansion doubles the current capacity.

Contact me

Nailing: louyanfeng25

WeChat: baiyan_lou