Introduction to HashMap

Since JDK1.8, HashMap's underlying data structure has been an array plus linked lists/red-black trees, and it stores key-value pairs through hash mapping. Because lookups use the key's hashCode, access is fast. A HashMap can store null keys and null values, but there can be at most one null key, while any number of keys may map to a null value. It is unordered and not thread-safe.
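The null-key and null-value behavior described above can be seen in a minimal sketch (class and variable names are illustrative, not from the original article):

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");   // one null key is allowed
        map.put(null, "second");  // overwrites the existing null-key entry
        map.put("a", null);       // null values are allowed...
        map.put("b", null);       // ...for any number of keys
        System.out.println(map.size());    // 3
        System.out.println(map.get(null)); // second
    }
}
```

The second `put(null, ...)` does not add a new entry; it replaces the value of the single permitted null key.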

HashMap underlying data structure

In JDK1.7 and earlier, the underlying HashMap is an array plus linked lists, often called a "linked-list hash". The array is the body of the HashMap, while the linked lists exist mainly to resolve hash collisions. JDK1.8 changed collision handling significantly: when a bucket's list length exceeds the threshold (default 8), the list is converted to a red-black tree to reduce search time. Before converting, however, HashMap first checks the array length; if it is less than 64, it resizes the array instead of treeifying the bucket.

Underlying implementation of HashMap

The key's hashCode is first processed by a perturbation function to get the hash value, and then (n - 1) & hash determines the position where the element is stored (here, n is the length of the array). If an element already exists at that position, its hash and key are compared with those of the element being inserted: if they are the same, the value is overwritten directly; if they differ, the collision is resolved by the zipper (separate chaining) method.
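The perturbation function in JDK 8 XORs the high 16 bits of the hashCode into the low 16 bits, so that the high bits also influence the bucket index. A sketch mirroring that logic (the demo class and the choice of key are illustrative):

```java
public class HashDemo {
    // Mirrors java.util.HashMap.hash() in JDK 8: fold the high 16 bits
    // into the low 16 bits so they participate in (n - 1) & hash.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                      // table length, always a power of two
        int index = (n - 1) & hash("hello"); // bucket index in [0, n)
        System.out.println(index);
    }
}
```

Since (n - 1) keeps only the low bits, without the perturbation two keys differing only in their high bits would always land in the same bucket.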

The perturbation function guards against poorly implemented hashCode() methods; in other words, it reduces collisions. The zipper method combines linked lists and arrays: create an array whose cells each head a linked list, and when a hash collision occurs, append the colliding entry to that cell's list.

Why is the length of a HashMap a power of 2

The hash value ranges from -2147483648 to 2147483647, a mapping space of about 4 billion values. As long as the hash function maps keys uniformly and loosely, collisions are hard to produce across that space. But a 4-billion-entry array will not fit in memory, so the hash value cannot be used directly. It must be reduced modulo the array length, and the remainder used as the storage location, which is the corresponding array subscript. In HashMap this modulo operation is computed as (n - 1) & hash (n is the length of the array).

At this point someone may ask: isn't the modulo operator %? The reason is that the bitwise AND (&) is more efficient than %. When the divisor of the modulo (%) is a power of 2, the operation is equivalent to an AND (&) with the divisor minus one, that is, hash % length == hash & (length - 1), which holds only when length is a power of 2 (and hash is non-negative).
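A quick check of this equivalence, and of how it fails for a non-power-of-two length (the demo class is illustrative):

```java
public class ModVsAnd {
    public static void main(String[] args) {
        int length = 16; // power of two
        for (int hash = 0; hash < 1000; hash++) {
            // For power-of-two lengths the two expressions always agree
            if (hash % length != (hash & (length - 1))) {
                throw new AssertionError("mismatch at " + hash);
            }
        }
        // For a non-power-of-two length they generally differ:
        System.out.println(13 % 12);       // 1
        System.out.println(13 & (12 - 1)); // 13 & 11 = 9
    }
}
```

This is also why a power-of-two length gives a uniform spread: (length - 1) is a mask of all 1 bits, so every bucket index is reachable.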

Expansion mechanism of HashMap

To understand the HashMap expansion process, you first need to understand the relevant fields of HashMap:

- Node<K,V>: linked-list node, holding four fields: key, value, hash, and a next pointer
- Node<K,V>[] table: the array of buckets holding the HashMap's elements
- int size: the number of key-value pairs currently in the HashMap
- final float loadFactor: the load factor used to decide when to expand
- int threshold: the resize trigger, equal to table capacity * loadFactor
- static final float DEFAULT_LOAD_FACTOR = 0.75f: the default load factor
- static final int DEFAULT_INITIAL_CAPACITY = 1 << 4: the default initial table capacity (16)
- static final int TREEIFY_THRESHOLD = 8: a bucket whose list grows beyond this length is converted to a red-black tree
- static final int UNTREEIFY_THRESHOLD = 6: if the number of nodes in a tree is less than or equal to this value, the tree is converted back to a linked list

When will the capacity be expanded?

When the amount of stored data exceeds threshold, the capacity must be expanded by resize(). The threshold depends on two factors: the current capacity and the load factor loadFactor.

threshold = capacity * loadFactor
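With the default values this formula gives the familiar "resize at 13 entries" behavior; a small sketch of the arithmetic (the demo class is illustrative):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY (1 << 4)
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12: a resize happens on the 13th put
        // After each resize both values simply double
        capacity <<= 1;
        threshold <<= 1;
        System.out.println(capacity + " " + threshold); // 32 24
    }
}
```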


HashMap expansion falls into three cases

The first: the HashMap is initialized with the default constructor. It initially holds an empty table with a threshold of 0, so on the first insertion the capacity becomes DEFAULT_INITIAL_CAPACITY = 1 << 4 = 16, and threshold = DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR = 12.

The second: the HashMap is constructed with a method that specifies an initial capacity. The initial capacity is first stored in threshold; on the first insertion the table capacity becomes that value and the threshold is recomputed as capacity * DEFAULT_LOAD_FACTOR.

The third: the HashMap has already been expanded at least once. On each subsequent expansion, both the table capacity and the threshold are doubled.

```java
/**
 * Initializes or doubles table size. If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) { // the table has already been allocated
        // If the current capacity is already at the maximum,
        // just raise the threshold and return the table unchanged
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double the threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // Single node in this bucket: rehash it directly;
                    // newCap is a power of two, so & acts as a modulo
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Red-black tree bucket: split the tree
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order: split the list into lo and hi lists
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            // new index bit is 0: node stays at index j
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            // new index bit is 1: node moves to j + oldCap
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
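The lo/hi split in resize() relies on a property of power-of-two expansion: checking the single bit (e.hash & oldCap) tells whether an element stays at index j or moves to j + oldCap. A standalone check of that property (the demo class is illustrative):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        for (int hash = 0; hash < 256; hash++) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            if ((hash & oldCap) == 0) {
                // the extra index bit is 0: the element keeps its index
                if (newIndex != oldIndex) throw new AssertionError();
            } else {
                // the extra index bit is 1: the element moves by exactly oldCap
                if (newIndex != oldIndex + oldCap) throw new AssertionError();
            }
        }
        System.out.println("ok");
    }
}
```

This is why resize() never has to recompute full bucket indices for chained nodes: one bit test decides the destination.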

The difference between HashMap, HashSet, and Hashtable

HashMap vs HashSet

The underlying implementation of HashSet is based on HashMap (each element is stored as a key of an internal HashMap), and like HashMap it is not thread-safe.
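Because HashSet delegates to a HashMap internally, add() returns whether the element was new, mirroring whether the backing map.put() found an existing key. A minimal sketch (the demo class is illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>(); // backed by a HashMap internally
        System.out.println(set.add("a"));  // true:  no existing entry
        System.out.println(set.add("a"));  // false: key already present
        System.out.println(set.size());    // 1
    }
}
```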

HashMap vs Hashtable

Hashtable is a legacy class whose methods are synchronized, so it is thread-safe but slow under contention; HashMap is not thread-safe. Hashtable allows neither null keys nor null values, while HashMap allows one null key and any number of null values. When thread safety is needed, ConcurrentHashMap is preferred over Hashtable.

ConcurrentHashMap

JDK1.7

Segment array + HashEntry array + linked list

ConcurrentHashMap uses segmented locking: the bucket array is divided into segments, and each lock guards only one segment of the data in the container. Threads accessing different data segments do not contend for the same lock, which improves the concurrent access rate.

JDK1.8

In JDK1.8, the concept of segments has been abandoned. Instead, the implementation is a Node array + linked lists + red-black trees, using both CAS and synchronized for concurrency safety; the whole thing looks like an optimized, thread-safe HashMap. synchronized locks only the first node of the current bucket's linked list or red-black tree, so as long as hashes do not collide, threads do not block each other and throughput improves greatly.
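The practical effect is that ConcurrentHashMap can absorb concurrent updates without external locking. A small sketch using the atomic merge() method (thread counts and key names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                for (int j = 0; j < 1000; j++) {
                    // merge() is atomic per key, so no increments are lost
                    counts.merge("hits", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(counts.get("hits")); // 4000
    }
}
```

The same loop on a plain HashMap could lose updates or corrupt internal structure under concurrent writes.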

So how do you choose a collection?

When we need to look up element values by key, we choose a collection under the Map interface: TreeMap when the keys need to be sorted, HashMap when they do not, and ConcurrentHashMap when thread safety must be guaranteed. When we only need to store element values, we choose a collection that implements the Collection interface: one that implements the Set interface, such as TreeSet or HashSet, when elements must be unique; otherwise one that implements the List interface, such as ArrayList or LinkedList. We then pick a concrete class based on the characteristics of each implementation.
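The sorted-vs-unsorted distinction above is observable directly: TreeMap iterates its keys in sorted order, while HashMap makes no order guarantee. A minimal sketch (the demo class and sample keys are illustrative):

```java
import java.util.Map;
import java.util.TreeMap;

public class ChoosingDemo {
    public static void main(String[] args) {
        // TreeMap keeps keys sorted regardless of insertion order
        Map<String, Integer> sorted = new TreeMap<>();
        sorted.put("c", 3);
        sorted.put("a", 1);
        sorted.put("b", 2);
        System.out.println(sorted.keySet()); // [a, b, c]
    }
}
```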
