Someone in a QQ group was recently holding forth on HashMap, and I realized how little I actually understood, so I jumped in. Here is my understanding of HashMap, mainly as a personal memo. If anything is wrong, corrections are welcome. Want to see more? Please visit http://blog.tengshe789.tech/

Overview

A HashMap is a hash table that stores key-value pairs. Data is located by the key's hashCode, which makes access fast.

It is not thread-safe. When two threads resize a HashMap at the same time, a bucket's linked list can become circular, with every `next` non-null, and iteration enters an infinite loop. For thread safety, wrap the map with Collections.synchronizedMap, or use ConcurrentHashMap.
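A minimal sketch of the two safe alternatives mentioned above (the class and method names here are my own, not JDK names):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMaps {
    // Option 1: wrap a HashMap; every call is synchronized on the wrapper object.
    static Map<String, Integer> synchronizedVariant() {
        return Collections.synchronizedMap(new HashMap<>());
    }

    // Option 2: ConcurrentHashMap uses fine-grained per-bin locking in JDK 8+,
    // so concurrent readers and writers scale better.
    // Note: unlike HashMap, it rejects null keys and null values.
    static Map<String, Integer> concurrentVariant() {
        return new ConcurrentHashMap<>();
    }

    public static void main(String[] args) {
        Map<String, Integer> m = synchronizedVariant();
        m.put("a", 1);
        System.out.println(m.get("a")); // prints 1
    }
}
```

Both behave like a plain map in single-threaded use; the difference only shows under concurrency.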

Its keys and values may be null (at most one null key), and the iteration order is not guaranteed.

A HashMap has two parameters that affect its performance: initial capacity and load factor.

HashMap storage structure

In JDK 1.8, HashMap is implemented with an array, linked lists, and red-black trees.

```java
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;      // the node's hash
    final K key;         // the node's key
    V value;             // the node's value
    Node<K,V> next;      // next node in the linked list (or tree bin)

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```

HashMap constructors

A HashMap has four constructors.

Code:

```java
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// No-argument constructor.
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
```

HashMap member variables

```java
// Default initial capacity of the table when none is specified: 16.
// It is written as 1 << 4 rather than 16 to emphasize that it is a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Default load factor, used by resize().
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// A bin is converted from a linked list to a red-black tree
// once it holds this many nodes.
static final int TREEIFY_THRESHOLD = 8;

// A tree bin is converted back to a linked list when it shrinks to this size.
static final int UNTREEIFY_THRESHOLD = 6;

// Minimum table capacity before bins may be treeified. If the array is
// smaller than this and hash collisions are heavy, the table is resized
// instead of treeifying the bin.
static final int MIN_TREEIFY_CAPACITY = 64;
```

Initial capacity, load factor, and threshold.

In general, a HashMap is created with the no-argument constructor. When we have time- or space-complexity requirements, however, the defaults may not meet them, and we need to tune the parameters by hand.

In the HashMap constructors we can adjust two parameters, initialCapacity and loadFactor. Setting them indirectly influences the threshold. The initial threshold, however, is computed from initialCapacity alone, via shift operations in tableSizeFor.

initialCapacity: the initial capacity of the HashMap
loadFactor: the load factor
threshold: the maximum number of key-value pairs the HashMap can hold; once the size exceeds it, the HashMap must be resized

By default, a HashMap has an initial capacity of 16 and a load factor of 0.75. Note that the threshold is the capacity multiplied by the load factor, that is, threshold = capacity x loadFactor.
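As a sanity check on that formula, the default numbers work out like this (a standalone sketch, not HashMap internals; the class and method names are mine):

```java
public class ThresholdDemo {
    // threshold = capacity x loadFactor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default HashMap: capacity 16, load factor 0.75 -> resize after the 12th entry.
        System.out.println(threshold(16, 0.75f)); // prints 12
    }
}
```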

static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

This code is a little tricky, but what the method does is find the smallest power of two greater than or equal to cap. Let's walk through an example of tableSizeFor:

For example, with cap = 2^29 + 1, the computed result is 2^30.
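A standalone copy of the method confirms this behavior (MAXIMUM_CAPACITY is 1 << 30 in the JDK; the wrapper class name here is my own):

```java
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from java.util.HashMap: smallest power of two >= cap.
    // The shifts smear the highest set bit of (cap - 1) into all lower
    // positions, producing a value of the form 2^k - 1; adding 1 yields 2^k.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(17));                          // prints 32
        System.out.println(tableSizeFor((1 << 29) + 1) == (1 << 30));  // prints true
    }
}
```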

To quote the coolblog article (see the references):

For a HashMap, the load factor is an important parameter: it reflects how full the bucket array is allowed to get (assuming key-value pairs are evenly distributed). By adjusting the load factor, a HashMap can trade time complexity against space complexity. When we lower the load factor, the HashMap holds fewer key-value pairs for the same capacity; when the entries are redistributed into a new bucket array, collisions between keys decrease and the lists get shorter, so add, delete, update, and lookup operations all get faster. This is a classic case of trading space for time. Conversely, raising the load factor (it can even be greater than 1) lets the HashMap hold more key-value pairs, so space utilization is high, but so is the collision rate: lists get longer and operations get slower, trading time for space. How to tune the load factor depends on the use case. In general, we keep the default.

Insertion: put

Process:

  1. Compute the hash of the key, then compute the bucket index from it

  2. If there is no collision, the node goes straight into the bucket

  3. If there is a collision, the node is added to the bucket's linked list

  4. If the list length exceeds TREEIFY_THRESHOLD (8), the list is converted to a red-black tree

  5. If the key already exists in the list, the old value is replaced

  6. If the number of entries exceeds the threshold (capacity x load factor), resize

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
```
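The steps above are directly observable from put's return values (a null key is legal, and overwriting returns the old value):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));   // prints null (no previous value)
        System.out.println(map.put("a", 2));   // prints 1 (old value replaced)
        System.out.println(map.put(null, 3));  // prints null; the null key is allowed
        System.out.println(map.get(null));     // prints 3
    }
}
```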

putVal

The core algorithm is in putVal(). To understand it, it helps to first understand the idea behind bucket sort.

When its assumptions hold (roughly uniform data), bucket sort is extremely fast: its average time complexity is O(n), i.e. linear, because it largely avoids pairwise comparisons.

The core idea of bucket sort is: divide the value range of the n elements into m intervals of equal size (each interval is a bucket, which you can think of as a container); distribute the n elements into the buckets according to their values; sort the elements within each bucket using whatever method suits (quicksort, merge sort, or insertion sort); then take the elements out of the buckets in order and concatenate them into the output sequence (which gathers all the bucketed elements back together).
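That idea can be sketched in a few lines (this is my own illustrative implementation, assuming non-negative integers in a known range [0, max]):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketSort {
    // Sorts values assumed to lie in [0, max]; bucketCount is a tunable parameter.
    static List<Integer> sort(List<Integer> input, int max, int bucketCount) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());

        // Distribute each element into a bucket by its value range.
        for (int v : input) {
            int idx = (int) ((long) v * (bucketCount - 1) / max);
            buckets.get(idx).add(v);
        }

        // Sort each bucket, then concatenate the buckets in order.
        List<Integer> out = new ArrayList<>(input.size());
        for (List<Integer> b : buckets) {
            Collections.sort(b);
            out.addAll(b);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort(List.of(29, 3, 47, 15, 8), 50, 5));
        // prints [3, 8, 15, 29, 47]
    }
}
```

The parallel to HashMap is the distribution step: an element's bucket is computed from its value, just as a node's bin is computed from its hash.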

Here’s the code:

```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the bucket array is empty, initialize it with resize().
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash is equivalent to hash % n, because n is a power of two.
    // If the bucket is empty, place a new node directly in it.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // e records the node with an equal key, if any; its value is overwritten below.
        Node<K,V> e; K k;
        // The first node in the bucket has the same hash and an equal key.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket holds a red-black tree: insert into the tree.
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Walk the linked list.
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Treeify the bin once it reaches TREEIFY_THRESHOLD nodes.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // The key already exists: overwrite the value and return the old one.
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Resize once the number of mappings exceeds the threshold.
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```

HASH

The hash method XORs the high 16 bits of hashCode() into the low 16 bits. The benefit is that both halves of the hash influence the bucket index, making the results as different as possible.

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
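A quick sketch of why the spread matters: with a 16-bucket table, only the low 4 bits pick the bucket, so two hashCodes that differ only in their high bits would otherwise always collide (the wrapper class name is my own):

```java
public class HashSpread {
    // Copied from java.util.HashMap.hash().
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;      // table size
        int a = 0x10000; // a and b differ only in their high bits
        int b = 0x20000;
        // Without spreading, both land in bucket 0:
        System.out.println(((n - 1) & a) + " " + ((n - 1) & b));             // prints 0 0
        // With spreading, the high bits influence the index:
        System.out.println(((n - 1) & hash(a)) + " " + ((n - 1) & hash(b))); // prints 1 2
    }
}
```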

resize

The expansion mechanism of HashMap differs from that of other variable-length collections. A HashMap expands to twice the current bucket array length, and the threshold is doubled as well (if the threshold overflows to zero during the computation, it is recomputed with the threshold formula). After expansion, the key-value pairs are remapped and moved to their proper positions.

resize() does three things in total:

  1. Compute the new bucket array capacity, newCap, and the new threshold, newThr

  2. Create a new bucket array from the computed newCap; this is also where the bucket array table is initialized

  3. Remap the key-value pair nodes into the new bucket array. If a node is of type TreeNode, the red-black tree needs to be split; if it is an ordinary list node, the nodes are regrouped in their original order.

```java
// resize() is called when size > threshold, or to initialize the table.
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Already at maximum capacity: give up and just raise the threshold.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double both the capacity and the threshold.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        /*
         * The map was created with HashMap(int initialCapacity, float loadFactor),
         * HashMap(int initialCapacity), or HashMap(Map<? extends K, ? extends V> m),
         * so oldTab is null, oldCap is 0, and oldThr holds the table size
         * derived from the user-specified initial capacity.
         */
        newCap = oldThr;
    else {
        // Zero initial threshold: use the default capacity and threshold.
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // newThr not yet set: compute it from the new capacity and the load factor.
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Iterate and rehash oldTab into newTab.
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // Single node: recompute its index directly.
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Tree bin: split it between the two candidate buckets.
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Walk the list, grouping the nodes in their original order.
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;          // "low" list stays at index j
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead; // "high" list moves by oldCap
                    }
                }
            }
        }
    }
    return newTab;
}
```
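The low/high grouping during resize relies on one fact: after doubling, a node either keeps its index j or moves to j + oldCap, decided by the single bit (hash & oldCap). A standalone sketch of that rule (class and method names are mine):

```java
public class SplitRule {
    // New index after the table grows from oldCap to 2 * oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                  // old index
        return (hash & oldCap) == 0 ? j : j + oldCap; // stay, or move by oldCap
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // hash 5 has bit 16 clear, so it stays at index 5;
        // hash 21 has bit 16 set, so it moves to 5 + 16 = 21.
        System.out.println(newIndex(5, oldCap));  // prints 5
        System.out.println(newIndex(21, oldCap)); // prints 21
        // Both agree with recomputing hash & (newCap - 1) from scratch:
        System.out.println((5 & 31) + " " + (21 & 31)); // prints 5 21
    }
}
```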

When is the time complexity of a HashMap O(1), O(n), or O(log n)?

O(1): when the linked lists are as short as possible; ideally each bucket holds a single node.

O(n): when hash collisions are severe and there is no red-black tree, the linked list in a bucket keeps growing and query efficiency drops. Time complexity degrades to O(n).

O(log n): the red-black tree keeps query efficiency within a bucket at O(log n).

A handwritten HashMap

```java
/**
 * A hand-rolled HashMap for illustration: separate chaining with head insertion.
 *
 * @author tengshe789
 */
public class HashMap<K, V> {

    static class Node<K, V> {
        K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }

        K getKey()   { return key; }
        V getValue() { return value; }
        V setValue(V value) {
            this.value = value;
            return this.value;
        }
    }

    /* The bucket array. */
    private Node<K, V>[] array = null;
    /* Current capacity. */
    private int defaultLength = 16;
    /* Load factor. */
    private static final double FACTOR = 0.75d;
    /* Number of stored key-value pairs. */
    private int size;

    /* Print the table, bucket by bucket. */
    public void print() {
        System.out.println("===============================");
        if (array != null) {
            for (int i = 0; i < array.length; i++) {
                System.out.print("index[" + i + "] ");
                // Walk the bucket's linked list.
                for (Node<K, V> node = array[i]; node != null; node = node.next) {
                    System.out.print("[" + node.getKey() + ":" + node.getValue() + "]");
                }
                System.out.println();
            }
        }
    }

    public V put(K k, V v) {
        // 1. Lazily create the bucket array.
        if (array == null) {
            array = new Node[defaultLength];
        }
        // 2. Expand first, so the index below is computed against the new table.
        if (size > defaultLength * FACTOR) {
            resize();
        }
        int index = position(k, defaultLength);
        // 3. Insert the element.
        Node<K, V> node = array[index];
        if (node == null) {
            array[index] = new Node<>(k, v, null);
            size++;
        } else {
            // Overwrite if the key already exists anywhere in the chain.
            for (Node<K, V> cur = node; cur != null; cur = cur.next) {
                if (k.equals(cur.getKey())) {
                    return cur.setValue(v);
                }
            }
            // Otherwise insert at the head of the chain.
            array[index] = new Node<>(k, v, node);
            size++;
        }
        return null;
    }

    private void resize() {
        // 1. A temporary array of twice the current capacity.
        Node<K, V>[] temp = new Node[defaultLength << 1];
        // 2. Rehash every node into the new array, using head insertion.
        for (int i = 0; i < array.length; i++) {
            Node<K, V> node = array[i];
            while (node != null) {
                int index = position(node.getKey(), temp.length);
                Node<K, V> next = node.next;
                node.next = temp[index];
                temp[index] = node;
                node = next;
            }
        }
        // 3. Replace the old array.
        array = temp;
        defaultLength = temp.length;
    }

    private int position(K k, int length) {
        // Works because length is always a power of two.
        return k.hashCode() & (length - 1);
    }

    public V get(K k) {
        if (array != null) {
            int index = position(k, defaultLength);
            // Walk the bucket's list until the key matches.
            for (Node<K, V> node = array[index]; node != null; node = node.next) {
                if (node.getKey().equals(k)) {
                    return node.getValue();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        for (int i = 1; i <= 11; i++) {
            String s = String.format("%03d", i); // "001" .. "011"
            map.put(s, s);
        }
        map.print();
        System.out.println("========>" + map.get("009"));
    }
}
```

References

coolblog

Ali architect takes you through the implementation principle of HashMap source code

Thank you!

An update from me, some days later:

I came across a very good follow-up article: click the link, it is worth studying.

Want to learn more? Please visit my personal blog. This post was originally published there and later appeared on CSDN, SegmentFault, and Juejin. If you see something similar elsewhere, that is fate ~