What is a HashMap mainly used for, and why?

We’ve all used HashMap many times, mainly to speed up lookups. If you’ve studied data structures, you know an array supports very fast lookup and in-place modification, but inserting or deleting an element is slow, whereas a linked list is the other way around: inserting and deleting is fast, but lookup means walking the list. In general, to speed up queries we have to impose rules when inserting elements, which adds complexity. If this is unfamiliar, look up the relevant chapter in an algorithms book. A HashMap is a data structure that combines arrays and linked lists to speed up queries. Accordingly, we need to understand the put operation of HashMap.
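
Before digging in, here is a minimal usage sketch (class and variable names are just illustrative):

import java.util.HashMap;
import java.util.Map;

public class HashMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);  // hash the key, pick an array slot, chain the entry there
        ages.put("bob", 25);
        System.out.println(ages.get("alice")); // 30, found in close to O(1)
    }
}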

Understand the data structure of HashMap

We said earlier that a HashMap is a combination of an array and a linked list.

Picture it as an array in which each element is the head of a linked list. We all know that a HashMap is used for key-value lookups. Ideally, each array slot holds only a single node; that is the fastest case, because it is almost like looking up an element in an array by index.

But if we are unlucky, or the hash algorithm is poor, most of the array sits empty and all the data we insert piles up in one slot.

Lookup then becomes slow: O(1) in the former case, O(n) in the latter.

As you can see, a HashMap allows a null key, but at most one entry can have a null key; this is allowed but not encouraged.
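
A quick sketch of the null-key rule (a fragment, assuming the imports above):

Map<String, String> m = new HashMap<>();
m.put(null, "first");
m.put(null, "second"); // replaces the old mapping; there is still only one null key
System.out.println(m.get(null)); // prints "second"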

Understand the load factor

   /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

By default, when we construct a HashMap, the backing array has 16 elements.

So what does DEFAULT_LOAD_FACTOR do?

We all know that if our HashMap has an array of size 16 and we store exactly 16 entries, the best case is that each array slot holds a single node. If we insert 32 entries, the best case is a linked list of 2 nodes per slot; with 128 entries, each list would have 8 nodes. Lookups will clearly slow down. To keep lookups fast we should obviously grow the array, so when do we grow it? When the number of elements exceeds the array size times the load factor, we double the array size. For example: the default array size is 16 and the default load factor is 0.75, so the threshold is 16 * 0.75 = 12; once the 13th entry is added, the array is resized to 32.

Why is the load factor exactly 0.75? The JDK documentation describes it as a good trade-off between time and space costs: a higher factor wastes less space but makes collisions, and therefore lookups, more expensive, while a lower factor resizes sooner and wastes space. For now it is enough to know this trade-off exists.
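
A tiny sketch of the trigger arithmetic (illustrative, not the JDK’s internal code):

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;          // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f;   // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        // 12: once more than 12 entries are stored, the array doubles to 32
        System.out.println(threshold);
    }
}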

What is the meaning and purpose of extending the array length?

Earlier, we said that increasing the size of the array increases lookup speed. For those who don’t yet see why, here is a simple example.

  1. Let’s say our array has length 2 and we want to insert the elements 5, 7, and 9. As the hash algorithm we simply take the element value modulo the array length.

  2. Taking 5, 7, and 9 modulo 2 gives 1 in every case, so 5, 7, and 9 are all inserted at position 1 of this array of length 2.

  3. Finding any of the elements 5, 7, and 9 is therefore obviously slow, because we have to walk the linked list hanging off a[1]. So let’s start increasing the array size.

  4. If the length of the array is 4, the indexes of 5, 7, and 9 become 1, 3, and 1, so 5 and 9 are placed at a[1] and 7 is placed at a[3].

  5. After taking the values modulo 8, the indexes are 5, 7, and 1, so the elements 5, 7, and 9 land at a[5], a[7], and a[1]. At this point our lookup efficiency is significantly better than with the array of length 2.

This process of recomputing every element’s position when the array grows is called a rehash.
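
Here is that example as a runnable sketch (plain modulo stands in for the real JDK hash function):

public class RehashDemo {
    public static void main(String[] args) {
        int[] keys = {5, 7, 9};
        for (int len : new int[]{2, 4, 8}) {
            for (int k : keys) {
                // index = key mod array length; the JDK uses h & (len - 1),
                // which is equivalent when len is a power of two
                System.out.printf("len=%d key=%d -> a[%d]%n", len, k, k % len);
            }
        }
    }
}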

Understand the source code of HashMap in JDK7

As mentioned earlier, for a HashMap the put operation is the most important. Let’s take a look at JDK7’s put operation.

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        // Construct the array first if it is still empty
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        // A null key needs special handling
        if (key == null)
            return putForNullKey(value);
        // Compute a hash value from the key that was passed in
        int hash = hash(key);
        // Use the hash to compute an index into the array
        int i = indexFor(hash, table.length);
        // Walk the linked list at that index; if some node's key equals the
        // key passed in, replace its value, return the old value, and the
        // function ends here
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        // Otherwise a brand-new entry is added
        addEntry(hash, key, value, i);
        return null;
    }

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        // h is the computed hash value, length is the array length; because
        // length is a power of two this is effectively a modulo operation,
        // the same as h % length
        return h & (length-1);
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
        // threshold is the array length * the load factor; if the size has
        // reached it and the target bucket is occupied, double the array,
        // then recompute the hash and the bucket index
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        // Allocate the new array and move every entry into it
        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        // Then point table at the new array
        table = newTable;
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                // Recompute the index for the new array, then take each
                // element off the head of the old list and insert it at
                // the head of the new list
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

    // A new element is inserted at the head of the list: its next pointer is
    // set to the previous list head, so JDK7 inserts at the head. Because
    // transfer also inserts at the head, if two elements still share an index
    // after a resize they stay in the same list, but the one that was at the
    // head ends up at the tail: the list order is reversed.
    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

How JDK8 differs from JDK7

There are two main points.

1. Hash collisions: when two keys hash to the same bucket, the entries are chained in a linked list. JDK7 inserts a new element at the head of that list, while JDK8 inserts it at the tail.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the hash and the key are both equal, just keep the reference;
        // the old value is replaced further down
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // The new element goes at the tail of the list,
                    // not at the head as in JDK7
                    p.next = newNode(hash, key, value, null);
                    // If the list grows past 8 nodes, convert it to a
                    // red-black tree (a data structure topic for later)
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

2. The second and biggest difference is that JDK8’s resize() is much faster than JDK7’s; in addition, JDK7’s resize reverses each list, while JDK8 preserves the order.

    Consider the following scenario:

    For an array of initial length 16, we add two elements: one whose hash code is 5 and one whose hash code is 21. Mod 16 both work out to index 5, so both map to a[5] and the two end up in the same linked list.

    When the number of elements exceeds the threshold, the array grows to 32, and 5 and 21 are now taken mod 32: 5 still maps to a[5], while 21 maps to a[21]. The new position is the old position a[5] plus 16, and that 16 is exactly the old capacity. The mathematical rule can be abstracted as:

    newIndex is either oldIndex (unchanged) or oldIndex + oldCapacity.

    With this rule, resize can be optimized: there is no need to recompute the hash and index from scratch as in JDK7. A small demo after the resize code below verifies the rule.

    The following is the main loop of JDK8’s resize:

if (oldTab != null) {
    for (int j = 0; j < oldCap; ++j) {
        Node<K,V> e;
        if ((e = oldTab[j]) != null) {
            oldTab[j] = null;
            if (e.next == null)
                newTab[e.hash & (newCap - 1)] = e;
            else if (e instanceof TreeNode)
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            else { // preserve order
                // "lo" list: entries whose tested hash bit is 0, they stay at j
                Node<K,V> loHead = null, loTail = null;
                // "hi" list: entries whose tested hash bit is 1, they move to j + oldCap
                Node<K,V> hiHead = null, hiTail = null;
                Node<K,V> next;
                do {
                    next = e.next;
                    if ((e.hash & oldCap) == 0) {
                        if (loTail == null)
                            loHead = e;
                        else
                            loTail.next = e;
                        loTail = e;
                    }
                    else {
                        if (hiTail == null)
                            hiHead = e;
                        else
                            hiTail.next = e;
                        hiTail = e;
                    }
                } while ((e = next) != null);
                if (loTail != null) {
                    loTail.next = null;
                    newTab[j] = loHead;
                }
                if (hiTail != null) {
                    hiTail.next = null;
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
}
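
And here is the promised check of the oldIndex / oldIndex + oldCapacity rule (illustrative values, not JDK code):

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[]{5, 21}) {
            int oldIndex = hash & (oldCap - 1);    // index before the resize
            int newIndex = hash & (newCap - 1);    // index after the resize
            boolean stays = (hash & oldCap) == 0;  // the single bit JDK8 tests
            System.out.printf("hash=%d old=%d new=%d stays=%b%n",
                    hash, oldIndex, newIndex, stays);
        }
    }
}

Running it shows that 5 stays at a[5], while 21 moves from a[5] to a[21] = a[5 + 16].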

HashMap in multithreading

It is well known that HashMap does not support thread synchronization. The biggest reason is that HashMap’s resize is easily disturbed by concurrent threads: the pointer manipulations inside transfer can be interleaved in the wrong order, and it is very easy to end up with a circular linked list ("you point at me and I point at you"). Once a circular list exists, the next traversal spins in an infinite loop and the program hangs.

Even if this extreme case does not occur, unlocked concurrent put operations can still interleave arbitrarily, so a get may observe values inconsistent with the puts.

Hashtable takes the extreme approach of synchronizing every method, including put, directly. This solves the problem once and for all, but is very inefficient and not recommended.

Of course, we can also do it another way:

Map<String, Object> hm = new HashMap<>();
Map<String, Object> syncHm = Collections.synchronizedMap(hm); // use syncHm from now on

This also keeps access to the map thread-safe. Take a look at how it works:

public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
        return new SynchronizedMap<>(m);
    }

    /**
     * @serial include
     */
    private static class SynchronizedMap<K,V>
        implements Map<K,V>, Serializable {
        private static final long serialVersionUID = 1978198479659022715L;

        private final Map<K,V> m;     // Backing Map
        final Object      mutex;        // Object on which to synchronize

        SynchronizedMap(Map<K,V> m) {
            this.m = Objects.requireNonNull(m);
            mutex = this;
        }

        SynchronizedMap(Map<K,V> m, Object mutex) {
            this.m = m;
            this.mutex = mutex;
        }

        public int size() {
            synchronized (mutex) {return m.size();}
        }
        public boolean isEmpty() {
            synchronized (mutex) {return m.isEmpty();}
        }
        public boolean containsKey(Object key) {
            synchronized (mutex) {return m.containsKey(key);}
        }
        public boolean containsValue(Object value) {
            synchronized (mutex) {return m.containsValue(value);}
        }
        public V get(Object key) {
            synchronized (mutex) {return m.get(key);}
        }
        public V put(K key, V value) {
            synchronized (mutex) {return m.put(key, value);}
        }
        public V remove(Object key) {
            synchronized (mutex) {return m.remove(key);}
        }
        public void putAll(Map<? extends K, ? extends V> map) {
            synchronized (mutex) {m.putAll(map);}
        }
        public void clear() {
            synchronized (mutex) {m.clear();}
        }

        private transient Set<K> keySet;
        private transient Set<Map.Entry<K,V>> entrySet;
        private transient Collection<V> values;

This is a little more flexible than Hashtable, but every operation still funnels through a single lock, so there is still far too much locking. Improved, but still not good enough.

ConcurrentHashMap is the ultimate solution to this problem.

In short, the main reason ConcurrentHashMap is more efficient than either of the above schemes is its finer lock granularity.

Think of a HashMap as a collection of banks: Industrial and Commercial Bank of China, China Construction Bank, China Merchants Bank, Agricultural Bank of China, and so on.

With the first two schemes, whenever you want to deposit money you must stand in one single queue, no matter which bank you are heading to.

ConcurrentHashMap reduces the granularity so that when you deposit money, you only queue at the one bank you actually visit. That is significantly more efficient.
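
A minimal usage sketch (just the API surface; the internals come later):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> balances = new ConcurrentHashMap<>();
        // Concurrent updates are safe; merge is atomic on ConcurrentHashMap
        Runnable deposit = () -> balances.merge("ICBC", 100, Integer::sum);
        Thread t1 = new Thread(deposit), t2 = new Thread(deposit);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(balances.get("ICBC")); // 200
    }
}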

To understand the ConcurrentHashMap source code, we also need to understand the volatile and transient keywords and the ReentrantLock mechanism, so ConcurrentHashMap will be discussed in a later article. For now an intuitive understanding is enough.

Summary of practical experience

(1) HashMap: it stores data according to the key’s hashCode, so in most cases a value can be located directly, giving very fast access, but the iteration order is unspecified. A HashMap allows at most one entry with a null key and any number of entries with null values. HashMap is not thread-safe: if multiple threads write to it concurrently, the data can become inconsistent. If thread safety is needed, wrap it with Collections.synchronizedMap, or use ConcurrentHashMap.

(2) Hashtable: Hashtable is a legacy class that inherits from Dictionary and provides much the same functionality as HashMap, but it is thread-safe: only one thread can write to it at any moment. Its concurrency is worse than ConcurrentHashMap, which introduces segmented locking. Hashtable is not recommended in new code: use HashMap where thread safety is not required, and ConcurrentHashMap where it is.

(3) LinkedHashMap: LinkedHashMap is a subclass of HashMap that preserves insertion order: when iterating a LinkedHashMap, entries come back in the order they were inserted. It can also be constructed with a parameter so that it orders entries by access instead, as sketched below.
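
A small sketch of access ordering (the three-argument constructor’s last flag turns it on):

import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
    public static void main(String[] args) {
        // true = order by access, the building block of simple LRU caches
        Map<String, Integer> m = new LinkedHashMap<>(16, 0.75f, true);
        m.put("a", 1); m.put("b", 2); m.put("c", 3);
        m.get("a");                       // touching "a" moves it to the end
        System.out.println(m.keySet());   // [b, c, a]
    }
}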

(4) TreeMap: TreeMap implements the SortedMap interface and keeps its entries sorted by key, in ascending key order by default; a comparator can also be specified to control the ordering. If you need a sorted map, TreeMap is recommended. When using a TreeMap, the key type must implement the Comparable interface, or a custom Comparator must be passed to the TreeMap constructor; otherwise java.lang.ClassCastException will be thrown at runtime.
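
A short sketch of both options:

import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        // String implements Comparable, so natural (ascending) order works
        TreeMap<String, Integer> asc = new TreeMap<>();
        asc.put("b", 2); asc.put("a", 1); asc.put("c", 3);
        System.out.println(asc.keySet());  // [a, b, c]

        // Or supply a Comparator, e.g. for descending key order
        TreeMap<String, Integer> desc = new TreeMap<>(Comparator.reverseOrder());
        desc.putAll(asc);
        System.out.println(desc.keySet()); // [c, b, a]
    }
}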