This article analyzes HashMap from eight aspects: concept, structure, parameters, performance, thread safety, source code parsing (put, get, resize), usage scenarios, and FAQ.


HashMap in-depth parsing

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

Key points: implements the Map interface, permits null keys/values, is unsynchronized, does not guarantee iteration order (such as insertion order), and does not guarantee that the order will remain constant over time.

HashMap<String, Integer> map = new HashMap<String, Integer>();
map.put("Chinese", 1);
map.put("Mathematics", 2);
map.put("English", 3);
map.put("History", 4);
map.put("Politics", 5);
map.put("Geography", 6);
map.put("Biology", 7);
map.put("Chemistry", 8);
for (Map.Entry<String, Integer> entry : map.entrySet()) {
    System.out.println(entry.getKey() + ":" + entry.getValue());
}

HashMap structure

In terms of implementation, HashMap is built on an array + linked list + red-black tree (the red-black tree part was added in JDK 1.8). Its main members are listed below.

  • One of the most important fields in the HashMap class is Node<K,V>[] table, the hash bucket array. Node is an inner class of HashMap that implements the Map.Entry interface and is essentially a key-value pair. Each entry stored in the table is a Node object.

  • Abstract class: AbstractMap

  • Interface: Map

  • Entities: EntrySet, KeySet, Node, TreeNode, Values

  • Iterators: HashIterator, EntryIterator, KeyIterator, ValueIterator

  • Spliterators: HashMapSpliterator, EntrySpliterator, KeySpliterator, ValueSpliterator

HashMap parameters

There are two important parameters in HashMap: capacity and load factor.

Capacity

The capacity is the number of buckets in the hash table; the initial capacity is simply the capacity at the time the hash table is created. The default value is 16.

Load factor

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

The load factor is the maximum proportion to which the buckets may be filled before resizing. The default value is 0.75; when the number of entries (that is, the number of elements in the HashMap) exceeds capacity * load factor, the number of buckets is doubled.
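
For reference, a minimal sketch of supplying these two parameters explicitly through the standard HashMap constructors:

import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        // Default: capacity 16, load factor 0.75
        Map<String, Integer> defaults = new HashMap<>();

        // Explicit initial capacity (rounded up internally to a power of two)
        Map<String, Integer> sized = new HashMap<>(64);

        // Explicit initial capacity and load factor
        Map<String, Integer> tuned = new HashMap<>(64, 0.5f);

        defaults.put("a", 1);
        sized.put("b", 2);
        tuned.put("c", 3);
        System.out.println(defaults + " " + sized + " " + tuned);
    }
}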

HashMap performance

The current implementation of HashMap provides constant-time performance for basic operations such as get and put.

  • If the hash algorithm is good, the time complexity of getting a key is O(1).
  • If the hash algorithm produces many collisions, in the extreme case where every hash maps to the same index, all key-value pairs end up clustered in a single bucket, in a linked list or a red-black tree, and the complexity degrades to O(n) or O(log n) respectively.

Several performance points to note (spatial/temporal tradeoff)

  1. Assuming the hash function spreads elements evenly among the buckets, from the Collection view the time required to traverse a HashMap is proportional to the number of buckets plus the number of entries. If traversal is frequent or iteration performance matters, do not set the capacity too large or the load factor too small; a higher load factor reduces space overhead but increases lookup cost.
  2. Capacity expansion (resize) is a particularly expensive operation, so if many key-value pairs will be stored in a HashMap instance, give it a sufficiently large initial capacity to avoid repeated resizing of the map; this is more efficient (see the sketch after this list).
  3. The default load factor of 0.75 is a balance between space and time efficiency. Do not change it except when space or time requirements are genuinely special. If memory is plentiful and time efficiency matters more, you can decrease the load factor; conversely, if memory is tight and time efficiency matters less, you can increase it, even above 1.
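
A minimal presizing sketch, assuming the expected number of entries is known up front (newPresizedMap is an illustrative helper, not a JDK method):

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // Pick an initial capacity so that expectedSize entries fit without resizing
    // under the default load factor of 0.75.
    static <K, V> Map<K, V> newPresizedMap(int expectedSize) {
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        Map<Integer, String> map = newPresizedMap(1000);
        for (int i = 0; i < 1000; i++) {
            map.put(i, "value-" + i); // no resize should be triggered
        }
        System.out.println(map.size());
    }
}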

HashMap thread safety

Why is HashMap thread unsafe


  1. Multithreaded put, hash collision problem: if multiple threads call put at the same time and two of the keys happen to collide (hash to the same bucket), the HashMap implementation adds both keys to the same array slot, so one thread's put ends up being overwritten (a minimal demonstration follows this list);
  2. Multithreaded put, infinite loop problem: when a HashMap executes put operations concurrently, it can enter an infinite loop, driving CPU utilization close to 100%. Concurrent modification can turn the HashMap's node list into a ring; once the ring forms, a node's next is never null, so traversing the nodes loops forever. (From The Art of Concurrent Programming in Java.)
  3. Multithreaded resize, data loss problem: if multiple threads simultaneously detect that the number of elements exceeds table size * loadFactor, each performs its own resize, but only one thread's expanded array is finally assigned to the table, so the data put by the other threads is lost.
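
A minimal sketch that makes the lost-update problem visible (run it a few times; with an unsynchronized HashMap the final size is usually smaller than the number of puts):

import java.util.HashMap;
import java.util.Map;

public class UnsafePutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>(); // not thread-safe
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int offset = t * 10_000;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    map.put(offset + i, i); // concurrent puts race on buckets and resize
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        // Expected 40000 if puts were safe; usually smaller due to lost updates.
        System.out.println("size = " + map.size());
    }
}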

HashMap thread-safe alternatives

In multithreaded scenarios, avoid using the thread-unsafe HashMap. There are two common ways to get a thread-safe map (both are sketched after the list below):

  1. ConcurrentHashMap: Map<String, Object> map = new ConcurrentHashMap<String, Object>();
  2. Collections.synchronizedMap: Map<String, Object> map = Collections.synchronizedMap(new HashMap<String, Object>());
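
A minimal sketch of both options side by side (note that ConcurrentHashMap, unlike HashMap, does not permit null keys or values):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMaps {
    public static void main(String[] args) {
        // Option 1: a concurrent map designed for high-contention access
        Map<String, Object> concurrent = new ConcurrentHashMap<>();
        concurrent.put("a", 1);

        // Option 2: a wrapper whose every method synchronizes on a single lock
        Map<String, Object> synchronizedMap = Collections.synchronizedMap(new HashMap<>());
        synchronizedMap.put("b", 2);

        System.out.println(concurrent + " " + synchronizedMap);
    }
}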

HashMap source code parsing

The hash method

static final int hash(Object key) {
    int h;
    // Perturbation function
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Perturbation function: shifting right by 16 bits is exactly half of 32 bits, and XORing the high half with the low half mixes the high and low bits of the original hash code, increasing the randomness of the low bits. The mixed low bits also carry some features of the high bits, so the high-bit information is preserved in a disguised way.

Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.

When designing the hash function, since the table length n is always a power of 2, the bucket index is computed as follows (a bitwise AND instead of the % modulo operation):

(n - 1) & hash

Used directly, this method is prone to collisions. Why? For example, if n - 1 is 15 (binary 0000 0000 0000 0000 0000 0000 0000 1111), the only bits of the hash that actually take part in the calculation are the lower 4 bits.

Therefore, the designers adopted a holistic approach (balancing speed, utility, and quality): XOR the high 16 bits with the low 16 bits. They also note that since most hashCode values are already well distributed, and large sets of collisions are handled with O(log n) trees, a single XOR keeps the overhead low while preventing the collisions that would result from the high bits never participating in the index calculation (when the table length is small).
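
A minimal sketch of the perturbation and index calculation, assuming a table length of 16; the hash helper mirrors the JDK method shown above:

public class HashIndexDemo {
    // Same perturbation as java.util.HashMap.hash(Object)
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length, always a power of two
        String key = "Chinese";
        int h = hash(key);
        int index = (n - 1) & h; // equivalent to a modulo for power-of-two n, but cheaper
        System.out.println("hash  = " + Integer.toBinaryString(h));
        System.out.println("index = " + index);
    }
}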

The put method

    public V put(K key, V value) {
        // hash(key) mixes the key's hashCode with its high 16 bits via XOR
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods.
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab;
        Node<K,V> p;
        // n is the table length, i is the bucket index
        int n, i;
        // if the table is empty, initialize it via resize()
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // compute the bucket index with (n - 1) & hash; if the slot is null, add a new Node
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // the first node matches: same hash and the key is identical (==) or equal (equals)
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // the bin is a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // the bin is a linked list
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // if the list length reaches TREEIFY_THRESHOLD (8), convert it to a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // the key already exists; break and overwrite the value below
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // if the size exceeds the threshold (capacity * load factor), resize
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
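
A minimal usage sketch of the behaviour implemented above: put returns the previous value when it overwrites an existing key, null otherwise, and putIfAbsent takes the onlyIfAbsent branch:

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("Chinese", 1));          // null: no previous mapping
        System.out.println(map.put("Chinese", 100));        // 1: old value returned, entry overwritten
        System.out.println(map.putIfAbsent("Chinese", 7));  // 100: existing value kept
        System.out.println(map.get("Chinese"));             // 100
    }
}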

The get method

The general idea of get is as follows:

  • If the first node in the bucket matches, it is a direct hit;
  • If there is a collision, the corresponding entry is then looked up via key.equals(k):
  • If the bin is a tree, the lookup goes through the tree via key.equals(k), in O(log n);
  • If the bin is a linked list, the lookup walks the list via key.equals(k), in O(n).

The implementation is as follows:
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Implements Map.get and related methods.
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // always check the first node
            if (first.hash == hash &&
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
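
One usage note that follows from the code above: because HashMap permits null values, a null return from get does not by itself mean the key is absent; containsKey distinguishes the two cases. A minimal sketch:

import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("present-but-null", null);

        System.out.println(map.get("present-but-null"));         // null (value is null)
        System.out.println(map.get("missing"));                  // null (key is absent)
        System.out.println(map.containsKey("present-but-null")); // true
        System.out.println(map.containsKey("missing"));          // false
    }
}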

The resize method

When put is called, a resize occurs if the number of entries exceeds the proportion of the capacity allowed by the load factor (the threshold).

Without resizing, the list in a single bucket would grow too long, and the efficiency of get(), put(), and remove() would degrade.

During resize, the bucket array is simply doubled, indexes are recomputed, and the nodes are placed into the new buckets. The comment on resize describes it like this:

Initializes or doubles table size. If null, allocates in accord with initial capacity target held in field threshold. Otherwise, because we are using power-of-two expansion, the elements from each bin must either stay at same index, or move with a power of two offset in the new table.

In other words, resize happens when the threshold is exceeded, and because expansion is by powers of two, each element either stays at its original index or moves by a power-of-two offset (the old capacity).

How do you understand that? For example, when expanding from 16 to 32, n - 1 gains one extra high-order bit, so whether a node moves depends only on that one extra bit of its hash:

  • if that bit is 0, the node keeps its original index;
  • if that bit is 1, the node moves to original index + oldCap.

This also spreads nodes that previously collided in one bucket evenly across the new buckets.
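
A minimal sketch of that split decision, assuming oldCap = 16; the bit test mirrors (e.hash & oldCap) in the resize source below:

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all map to index 5 when the table length is 16

        for (int h : hashes) {
            int oldIndex = (oldCap - 1) & h;
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println("hash " + h + ": old index " + oldIndex
                    + " -> new index " + newIndex);
        }
    }
}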

The resize source code

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // do not expand beyond the maximum capacity
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // otherwise double the capacity (and the threshold)
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // compute the new resize threshold
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            // move every bucket into the new table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        // linked-list rehash: split the list into a "low" list and a "high" list
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // stays at the original index
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // moves to original index + oldCap
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // the low list goes to the original index
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // the high list goes to original index + oldCap
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

Usage scenarios

HashMap is an implementation of the Map interface that stores key-value pairs. It accepts null keys and values and is unsynchronized. Internally, a HashMap stores Node(hash, key, value, next) objects.

Memory cache

Q&A

Differences between HashMap and Hashtable

Hashtable: Hashtable is a legacy class. Much of its common mapping functionality is similar to HashMap's, but it extends the Dictionary class and is thread-safe: only one thread can write to a Hashtable at any time. Its concurrency is worse than that of ConcurrentHashMap, which introduces segmented locking. Hashtable is not recommended in new code; use HashMap where thread safety is not required, and ConcurrentHashMap where it is.

HashMap implements all of the Map interface methods and allows null as both key and value. HashMap is roughly equivalent to Hashtable, except that HashMap is unsynchronized and allows nulls, whereas Hashtable is synchronized and does not allow nulls. A HashMap does not guarantee that elements are stored in the order you put them, nor that the current order will remain constant over time.

Why is the number of buckets in a HashMap a power of 2 rather than a prime?

In a HashMap, the length of the hash bucket array table must be a power of 2 (and thus a composite number), which is an unconventional design. The conventional design is to make the bucket size a prime number; relatively speaking, primes cause fewer collisions than composites. The initial bucket size of Hashtable is 11, an example of designing the bucket size as a prime (although Hashtable cannot guarantee the size stays prime after expansion).

HashMap's unconventional design mainly optimizes the modulo operation and expansion. To reduce collisions, HashMap also mixes the high bits into the calculation when locating the bucket index.

How does HashMap handle hash collisions?

To resolve hash table collisions, either open addressing or the chain address method (separate chaining) can be used; Java's HashMap uses the chain address method.

Chain address method

Simply put, it is an array plus linked lists: each array element holds a linked list. When data is hashed, the array index is computed and the data is placed on the linked list at that index.

Open address method

  • Linear open address method (linear probing): after hashing, if the target position is already occupied, try the next position; if that also conflicts, continue onward, and so on, until an empty position is found (a linear-probing sketch follows this list).
  • Square open address method: if the hashed position conflicts, do not try the next slot; instead probe at offsets of 2^0, then 2^1, then 2^2, and so on, until a non-conflicting position is found.
  • Double hash open address method: the same idea, except that instead of stepping by powers of two, the step is determined by a second hash of the key (modulo the table size); if a conflict persists, keep stepping by that same amount.
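
A minimal linear-probing sketch for illustration only (java.util.HashMap itself uses separate chaining, as noted above):

public class LinearProbingDemo {
    // A toy open-addressing insert: probe forward until a free slot is found.
    static void insert(Integer[] table, int key) {
        int n = table.length;
        int index = Math.floorMod(key, n);
        for (int i = 0; i < n; i++) {
            int slot = (index + i) % n; // linear probing
            if (table[slot] == null) {
                table[slot] = key;
                System.out.println("key " + key + " -> slot " + slot);
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[8];
        insert(table, 3);
        insert(table, 11); // collides with 3 (11 % 8 == 3), goes to slot 4
        insert(table, 19); // collides again, goes to slot 5
    }
}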

What happens if you still get frequent collisions? The authors note that "we use trees to handle large sets of collisions in bins." JEP 180 describes the improvement:

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

As mentioned earlier, there are basically two steps to getting a HashMap element:

  • First locate the bucket index based on hashCode();
  • If the key of the first node in the bucket is not the one we want, search the chain via key.equals().

Prior to Java 8, collisions were resolved using linked lists. In the case of collisions, the two-step time complexity of get is O(1) + O(n), where n is the size of the bucket's list. When collisions are frequent, n is large, and the O(n) term clearly hurts performance.

Therefore, in Java 8, long lists are replaced by red-black trees, so the complexity becomes O(1) + O(log n). This is an effective solution when n is large, as shown in the Java 8 performance test "HashMap Performance Improvement".

Is the complexity of HashMap queries always O(1)?

Not always. Before Java 8, the cost is O(1) plus the length of the chain in the bucket, O(n) in the worst case; since Java 8, once a bin has been treeified the worst case is O(1) + O(log n).

Do you know how HashMap works?

Objects are stored and retrieved using the hash method through put and get.

  • When storing an object (put), we pass K/V to the put method, which calls hashCode to compute the hash and locate the bucket. As storage continues, the HashMap automatically resizes based on the current usage (resizing when the number of entries exceeds capacity * load factor).
  • When getting an object (get), we pass K to the get method, which calls hashCode to compute the hash and locate the bucket, then calls equals() to find the matching key-value pair. In Java 8, if the number of colliding elements in a bucket exceeds a threshold (8 by default), a red-black tree replaces the linked list to speed things up.

Do you know how get and put work? What do equals() and hashCode() do?

By hashing the key's hashCode() and computing the index with (n - 1) & hash, the bucket position is obtained; hashCode() therefore determines which bucket is used. If a collision occurs, the key.equals() method is used to find the corresponding node in the linked list or tree.

Do you know the implementation of Hash? Why is it implemented this way?

In the Java 1.8 implementation, this is done by XORing the high 16 bits of hashCode() with the low 16 bits: (h = k.hashCode()) ^ (h >>> 16), mainly considering speed, utility, and quality. This ensures that even when the bucket count n is small, the high bits still participate in the hash calculation, without adding much overhead.

What if the size of the HashMap exceeds the threshold defined by the load factor?

If the threshold is exceeded (load factor of 0.75 times the capacity, by default), the table is resized to twice its size and the existing entries are redistributed into the new buckets (each node either keeps its index or moves by oldCap, as shown in the resize section).

Basic knowledge

Java operators

A = 0011 1100, B = 0000 1101

  • & (bitwise AND): the result bit is 1 only if both corresponding bits are 1, otherwise 0. For example, A & B = 0000 1100.
  • ^ (XOR): the result bit is 0 if the corresponding bits are the same, otherwise 1. For example, A ^ B = 0011 0001.
  • >>> (unsigned right shift): the left operand is shifted right by the number of bits given by the right operand, and the vacated high bits are filled with zeros. For example, A >>> 2 = 0000 1111.
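
A minimal sketch that verifies the examples above:

public class BitOpsDemo {
    public static void main(String[] args) {
        int a = 0b0011_1100; // 60
        int b = 0b0000_1101; // 13

        System.out.println(Integer.toBinaryString(a & b));   // 1100   (0000 1100)
        System.out.println(Integer.toBinaryString(a ^ b));   // 110001 (0011 0001)
        System.out.println(Integer.toBinaryString(a >>> 2)); // 1111   (0000 1111)
    }
}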

References

  • The Java 8 series reimagines HashMap
  • Hash conflict handling: chain address method
  • A HashMap visualization