1.HashMap

Simply put, a HashMap consists of an array plus linked lists. The array is the body of the HashMap, while the linked lists exist mainly to resolve hash collisions. If the target array slot holds no linked list (the current entry's next points to null), operations such as lookup and insertion are very fast, requiring only a single array access. If the target slot does hold a linked list, insertion is still O(1) (before JDK 1.8, the new Entry was inserted at the head of the list with a simple change of references), whereas lookup must traverse the list and compare candidates using the key's equals() method. For performance, therefore, the fewer and shorter the linked lists in a HashMap, the better.
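As an illustration of the collision behavior described above, here is a small sketch (the BadKey class is a contrived assumption, not from the JDK): every key hashes to the same bucket, yet lookups still succeed because HashMap falls back to equals() comparison along the bucket's list.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Hypothetical key whose hashCode is constant, forcing every
    // entry into the same bucket (the worst case for a HashMap).
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }   // all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(new BadKey("k" + i), i);
        }
        // Lookups still succeed, but each one must walk the single
        // bucket's list/tree and compare keys with equals().
        System.out.println(map.get(new BadKey("k7")));   // 7
        System.out.println(map.size());                  // 100
    }
}
```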

The hash function (which further mixes the key's hashCode and adjusts its binary bits so that the final storage locations are distributed as evenly as possible):

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
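A quick sketch of why this XOR spreading step matters (the demo values are illustrative assumptions): HashMap picks a bucket with (n - 1) & hash, which only looks at the low bits, so without the XOR two keys that differ only in their high bits would always collide in a small table.

```java
public class HashSpreadDemo {
    // Same spreading step as HashMap.hash(): XOR the high 16 bits
    // into the low 16 bits so both halves influence the bucket index.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                       // table capacity, always a power of two
        int h = hash("example");
        int index = (n - 1) & h;          // how HashMap picks the bucket
        System.out.println("hash=" + h + " index=" + index);

        // Two hypothetical hash codes that differ only above bit 16:
        int a = 0x10000, b = 0x20000;
        // Without spreading, both land in bucket 0 of a 16-slot table.
        System.out.println(((n - 1) & a) == ((n - 1) & b));   // true
        // With spreading, the high bits push them into different buckets.
        System.out.println(((n - 1) & (a ^ (a >>> 16))) == ((n - 1) & (b ^ (b >>> 16)))); // false
    }
}
```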

Lookup function

HashMap was reworked in JDK 1.8, as described below.

Before JDK 1.8, HashMap was implemented as array + linked list; even with a good hash function, it is difficult to achieve a 100% uniform distribution of elements.

When a large number of elements land in the same bucket and a long linked list hangs under it, the HashMap degenerates into a singly linked list. If that list has n elements, traversal takes O(n), and the HashMap loses its advantage entirely.

In response, JDK 1.8 introduced red-black trees (search time complexity O(log n)) to mitigate this problem.

2.Hashtable

Hashtable contains several important member variables: table, count, threshold, loadFactor, and modCount.

  • table is an Entry[] array; an Entry (as explained for HashMap) is simply a node of a singly linked list. The hash table's key-value pairs are stored in this Entry array.
  • count is the size of the Hashtable, i.e. the number of key-value pairs it holds.
  • threshold is the capacity-adjustment threshold, used to decide whether to resize the Hashtable; threshold = capacity × load factor.
  • loadFactor is the load factor.
  • modCount is used to implement the fail-fast mechanism.
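To see modCount's fail-fast mechanism in action, here is a minimal sketch: structurally modifying the Hashtable while iterating makes the iterator's recorded modCount go stale, so the very next call fails fast.

```java
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> table = new Hashtable<>();
        table.put("a", 1);
        table.put("b", 2);

        Iterator<String> it = table.keySet().iterator();
        it.next();
        // A structural modification bumps modCount; the iterator detects
        // the mismatch on its next call and fails fast.
        table.put("c", 3);
        try {
            it.next();
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("fail-fast triggered");
        }
    }
}
```

Note that this applies to the collection-view iterators; Hashtable's legacy Enumeration (elements()/keys()) is not fail-fast.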

The put method

The overall flow of the put method is as follows:

  1. If the value is null, an exception is thrown.
  2. If the table[index] slot is not empty, iterate over its list. If the same key is encountered, replace its value directly and return the old value.
  3. Otherwise, insert a new Entry at the table[index] position.
    public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // Make sure the key is not already in the hashtable:
        // first compute the hash, then iterate over the list at that
        // index; if the same key is found, replace its value and
        // return the old value
        Entry<K,V> tab[] = table;
        int hash = hash(key);
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<K,V> e = tab[index]; e != null; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }

        modCount++;
        if (count >= threshold) {
            // If the threshold is exceeded, rehash the table
            rehash();
            tab = table;
            hash = hash(key);
            index = (hash & 0x7FFFFFFF) % tab.length;
        }

        // Create a new Entry, insert it at table[index], and point its
        // next reference at the previous head of that bucket
        Entry<K,V> e = tab[index];
        tab[index] = new Entry<>(hash, key, value, e);
        count++;
        return null;
    }

The get method

The get method is much simpler than put. It first hashes the key with the hash() method, then locates the index from the hash value (both steps use the same algorithm as put). It then iterates over the linked list, returning the value whose key matches; if no match is found, null is returned.

    public synchronized V get(Object key) {
        Entry<K,V> tab[] = table;
        int hash = hash(key);
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<K,V> e = tab[index]; e != null; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
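One consequence of the put and get logic above is that Hashtable rejects nulls: put throws explicitly for a null value, and a null key fails when its hash is computed. A small demonstration:

```java
import java.util.Hashtable;

public class HashtableNullDemo {
    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("name", "alice");
        System.out.println(table.get("name"));   // alice
        System.out.println(table.get("age"));    // null: key not present

        try {
            table.put("k", null);                // null value -> NPE
        } catch (NullPointerException e) {
            System.out.println("null value rejected");
        }
        try {
            table.put(null, "v");                // null key -> NPE when hashed
        } catch (NullPointerException e) {
            System.out.println("null key rejected");
        }
    }
}
```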

3.ConcurrentHashMap

In JDK1.7, the data structure of ConcurrentHashMap consists of a Segment array and multiple HashEntry lists.

The point of the Segment array is to split one large table into multiple small tables that are locked independently, namely the lock-splitting technique mentioned above. Each Segment element stores a HashEntry array + linked lists, the same data storage structure as a HashMap.

The put operation

For data inserts, ConcurrentHashMap performs two hash computations to locate the storage position.

    static class Segment<K,V> extends ReentrantLock implements Serializable { ... }

During a put, the first hash of the key locates the Segment; if that Segment has not been initialized yet, it is assigned with a CAS operation. A second hash then locates the target HashEntry position. When inserting at that position (the end of the list), the thread exploits the inherited lock: it calls tryLock(), inherited from ReentrantLock, to try to acquire the Segment's lock. If the lock is acquired, the thread inserts directly at the corresponding position. If another thread already holds the lock, the current thread spins on tryLock(); after a bounded number of attempts it suspends itself and waits to be woken.
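The spin-then-block acquisition described above can be sketched with ReentrantLock directly (the retry limit and method name here are simplified assumptions, loosely modeled on JDK 1.7's scanAndLockForPut):

```java
import java.util.concurrent.locks.ReentrantLock;

public class SpinLockSketch extends ReentrantLock {
    // Simplified; the JDK scales the retry budget by CPU count.
    static final int MAX_SCAN_RETRIES = 64;

    // Spin on tryLock() a bounded number of times; if the lock is still
    // held after that, fall back to lock(), which parks (suspends) the
    // thread until the owner releases the lock and it is woken.
    void acquireWithSpin() {
        int retries = 0;
        while (!tryLock()) {
            if (++retries > MAX_SCAN_RETRIES) {
                lock();          // block instead of burning CPU
                return;
            }
        }
    }

    public static void main(String[] args) {
        SpinLockSketch s = new SpinLockSketch();
        s.acquireWithSpin();
        try {
            System.out.println("held=" + s.isHeldByCurrentThread());  // held=true
        } finally {
            s.unlock();
        }
    }
}
```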

The get operation

The get operation of ConcurrentHashMap is similar to that of HashMap, except that ConcurrentHashMap first hashes to locate the Segment and then hashes again to locate the target HashEntry. It iterates over the list under that HashEntry, comparing keys, and returns the value on success or null on failure.

Calculating the size of a ConcurrentHashMap is an interesting problem: because the map is concurrent, other threads may be inserting data while you compute the size, so the computed size can differ from the actual size by the time it is returned. To solve this problem, JDK1.7 uses a two-step scheme.

  1. First, it tries to compute the size of the ConcurrentHashMap without locking, at most three times. If two consecutive passes produce the same result, it assumes no elements were added in between and the result is accurate.
  2. If the unlocked scheme fails, it locks every Segment and then computes the size of the ConcurrentHashMap.
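A sketch of that two-step size() strategy under stated assumptions (the Seg class and its fields are illustrative stand-ins for the JDK's Segment internals, not the real code):

```java
import java.util.concurrent.locks.ReentrantLock;

public class SizeRetrySketch {
    // Illustrative stand-in for a Segment: its own lock, an element
    // count, and a modification counter.
    static final class Seg extends ReentrantLock {
        volatile int count;
        volatile int modCount;
    }

    static final int RETRIES_BEFORE_LOCK = 2;
    final Seg[] segments;

    SizeRetrySketch(int n) {
        segments = new Seg[n];
        for (int i = 0; i < n; i++) segments[i] = new Seg();
    }

    int size() {
        // Unlocked passes: if two consecutive passes see the same total
        // modCount, no writes happened in between and the sum is trusted.
        long last = -1;
        for (int retries = 0; retries <= RETRIES_BEFORE_LOCK; retries++) {
            long mods = 0;
            int sum = 0;
            for (Seg s : segments) { sum += s.count; mods += s.modCount; }
            if (mods == last) return sum;
            last = mods;
        }
        // Fall back: lock every segment and sum under mutual exclusion.
        for (Seg s : segments) s.lock();
        try {
            int sum = 0;
            for (Seg s : segments) sum += s.count;
            return sum;
        } finally {
            for (Seg s : segments) s.unlock();
        }
    }

    public static void main(String[] args) {
        SizeRetrySketch m = new SizeRetrySketch(4);
        m.segments[0].count = 3;
        m.segments[2].count = 5;
        System.out.println(m.size());   // 8
    }
}
```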

The JDK1.8 implementation

The JDK1.8 implementation discards the Segment concept and uses a Node array + linked lists + red-black trees as its data structure. Concurrency is controlled with synchronized and CAS, so the whole thing looks like an optimized, thread-safe HashMap. You can still see the Segment data structure in JDK1.8, but its attributes have been simplified; it exists only for compatibility with older versions.

The put operation

In the example above, adding the personal information calls the put method. Let's take a look at its flow.

  1. If the table has not been initialized, initTable() is called first.
  2. If there is no hash conflict at the slot, the node is inserted directly with CAS.
  3. If an expansion (resize) is in progress, the thread helps with the expansion first.
  4. If there is a hash conflict, synchronized locking keeps the operation thread-safe, with two cases: either iterate to the end of the linked list and append, or insert following the red-black tree structure.
  5. If the length of the linked list exceeds the threshold of 8, the list is first converted to a red-black tree, and break re-enters the loop.
  6. If the addition succeeds, the addCount() method is called to update the size and check whether expansion is needed.
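The first four steps above can be sketched with a heavily simplified, thread-safe toy map (an assumption-laden sketch, not the JDK's putVal: no resizing, no treeify, and AtomicReferenceArray standing in for the volatile table reads and CAS):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasPutSketch<K, V> {
    static final class Node<K, V> {
        final int hash; final K key;
        volatile V val; volatile Node<K, V> next;
        Node(int hash, K key, V val, Node<K, V> next) {
            this.hash = hash; this.key = key; this.val = val; this.next = next;
        }
    }

    final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    public V put(K key, V value) {
        int h = key.hashCode() & 0x7FFFFFFF;
        int i = h & (table.length() - 1);
        for (;;) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // Empty bin: lock-free CAS insert; retry the loop on failure.
                if (table.compareAndSet(i, null, new Node<>(h, key, value, null)))
                    return null;
            } else {
                // Hash conflict: lock only this bin's head node.
                synchronized (head) {
                    if (table.get(i) != head) continue;   // head changed, retry
                    for (Node<K, V> e = head; ; e = e.next) {
                        if (e.hash == h && e.key.equals(key)) {
                            V old = e.val; e.val = value; return old;
                        }
                        if (e.next == null) {             // append at the tail
                            e.next = new Node<>(h, key, value, null);
                            return null;
                        }
                    }
                }
            }
        }
    }

    public V get(K key) {
        int h = key.hashCode() & 0x7FFFFFFF;
        for (Node<K, V> e = table.get(h & (table.length() - 1)); e != null; e = e.next)
            if (e.hash == h && e.key.equals(key)) return e.val;
        return null;
    }

    public static void main(String[] args) {
        CasPutSketch<String, Integer> m = new CasPutSketch<>();
        m.put("a", 1);
        System.out.println(m.put("a", 2));  // 1 (old value replaced)
        System.out.println(m.get("a"));     // 2
    }
}
```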

The get operation

Returning to the original example: after adding the personal information, we fetch it back with String name = map.get("name"). Let's look at ConcurrentHashMap's get() implementation.

    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p;
        int n, eh; K ek;
        // Compute the hash (spread mixes the high bits in)
        int h = spread(key.hashCode());
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            // Read the first Node of the bucket
            if ((eh = e.hash) == h) {
                // Return it directly if the first node matches
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            // A negative hash means the bucket is being moved during an
            // expansion; ForwardingNode's find method searches nextTable.
            // If found, return it
            else if (eh < 0)
                return (p = e.find(h, key)) != null ? p.val : null;
            // Neither the first node nor a ForwardingNode: walk the list
            while ((e = e.next) != null) {
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }
  1. Compute the hash value and locate the table index; if the first node matches, return it.
  2. If an expansion is in progress, the find method of ForwardingNode, which marks a node being moved, is called to look up the node in nextTable; return it if it matches.
  3. If neither of the above applies, traverse the list and return the value on a match, otherwise return null.

The ConcurrentHashMap data structure is similar to that of HashMap; ConcurrentHashMap just adds synchronization to control concurrency. Comparing JDK1.7's ReentrantLock + Segment + HashEntry with JDK1.8's synchronized + CAS + Node + red-black tree, we can summarize the following:

  1. The JDK1.8 implementation reduces lock granularity. JDK1.7 locks a Segment, which contains multiple HashEntry lists, whereas JDK1.8 locks a single bucket (HashEntry/Node).
  2. JDK1.8's data structure is simpler and the operations are clearer. Because synchronized handles the synchronization, the Segment lock concept is unnecessary and the Segment data structure can be dropped. The finer granularity does, however, increase implementation complexity.
  3. JDK1.8 uses red-black trees to optimize long linked lists. Traversing a long list is slow, while red-black tree traversal is fast, so lists beyond a certain threshold are converted, forming an effective combination.
  4. In JDK1.8, synchronized is used instead of ReentrantLock.
  • Because the granularity is reduced, synchronized is not inferior to ReentrantLock for fine-grained locking. With coarse-grained locking, ReentrantLock can use Condition to control the boundaries of each fine-grained section, which is more flexible; with fine-grained locking, that advantage of Condition disappears.
  • The JVM development team never gave up on synchronized; JVM-based synchronized has more room for optimization, and using a built-in keyword is more natural than using an API.
  • Under heavy data operations, the API-based ReentrantLock adds memory pressure on the JVM. This is not a hard bottleneck, but it is a reason to prefer synchronized.
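The Condition flexibility mentioned above can be shown in a minimal sketch: one ReentrantLock can carry several independent wait sets, whereas a synchronized monitor offers only the single wait/notify set.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        // Two independent wait sets on one lock; a synchronized monitor
        // has only a single wait set (wait/notify).
        Condition notEmpty = lock.newCondition();
        Condition notFull  = lock.newCondition();  // unused here, shown for contrast

        final boolean[] ready = {false};
        Thread consumer = new Thread(() -> {
            lock.lock();
            try {
                while (!ready[0]) notEmpty.awaitUninterruptibly();
                System.out.println("consumed");
            } finally {
                lock.unlock();
            }
        });
        consumer.start();

        lock.lock();
        try {
            ready[0] = true;
            notEmpty.signal();   // wakes only waiters on notEmpty, not notFull
        } finally {
            lock.unlock();
        }
        consumer.join();
    }
}
```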

4. To summarize

The major differences between Hashtable and HashMap are thread safety and speed. Use Hashtable only when you need full thread safety, and prefer ConcurrentHashMap if you are on Java 5 or later.