Preface

Key-value structures such as maps are classic data structures in software development and are often used to store data in memory.

This article is mainly about ConcurrentHashMap, a concurrent container. Before we get started, I think it’s worth talking about HashMap, without which there would be no ConcurrentHashMap.

HashMap

HashMap is known to be backed by an array plus linked lists, but the implementation differs slightly between JDK 1.7 and 1.8.

JDK 1.7

Data structure diagram in 1.7:

Let’s start with the implementation in 1.7.

These are the core member variables of HashMap; what does each of them mean?

  1. Initial bucket size. Since the underlying structure is an array, this is the default size of the array.
  2. Maximum bucket value.
  3. Default load factor (0.75).
  4. table: the actual array that holds the data.
  5. size: the number of entries stored in the Map.
  6. Bucket size, which can be specified explicitly during initialization.
  7. Load factor, which can be specified explicitly at initialization time.
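These correspond to the following field declarations, reproduced (abridged) from the JDK 1.7 HashMap source:

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
    transient int size;
    int threshold;
    final float loadFactor;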

A note on the load factor:

The capacity given to a HashMap is fixed once set; for example, the default initialization:

    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

The default capacity given is 16 and the load factor is 0.75. When the Map is in use, data is constantly stored in it. When the number reaches 16 * 0.75 = 12, the current capacity of 16 needs to be expanded. This expansion involves rehash and data replication, which consumes performance.

Therefore, it is generally recommended to estimate the size of a HashMap in advance to minimize the performance cost of expansion.
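For example, a minimal sketch of pre-sizing (the class name is illustrative): if roughly 1,000 entries are expected, size the map so that 1,000 stays below capacity * loadFactor and no resize ever happens.

    import java.util.HashMap;
    import java.util.Map;

    public class PreSizedMapDemo {
        public static void main(String[] args) {
            // Expecting ~1000 entries: with the default load factor of 0.75,
            // the capacity must be at least 1000 / 0.75 ≈ 1334 to avoid a resize.
            // HashMap rounds the requested capacity up to the next power of two.
            Map<String, Integer> map = new HashMap<>((int) (1000 / 0.75f) + 1);

            for (int i = 0; i < 1000; i++) {
                map.put("key" + i, i); // no rehash/copy happens during these puts
            }
            System.out.println(map.size()); // 1000
        }
    }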

You can see from the code where the actual data is stored:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

So how is this array defined?

Entry is an inner class in a HashMap. It is easy to see from its member variables:

  • key is the key that was written.
  • value, of course, is the value.
  • As mentioned at the beginning, a HashMap consists of an array plus linked lists; next is what implements the linked-list structure.
  • hash stores the hashcode of the current key.
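Abridged from the JDK 1.7 source:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;        // the key that was written
        V value;            // the value
        Entry<K,V> next;    // the next entry in the same bucket's linked list
        int hash;           // the hashcode of the current key

        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        // getKey()/getValue()/equals()/hashCode() omitted
    }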

Now that you know the basic structure, let’s look at the important write and fetch functions:

Put method

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
  • Determine whether the current array needs to be initialized.
  • If the key is null, it is handled by putForNullKey (a HashMap allows one null key).
  • Calculate the hashcode based on the key.
  • Locate the bucket based on the calculated hashcode.
  • If the bucket holds a linked list, check whether the hashcode and key of each entry are equal to the key being written. If they are equal, override the value and return the original value.
  • If the bucket is empty, no data is stored at the current location; addEntry is called to write an Entry object to the current location.
    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

When you call addEntry to write an Entry, you need to determine whether to expand the capacity.

Double the size if needed, and rehash and reposition the current key.

In createEntry, the current head of the bucket is passed as the new Entry's next. If the bucket already has a value, a linked list forms at that location (the new entry is inserted at the head).
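A minimal sketch of this 1.7-style head insertion, using a simplified entry class (names are illustrative, not the JDK's):

    // SimpleEntry mimics the role of HashMap's Entry for this demo only.
    class SimpleEntry<K, V> {
        final K key;
        V value;
        SimpleEntry<K, V> next; // links colliding entries in one bucket

        SimpleEntry(K key, V value, SimpleEntry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    public class HeadInsertDemo {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) {
            SimpleEntry<String, Integer>[] table = new SimpleEntry[16];
            int bucketIndex = 5; // pretend both keys hash to bucket 5

            // First write: the bucket is empty, so next == null.
            table[bucketIndex] = new SimpleEntry<>("a", 1, table[bucketIndex]);
            // Second write: the old head becomes the new entry's next.
            table[bucketIndex] = new SimpleEntry<>("b", 2, table[bucketIndex]);

            for (SimpleEntry<String, Integer> e = table[bucketIndex]; e != null; e = e.next) {
                System.out.println(e.key + "=" + e.value); // prints b=2, then a=1
            }
        }
    }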

The get method

Let’s look at the get function:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }
    
    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
  • First, calculate the hashcode based on the key, and then locate the specific bucket.
  • Determine whether the location holds a linked list.
  • If it is not a linked list, return the value if the key and its hashcode are equal; otherwise return null.
  • If it is a linked list, iterate until the key and hashcode are equal, then return the value.
  • If nothing matches, simply return null.

JDK 1.8

Do you see any points that need to be optimized in the implementation of 1.7?

In fact, one obvious place is:

When hash conflicts are severe, the linked list formed on a bucket grows longer and longer, and query efficiency keeps dropping; the time complexity is O(N).

Therefore, 1.8 focuses on optimizing this query efficiency.

1.8 HashMap structure diagram:

Let’s start with a few core member variables:

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** The load factor used when none specified in constructor. */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    static final int TREEIFY_THRESHOLD = 8;
    
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /** The number of key-value mappings contained in this map. */
    transient int size;

It’s pretty much the same as 1.7, but there are a few important differences:

  • TREEIFY_THRESHOLD: the threshold used to determine whether a linked list needs to be converted to a red-black tree.
  • Change HashEntry to Node.

In fact, the core of Node is the same as HashEntry in 1.7: it stores key, value, hashcode, and next.

Let’s look at the core method.

Put method

It seems to be more complicated than 1.7, but let’s break it down step by step:

  1. Determine whether the current bucket array is empty; if so, it needs to be initialized (resize determines whether to initialize).
  2. Locate the specific bucket according to the hashcode of the current key and determine whether it is empty. If it is empty, there is no hash conflict, and a new Node is created directly at the current position.
  3. If the current bucket has a value (hash conflict), compare the hashcode and key of the node in the current bucket with those of the key being written. If they are equal, assign the node to e; in step 8, the value will be assigned and returned uniformly.
  4. If the current bucket is a red-black tree, data is written following the red-black tree rules.
  5. If it is a linked list, encapsulate the current key and value into a new node and write it to the end of the current bucket (forming a linked list).
  6. Then determine whether the size of the current list is greater than the preset threshold; if it is, convert the list to a red-black tree.
  7. If the same key is found during the traversal, exit the traversal directly.
  8. If e != null, a node with the same key exists, so its value needs to be overridden.
  9. Finally, determine whether to expand the capacity.
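For reference, here is the put implementation these steps walk through, reproduced (lightly abridged) from the JDK 8 HashMap source:

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;                 // step 1: lazy init
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);    // step 2: empty bucket
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;                                   // step 3: same key
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); // step 4
            else {
                for (int binCount = 0; ; ++binCount) {   // step 5: walk the list
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // step 6: treeify
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;                           // step 7: same key found
                    p = e;
                }
            }
            if (e != null) { // step 8: existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();                                    // step 9: expand if needed
        afterNodeInsertion(evict);
        return null;
    }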

The get method


    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

The get method looks a lot easier.

  • First hash the key and get the located bucket.
  • Return null if the bucket is empty.
  • Otherwise, determine whether the key in the first position of the bucket (possibly a linked list or a red-black tree) is the key of the query. If yes, return value directly.
  • If the first does not match, check whether its next is a red-black tree or a linked list.
  • For a red-black tree, return the value by looking it up the tree way.
  • Otherwise, traverse the linked list until a match returns the value.

From these two core methods (get/put), we can see that 1.8 optimizes the long-linked-list case: after conversion to a red-black tree, query efficiency improves to O(logn).

But the original problems with HashMap also exist, such as infinite loops when used in concurrent scenarios.

final HashMap<String, String> map = new HashMap<String, String>();
for (int i = 0; i < 1000; i++) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            map.put(UUID.randomUUID().toString(), "");
        }
    }).start();
}

But why? Here is a quick analysis.

When a HashMap expands, it calls the resize() method; under concurrency this can easily create a circular linked list on a bucket. Then, when fetching a key that does not exist, if the calculated index happens to be the index of the circular list, an infinite loop occurs.

The diagram below:

Traversal

Another thing worth noting is how to traverse a HashMap; the following two ways are common:

    Iterator<Map.Entry<String, Integer>> entryIterator = map.entrySet().iterator();
    while (entryIterator.hasNext()) {
        Map.Entry<String, Integer> next = entryIterator.next();
        System.out.println("key=" + next.getKey() + " value=" + next.getValue());
    }

    Iterator<String> iterator = map.keySet().iterator();
    while (iterator.hasNext()) {
        String key = iterator.next();
        System.out.println("key=" + key + " value=" + map.get(key));
    }

Using the first approach, entrySet, for traversal is strongly recommended.

The first method fetches the key and value at the same time, while the second has to look the value up by key a second time, which is less efficient.
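Since Java 8 there is also forEach, which likewise visits the key and value together:

    map.forEach((key, value) -> System.out.println("key=" + key + " value=" + value));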

A quick summary of HashMap: in both 1.7 and 1.8 you can see that the JDK does not synchronize it in any way, so concurrency causes problems, and infinite loops can even render the system unusable.

Hence the JDK’s dedicated ConcurrentHashMap, which is located under the java.util.concurrent package and is designed to address concurrency issues.
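As a quick preview (a minimal sketch; the class name is illustrative), simply swapping the map in the earlier stress test for a ConcurrentHashMap removes the danger:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    public class SafePutDemo {
        public static void main(String[] args) {
            // Same stress test as above, but with the thread-safe container;
            // no circular list can form, so it terminates normally.
            final Map<String, String> map = new ConcurrentHashMap<>();
            for (int i = 0; i < 1000; i++) {
                new Thread(() -> map.put(UUID.randomUUID().toString(), "")).start();
            }
        }
    }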

If you have persisted this far, your HashMap foundations are solid; the formal analysis of ConcurrentHashMap begins below.

ConcurrentHashMap

ConcurrentHashMap is also available in versions 1.7 and 1.8, with slightly different implementations.

JDK 1.7

Let’s take a look at the implementation of 1.7. Here’s how it looks:

As shown in the figure, it is composed of a Segment array and HashEntry arrays. Like a HashMap, it is still an array plus linked lists.

Its core member variable:

    /**
     * The Segment array. To store data, you first need to locate the specific Segment.
     */
    final Segment<K,V>[] segments;

    transient Set<K> keySet;
    transient Set<Map.Entry<K,V>> entrySet;

Segment is an inner class of ConcurrentHashMap.

    static final class Segment<K,V> extends ReentrantLock implements Serializable {

        private static final long serialVersionUID = 2249069246763182397L;
        
        // The same as the HashEntry in the HashMap
        transient volatile HashEntry<K,V>[] table;

        transient int count;

        transient int modCount;

        transient int threshold;

        final float loadFactor;
        
	}

Consider the composition of HashEntry:

It is similar to HashMap's Entry; the only difference is that the core data such as value, as well as the linked-list link next, are modified with volatile, ensuring visibility on reads.
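Abridged from the JDK 1.7 source:

    static final class HashEntry<K,V> {
        final int hash;
        final K key;
        volatile V value;             // volatile: readers see the latest value
        volatile HashEntry<K,V> next; // volatile: the list links are visible too

        HashEntry(int hash, K key, V value, HashEntry<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        // setNext() and other members omitted
    }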

In terms of principle: ConcurrentHashMap uses segment locking, where Segment inherits from ReentrantLock. Unlike HashTable, which synchronizes both put and get, ConcurrentHashMap theoretically supports concurrencyLevel (the number of Segments) concurrent threads. Whenever a thread holds the lock on one Segment, the other Segments are unaffected.

Let’s also look at the core put get method.

Put method

    public V put(K key, V value) {
        Segment<K,V> s;
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key);
        int j = (hash >>> segmentShift) & segmentMask;
        if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
             (segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment
            s = ensureSegment(j);
        return s.put(key, hash, value, false);
    }

First, locate the Segment by key, and then put it in the corresponding Segment.

        final V put(K key, int hash, V value, boolean onlyIfAbsent) {
            HashEntry<K,V> node = tryLock() ? null :
                scanAndLockForPut(key, hash, value);
            V oldValue;
            try {
                HashEntry<K,V>[] tab = table;
                int index = (tab.length - 1) & hash;
                HashEntry<K,V> first = entryAt(tab, index);
                for (HashEntry<K,V> e = first;;) {
                    if (e != null) {
                        K k;
                        if ((k = e.key) == key ||
                            (e.hash == hash && key.equals(k))) {
                            oldValue = e.value;
                            if (!onlyIfAbsent) {
                                e.value = value;
                                ++modCount;
                            }
                            break;
                        }
                        e = e.next;
                    }
                    else {
                        if (node != null)
                            node.setNext(first);
                        else
                            node = new HashEntry<K,V>(hash, key, value, first);
                        int c = count + 1;
                        if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                            rehash(node);
                        else
                            setEntryAt(tab, index, node);
                        ++modCount;
                        count = c;
                        oldValue = null;
                        break;
                    }
                }
            } finally {
                unlock();
            }
            return oldValue;
        }

Although the value in HashEntry is modified with the volatile keyword, that does not guarantee atomicity under concurrency, so the put operation still needs to lock.

The first step is to try to acquire the lock. If that fails, another thread must be competing, so scanAndLockForPut() is used to acquire the lock by spinning.

  1. Try to acquire the lock by spinning.
  2. If the number of retries reaches MAX_SCAN_RETRIES, switch to blocking lock acquisition to guarantee success.
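A minimal sketch of this spin-then-block pattern (not the JDK code; in the JDK, MAX_SCAN_RETRIES is 64 on multi-core machines and 1 otherwise):

    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative only: spin a bounded number of times, then block.
    class SpinThenBlockLock extends ReentrantLock {
        private static final int MAX_SCAN_RETRIES =
                Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;

        void acquire() {
            int retries = 0;
            while (!tryLock()) {      // 1. spin: keep trying to grab the lock
                if (++retries > MAX_SCAN_RETRIES) {
                    lock();           // 2. give up spinning and block until acquired
                    return;
                }
            }
        }
    }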

Take a look at the put process in conjunction with the diagram.

  1. Use the key's hashcode to locate a HashEntry in the table of the current Segment.
  2. Traverse the HashEntry. If it is not empty, determine whether the passed key and the currently traversed key are equal; if they are, override the old value.
  3. If it is empty, create a new HashEntry and add it to the Segment, first determining whether the capacity needs to be expanded.
  4. Finally, release the lock on the current Segment acquired in step 1.

The get method

    public V get(Object key) {
        Segment<K,V> s; // manually integrate access methods to reduce overhead
        HashEntry<K,V>[] tab;
        int h = hash(key);
        long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
        if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
            (tab = s.table) != null) {
            for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                     (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
                 e != null; e = e.next) {
                K k;
                if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                    return e.value;
            }
        }
        return null;
    }

The get logic is simple:

You just hash the key once to locate the specific Segment, and hash again to locate the specific element.

Since the value attribute in HashEntry is decorated with volatile keywords, memory visibility is guaranteed, so the latest value is fetched each time.

The get method of ConcurrentHashMap is very efficient because the entire process does not require locking.

JDK 1.8

1.7 already solved the concurrency problem and can support concurrent access across its N Segments, but it still has the same problem as HashMap in 1.7:

queries that traverse a linked list are too inefficient.

As a result, some data structure changes have been made in 1.8.

First, let’s look at the underlying structure:

Does it look similar to the 1.8 HashMap structure?

The original Segment lock is abandoned and CAS + synchronized is adopted to ensure concurrency security.

Also changed the 1.7 HashEntry to Node, but the effect is the same.

val and next are both modified with volatile to ensure visibility.
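Abridged from the JDK 8 source:

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;           // volatile: reads always see the latest value
        volatile Node<K,V> next;  // volatile: list links are visible across threads
        // constructor and methods omitted
    }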

Put method

Let’s focus on the put function:

  • Calculate the hashcode based on the key.
  • Determine whether initialization is required.
  • f is the Node located by the current key. If it is empty, data can be written at the current location, and CAS is used to attempt the write; if that succeeds, the put is complete.
  • If the current location's hashcode == MOVED == -1, a resize is in progress, so help with the expansion.
  • If none of these conditions are met, data is written under a synchronized lock.
  • If the number of nodes is greater than TREEIFY_THRESHOLD, the list is converted to a red-black tree.
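For reference, here is the putVal implementation these steps walk through, reproduced (lightly abridged) from the JDK 8 ConcurrentHashMap source:

    final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();                       // initialize lazily
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;           // no lock when adding to an empty bin
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);              // help with the resize
            else {
                V oldVal = null;
                synchronized (f) {                       // lock only this bin's head
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {                   // linked-list bin
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) { // red-black tree bin
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);              // convert to tree
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }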

The get method

  • Address the bucket based on the calculated hashcode; if the element is right on the bucket, return its value directly.
  • If it is a red-black tree, get the value the tree way.
  • Otherwise, traverse the linked list to match and return the value.
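The corresponding source, reproduced from JDK 8:

    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        int h = spread(key.hashCode());
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            if ((eh = e.hash) == h) {          // first node on the bucket
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            else if (eh < 0)                   // tree bin (or forwarding node)
                return (p = e.find(h, key)) != null ? p.val : null;
            while ((e = e.next) != null) {     // walk the linked list
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }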

1.8 has made major changes to the data structure of 1.7. After adopting the red-black tree, query efficiency (O(logn)) is guaranteed, and ReentrantLock has even been replaced with synchronized. This shows how far the synchronized optimizations in newer JDK versions have come.

Conclusion

After looking at the different implementations of HashMap and ConcurrentHashMap in 1.7 and 1.8, you should have a better understanding of them.

In fact, this is also a common focus in interviews. The usual line of questioning is:

  1. Talk about what you think of a HashMap, and talk about the get and put process.
  2. 1.8 What optimizations have been made?
  3. Is it thread safe?
  4. What problems does this thread unsafety cause?
  5. How to solve it? Is there a thread-safe concurrency container?
  6. How is ConcurrentHashMap implemented? What are the differences between 1.7 and 1.8 implementations? Why do you do that?

I believe that if you read this article carefully, this chain of questions from the interviewer will not stump you.

Beyond interviews, these ideas also see quite a lot of practical use; as mentioned before, the cache in Guava is implemented using the ideas of ConcurrentHashMap.

At the same time, you can learn the optimization ideas and concurrency solutions of JDK authors.

In fact, I am writing this post because of an issue on GitHub; I hope everyone can participate in that issue and help maintain this project together.

Extra

Recently I have been summarizing some Java-related knowledge points; interested friends are welcome to maintain it together.

Address: github.com/crossoverJi…

Welcome to follow my public account for discussion: