Preface
Key-value structures such as maps are classics in software development and are often used to hold data in memory.
This article focuses on ConcurrentHashMap, a thread-safe map container. Before we get started, we need to talk about HashMap; without it there would be no ConcurrentHashMap.
HashMap
It is well known that HashMap is backed by an array plus linked lists, but the implementation differs slightly between JDK 1.7 and 1.8.
The 1.7 implementation
Data structure diagram in 1.7:
Let’s take a look at the implementation in 1.7.
These are the core member variables of HashMap. What do they mean?
- The initial bucket count, i.e. the default size of the underlying array.
- The maximum bucket count.
- The default load factor (0.75).
- table, the array that actually holds the data.
- size, the number of mappings stored in the Map.
- The bucket count, which can be specified explicitly at initialization.
- The load factor, which can be specified explicitly at initialization.
Focus on the load factor:
The capacity of a given HashMap is fixed; take the default initialization as an example:
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}
The default capacity is 16 with a load factor of 0.75. As data is continuously written to the Map, once the number of entries reaches 16 x 0.75 = 12, the current capacity of 16 must be expanded. That expansion involves rehashing and copying data, which is expensive.
Therefore, it is recommended to estimate the size of the HashMap in advance to minimize the performance cost of expansion.
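For example, a minimal sketch of pre-sizing (the expected size of 1000 here is an assumed figure for illustration): choosing an initial capacity large enough that the entry count never exceeds capacity x loadFactor means no resize is ever triggered.
// Assumed expected number of entries (illustrative only).
int expectedSize = 1000;

// Size the map so expectedSize entries stay below capacity * 0.75,
// avoiding any rehash/copy during filling.
Map<String, String> map = new HashMap<>((int) (expectedSize / 0.75f) + 1);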
Returning to the HashMap source: what actually stores the data is this array.
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
So how is Entry defined?
Entry is an inner class in a HashMap, as can be easily seen from its member variables:
- key is the key being written.
- value is the value.
- As mentioned at the beginning, HashMap is made up of an array plus linked lists, and the next field is what chains entries into a linked list.
- hash stores the hashcode of the current key.
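Putting those four fields together, the 1.7 Entry inner class looks roughly like this (a condensed sketch of the JDK 1.7 source; accessor methods trimmed):
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;      // the key that was written
    V value;          // the value
    Entry<K,V> next;  // the next entry, forming the linked list
    int hash;         // cached hashcode of the key

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
    // getKey()/getValue()/setValue() omitted
}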
Knowing the basic structure, let’s take a look at the important write and fetch functions:
Put method
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
- Determines whether the current array needs to be initialized.
- If key is null, put a null value into it.
- Calculate the Hashcode based on the key.
- Locate the bucket based on the calculated HashCode.
- If the bucket is a linked list, we need to traverse to determine whether the hashcode and key in the bucket are equal to the incoming key. If they are equal, we overwrite them and return the original value.
- If the bucket is empty, no data is stored in the current location. Add an Entry object to the current location.
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
When calling addEntry to write Entry, you need to determine whether to expand the capacity.
If expansion is needed, the array is doubled in size, and the current key is rehashed and relocated.
createEntry then places the new Entry at the head of the bucket, pointing its next at whatever was there before; if the bucket already held a value, this forms a linked list.
The get method
Let’s look at the get function:
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
- The hashcode is first calculated from the key and used to locate the specific bucket.
- It then checks whether that position holds a linked list.
- If it is not a linked list, the value is returned as soon as the key and its hashcode match.
- If it is a linked list, it is traversed until the key and hashcode match, and that value is returned.
- If nothing is found, null is returned.
The 1.8 implementation
Do you see any points that need to be optimized?
In fact, one obvious place is:
When Hash conflicts are serious, the linked list formed on the bucket becomes longer and longer, resulting in lower query efficiency. Time complexity is O(N).
So 1.8 focuses on optimizing this query efficiency.
1.8 HashMap Structure:
Let’s start with a few core member variables:
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

static final int TREEIFY_THRESHOLD = 8;

transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;
It’s basically the same as 1.7, with a few important differences:
- TREEIFY_THRESHOLD, the threshold used to decide whether a linked list should be converted to a red-black tree.
- Entry is renamed to Node.
In fact, the core composition of Node is the same as Entry in 1.7: it stores the key, value, hashcode, next and other data.
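A condensed sketch of Node from the JDK 1.8 source; apart from the name, it mirrors the 1.7 Entry:
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    // Map.Entry methods omitted
}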
Now look at the core method.
Put method
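put delegates to putVal; a condensed version of that method, slightly abridged from the OpenJDK 1.8 source (the LinkedHashMap hook calls are left in as no-ops), is shown here for reference:
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;              // lazy initialization
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null); // empty bucket: new node
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;                                // same key at bucket head
        else if (p instanceof TreeNode)           // red-black tree bucket
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {       // append at the tail
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;                        // same key found in the list
                p = e;
            }
        }
        if (e != null) {                          // existing mapping: overwrite
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}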
It seems to be more complicated than 1.7, so we break it down step by step:
- Check whether the current bucket is empty. If the bucket is empty, initialize it (resize will determine whether to initialize it).
- Locate the bucket according to the hashcode of the current key and check whether it is empty. If it is empty, it indicates that there is no Hash conflict. Create a new bucket at the current location.
- If the current bucket has a value (a hash conflict), compare the key in the bucket, and the key's hashcode, with the key being written; if they are equal, assign that node to e, so that the value assignment and return can be handled uniformly at the end.
- If the current bucket is a red-black tree, write the data in the red-black-tree fashion.
- If it is a linked list, encapsulate the current key and value into a new node and write it to the end of the bucket (form a linked list).
- Then determine whether the size of the current list is greater than the preset threshold, and if so, convert to a red-black tree.
- If the same key is found during the traversal, exit the traversal directly.
- If e != null, an entry with the same key already exists, so its value needs to be overwritten.
- Finally, determine whether capacity expansion is required.
The get method
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
The get method looks much simpler.
- The key is hashed first to locate the bucket.
- If the bucket is empty, null is returned.
- Otherwise, it checks whether the first node of the bucket (whether the bucket holds a linked list or a red-black tree) matches the queried key; if so, its value is returned directly.
- If the first one does not match, it is determined whether the next one is a red-black tree or a linked list.
- The red-black tree returns the value as the tree looks up.
- Otherwise, the match returns are iterated through in a linked list.
From these two core methods (get/put) we can see that 1.8 optimizes the long-linked-list case: once a list is converted to a red-black tree, query efficiency improves directly to O(logN).
However, the original problems of HashMap remain, such as its tendency to fall into an infinite loop when used in concurrent scenarios.
final HashMap<String, String> map = new HashMap<String, String>();
for (int i = 0; i < 1000; i++) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            map.put(UUID.randomUUID().toString(), "");
        }
    }).start();
}
But why? Just a quick analysis.
When a HashMap expands, the resize() method is called, and concurrent operations during the transfer can easily form a circular linked list on a bucket. Afterwards, a get for a key that does not exist, whose index happens to land on that circular list, will never terminate: an infinite loop.
As shown in the diagram below:
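The root cause lies in the 1.7 transfer() method used during resize (condensed from the JDK 1.7 source): it moves entries with head insertion, which reverses each list, and when two threads interleave here the next pointers can end up referencing each other in a cycle.
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;  // a concurrent resize may already
            if (rehash) {              // have moved this entry
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];      // head insertion: reverses list order
            newTable[i] = e;
            e = next;
        }
    }
}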
Traversal methods
It is also worth noting that HashMap traversal is usually done in one of the following ways:
Iterator<Map.Entry<String, Integer>> entryIterator = map.entrySet().iterator();
while (entryIterator.hasNext()) {
    Map.Entry<String, Integer> next = entryIterator.next();
    System.out.println("key=" + next.getKey() + " value=" + next.getValue());
}

Iterator<String> iterator = map.keySet().iterator();
while (iterator.hasNext()) {
    String key = iterator.next();
    System.out.println("key=" + key + " value=" + map.get(key));
}
The first way, via entrySet, is strongly recommended for traversal.
It fetches the key and the value together, while the second way has to look the value up by key a second time, which is less efficient.
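Incidentally, on Java 8 and later the same entry-based traversal can also be written more compactly with forEach:
// Visits each entry directly, like the entrySet iterator above.
map.forEach((key, value) -> System.out.println("key=" + key + " value=" + value));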
A quick summary of HashMap: whether in 1.7 or 1.8, it is clear that the JDK does not synchronize it in any way, which invites concurrency problems, up to an infinite loop that renders the system unusable.
Hence the JDK provides the dedicated ConcurrentHashMap, which lives in the java.util.concurrent package and is built for concurrent scenarios.
Those of you who have persisted this far have now laid all the groundwork for ConcurrentHashMap.
ConcurrentHashMap
ConcurrentHashMap is also available in versions 1.7 and 1.8, with slightly different implementations.
The 1.7 implementation
Let’s take a look at the 1.7 implementation. Here’s how it looks:
As shown in the figure, it is composed of a Segment array plus HashEntry linked lists.
Its core member variables:
/**
 * Segment array.
 */
final Segment<K,V>[] segments;

transient Set<K> keySet;
transient Set<Map.Entry<K,V>> entrySet;
The Segment is an internal class of ConcurrentHashMap.
static final class Segment<K,V> extends ReentrantLock implements Serializable {

    private static final long serialVersionUID = 2249069246763182397L;

    // The data is stored in this HashEntry array, just like in a HashMap
    transient volatile HashEntry<K,V>[] table;

    transient int count;
    transient int modCount;
    transient int threshold;
    final float loadFactor;
}
Take a look at the composition of HashEntry:
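A condensed sketch from the JDK 1.7 source:
static final class HashEntry<K,V> {
    final int hash;
    final K key;
    volatile V value;              // volatile: visible to concurrent readers
    volatile HashEntry<K,V> next;  // volatile: list structure is visible too

    HashEntry(int hash, K key, V value, HashEntry<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    // setNext() etc. use UNSAFE and are omitted
}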
It is much like HashMap's Entry; the only difference is that the core data such as value, and the next pointer of the linked list, are volatile, which guarantees visibility on reads.
In principle, ConcurrentHashMap uses segment locking, where Segment extends ReentrantLock. It supports concurrencyLevel (the number of Segments) concurrent writers: whenever a thread occupies the lock of one Segment, the other Segments are unaffected.
Let's again look at the core put and get methods.
Put method
public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key);
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment
        s = ensureSegment(j);
    return s.put(key, hash, value, false);
}
First, locate the Segment by the key, then perform the actual put within that Segment.
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            else {
                if (node != null)
                    node.setNext(first);
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
                else
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}
Although the value in HashEntry is volatile, volatility does not guarantee atomicity under concurrency, so the put operation still needs to lock.
The first step is to try to acquire the lock. If that fails, some other thread must be competing for it, so scanAndLockForPut() is used to spin for the lock:
- It tries to acquire the lock by spinning.
- Once the number of retries reaches MAX_SCAN_RETRIES, it falls back to a blocking lock() to guarantee that the acquisition eventually succeeds.
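For completeness, scanAndLockForPut itself, condensed from the JDK 1.7 source; note how it speculatively creates the node while spinning and falls back to a blocking lock() after MAX_SCAN_RETRIES:
private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
    HashEntry<K,V> first = entryForHash(this, hash);
    HashEntry<K,V> e = first;
    HashEntry<K,V> node = null;
    int retries = -1; // negative while locating node
    while (!tryLock()) {
        HashEntry<K,V> f; // to recheck first below
        if (retries < 0) {
            if (e == null) {
                if (node == null) // speculatively create node
                    node = new HashEntry<K,V>(hash, key, value, null);
                retries = 0;
            }
            else if (key.equals(e.key))
                retries = 0;
            else
                e = e.next;
        }
        else if (++retries > MAX_SCAN_RETRIES) {
            lock();               // too many spins: block until acquired
            break;
        }
        else if ((retries & 1) == 0 &&
                 (f = entryForHash(this, hash)) != first) {
            e = first = f;        // re-traverse if entry changed
            retries = -1;
        }
    }
    return node;
}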
Combine this with the diagram to follow the flow of put:
- Locate the table in the current Segment to a HashEntry using the hashcode of the key.
- The HashEntry is iterated over, and if it is not empty it determines whether the passed key is equal to the currently iterated key, and if it is, the old value is overwritten.
- If it is empty, a new HashEntry is created and added to the Segment; before the insert, it first determines whether expansion is needed.
- Finally, the lock on the current Segment acquired in step 1 is released.
The get method
public V get(Object key) {
    Segment<K,V> s; // manually integrate access methods to reduce overhead
    HashEntry<K,V>[] tab;
    int h = hash(key);
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
             e != null; e = e.next) {
            K k;
            if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                return e.value;
        }
    }
    return null;
}
Get logic is simple:
The key is hashed once to locate the Segment, and hashed again to locate the specific element within it.
Because the value attribute of HashEntry is volatile, memory visibility is guaranteed, so the latest value is read every time.
The get method of ConcurrentHashMap is therefore very efficient: the whole process requires no locking at all.
The 1.8 implementation
1.7 solves the concurrency problem and supports as many concurrent writers as there are Segments, but it still suffers from the same problem as HashMap in 1.7:
query traversal over linked lists is inefficient.
So 1.8 has made some changes to the data structure.
First, let’s take a look at the underlying structure:
Does this look similar to the 1.8 HashMap structure?
It abandons the original Segment-based locking and instead uses CAS + synchronized to guarantee concurrency safety.
The HashEntry that held the data in 1.7 is also changed to Node, although its role is the same.
Both val and next are volatile to ensure visibility.
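A condensed sketch of Node from the 1.8 ConcurrentHashMap source; compare its volatile fields with the plain HashMap Node shown earlier:
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    volatile V val;          // volatile: writes are visible to readers
    volatile Node<K,V> next; // volatile: list structure is visible too

    Node(int hash, K key, V val, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.val = val;
        this.next = next;
    }
    // Map.Entry methods and find() omitted
}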
Put method
Let’s focus on the put function:
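put delegates to putVal; here it is, slightly abridged from the OpenJDK 1.8 source (helpers such as initTable, helpTransfer and addCount are referenced but omitted):
public V put(K key, V value) {
    return putVal(key, value, false);
}

final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();                     // lazy initialization
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                             // CAS: no lock for an empty bin
        }
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);            // table is being resized
        else {
            V oldVal = null;
            synchronized (f) {                     // lock only this bucket head
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {                 // linked list
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key, value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) { // red-black tree
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                              value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}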
- Calculate the hashcode from the key.
- Check whether the table needs to be initialized.
- f is the Node located for the current key. If it is empty, the current position can be written to, and CAS is used to attempt the write.
- If the hashcode of the current position == MOVED == -1, capacity expansion is in progress and needs to be carried out first.
- If none of the above applies, the data is written under a synchronized lock.
- If the list length then exceeds TREEIFY_THRESHOLD, the list is converted to a red-black tree.
The get method
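For reference, the get itself, quoted in condensed form from the OpenJDK 1.8 source:
public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        if ((eh = e.hash) == h) {          // hit at the bucket head
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        else if (eh < 0)                   // tree bin or forwarding node
            return (p = e.find(h, key)) != null ? p.val : null;
        while ((e = e.next) != null) {     // walk the linked list
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}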
- The computed hashcode is used to locate the bucket; if the value sits right at its head, it is returned directly.
- If it is a red-black tree, the value is fetched the way a tree lookup works.
- Otherwise, the linked list is traversed until a match is found.
1.8 changes the 1.7 data structure substantially, adopting red-black trees to guarantee query efficiency (O(logN)), and even drops ReentrantLock in favor of synchronized; this shows how well synchronized has been optimized in current JDK versions.
Conclusion
After looking at the different implementations of HashMap and ConcurrentHashMap in 1.7 and 1.8, you should have a better understanding of them.
In fact, these are also interview favorites; the usual line of questioning goes:
- Talk about your understanding of HashMap, and walk through the get/put flow.
- What optimizations have been made in 1.8?
- Is it thread safe?
- What are the problems caused by insecurity?
- How to solve it? Is there a thread-safe concurrency container?
- How is ConcurrentHashMap implemented? 1.7. What are the differences in 1.8 implementations? Why do you do that?
I believe that after reading this carefully, you will be able to handle this whole list in front of an interviewer.
In addition to interview questions, there are many common applications. For example, the implementation of the Cache in Guava is based on the idea of ConcurrentHashMap.
You can also learn about optimization ideas and concurrency solutions from JDK authors.
In fact, the idea for this article came from an issue on GitHub, and I hope you will join in maintaining that project.
Extra
I have recently been summarizing some Java-related knowledge points; interested friends are welcome to help maintain them.
Address: github.com/crossoverJi…
You are also welcome to follow my WeChat official account to get in touch: