The underlying structure of HashMap is an array of buckets plus linked lists (separate chaining), though the implementation differs slightly between JDK 1.7 and 1.8.
JDK 1.7
In 1.7, the data structure is an array whose buckets hold linked lists. Let's take a look at the implementation in 1.7, starting with the core member variables of HashMap. What do they mean?
- The default bucket size is 16; since the underlying structure is an array, this is the array's default length.
- The maximum capacity of a table is 1 << 30.
- The default load factor is 0.75.
- `table` is the array that actually holds the data.
- `size` is the number of mappings stored in the Map.
- The bucket capacity can be specified explicitly at initialization.
- The load factor can be specified explicitly at initialization.
The initialization logic:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}
The initialCapacity passed in is automatically rounded up to the nearest power of two greater than or equal to it. JDK 1.7 does this with a loop, while JDK 1.8 replaces the loop with the tableSizeFor method, a clever bit-twiddling trick.
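To see what that bit-twiddling buys, here is a sketch comparing the two rounding strategies. The method names mirror the JDK, but both are reimplemented here for illustration; the real tableSizeFor additionally clamps to MAXIMUM_CAPACITY, which this sketch omits.

```java
public class PowerOfTwo {
    // JDK 1.7 style: loop, doubling until capacity >= initialCapacity.
    static int roundUpLoop(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    // JDK 1.8 style (tableSizeFor): smear the highest set bit rightwards
    // so every lower bit becomes 1, then add 1 to land on the next power
    // of two. Constant time, no loop.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : n + 1; // MAXIMUM_CAPACITY clamp omitted here
    }

    public static void main(String[] args) {
        for (int c : new int[]{1, 7, 16, 17, 1000}) {
            System.out.println(c + " -> " + roundUpLoop(c) + " / " + tableSizeFor(c));
        }
    }
}
```

Both agree on every input; subtracting 1 first is what makes an exact power of two (like 16) map to itself rather than the next power up.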
The load factor
Given the default capacity of 16 and load factor of 0.75: as data is continuously written into the Map, once the number of entries reaches 16 × 0.75 = 12, the current capacity of 16 must be doubled. Expansion involves rehashing and copying data, which is expensive, so it is recommended to estimate the size of the HashMap in advance to minimize the performance cost of resizing.
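Pre-sizing means requesting a capacity large enough that `expected / loadFactor` stays below the threshold. A minimal sketch, assuming the same sizing formula Guava uses in `Maps.newHashMapWithExpectedSize`; the `capacityFor` helper is ours, not a JDK method:

```java
import java.util.HashMap;
import java.util.Map;

public class Presize {
    // Capacity to request so that `expected` entries fit under the 0.75
    // default load factor without ever triggering a resize.
    // (Assumption: this mirrors Guava's newHashMapWithExpectedSize math.)
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // 1000 / 0.75 + 1 = 1334; HashMap rounds that up to 2048,
        // giving a threshold of 1536 — no resize during the 1000 puts.
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++)
            map.put("key-" + i, i);
        System.out.println(map.size());
    }
}
```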
As the code shows, what actually stores the data is `transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;`.
How is this array defined? Entry is an inner class of HashMap, and its member variables are easy to read:
- `key` is the key being written.
- `value` is the value.
- As mentioned at the start, HashMap is made up of arrays and linked lists; `next` is what implements the linked-list structure.
- `hash` stores the hash of the current key.
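The fields above can be sketched as a simplified version of 1.7's Entry. This is a paraphrase for illustration, not the real class, which also implements Map.Entry and carries recordAccess/recordRemoval hooks:

```java
// Simplified sketch of JDK 1.7's HashMap.Entry inner class.
class Entry<K, V> {
    final K key;
    V value;
    Entry<K, V> next; // links entries in the same bucket into a list
    int hash;         // cached hash of the key

    Entry(int h, K k, V v, Entry<K, V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}
```

Note how the constructor takes the existing bucket head `n` as `next`: that is exactly the head insertion createEntry performs below.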
Knowing the basic structure, let’s take a look at the important write and fetch functions.
Put method
The code comes from github.com/openjdk/jdk…
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
- First, determine whether the current array needs to be initialized.
- If the key is null, store a null-key entry via putForNullKey.
- Compute the hash from the key: `(h = key.hashCode()) ^ (h >>> 16)`.
- Locate the bucket from the computed hash.
- If the bucket holds a linked list, traverse it to check whether an entry's hash and key equal the incoming key; if so, overwrite the value and return the old one.
- If the bucket is empty, nothing is stored at that position yet, so add a new Entry object there.
- When addEntry writes the Entry, it checks whether expansion is needed; if so, the capacity is doubled and the current key is rehashed and repositioned. In createEntry, the current bucket head is passed into the new Entry, so if the bucket already holds a value, a linked list forms at that position.
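The indexFor masking above only works because the capacity is always a power of two: `h & (length - 1)` then gives the same result as `h % length` but without a division. A quick demo (ours, not JDK code):

```java
public class IndexMask {
    // Same masking as HashMap's indexFor: equivalent to h % length
    // whenever length is a power of two and h is non-negative.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // power of two, as HashMap guarantees
        for (int h : new int[]{0, 5, 16, 31, 12345}) {
            System.out.println(h + " -> " + indexFor(h, length)
                               + " (mod gives " + (h % length) + ")");
        }
    }
}
```

This is also why the constructor insists on rounding the capacity up to a power of two: with any other length, the mask would skip some buckets entirely.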
The get method
The code comes from github.com/openjdk/jdk…
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
- First, compute the hash from the key and locate the specific bucket.
- Determine whether that position holds a linked list.
- If it is not a linked list, return the value directly when the key and hash are equal.
- If it is a linked list, iterate until the key and hash match, then return the value.
- If nothing matches, return null.
Should hashCode and equals be overridden together?
The key's hashCode() is computed to determine which index of the array the key lands in; equals() is then called to decide whether it is the same key and whether to replace the value. So the two must be overridden together: if two "equal" keys produce different hash codes, they map to different array indexes in the HashMap, and equals() is never called at all.
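A small demonstration of that failure mode. Both key classes here are hypothetical: `BadKey` overrides only equals, so two "equal" instances fall back on identity hash codes and (almost always) end up unfindable; `GoodKey` overrides both and behaves correctly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyContract {
    // Overrides equals but NOT hashCode: breaks the contract.
    static class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        // hashCode() deliberately not overridden
    }

    // Overrides both: honors the contract.
    static class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<BadKey, String> bad = new HashMap<>();
        bad.put(new BadKey(1), "v");
        // Almost always null: the two instances have different identity
        // hash codes, so lookup probes the wrong state entirely.
        System.out.println(bad.get(new BadKey(1)));

        Map<GoodKey, String> good = new HashMap<>();
        good.put(new GoodKey(1), "v");
        System.out.println(good.get(new GoodKey(1))); // v
    }
}
```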
JDK 1.8
JDK 1.8's implementation is largely the same as 1.7's, but 1.7 has a drawback:
when hash conflicts are severe, the linked list formed in a bucket grows longer and longer, and query efficiency degrades; time complexity is O(n).
So 1.8's change is: when a list's length exceeds 8, it is converted into a red-black tree, giving a structure of array + linked list + red-black tree. Let's start with a few core member variables:
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

static final int TREEIFY_THRESHOLD = 8;

transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;
It's basically the same as 1.7, with a few important differences:
- TREEIFY_THRESHOLD = 8 is the threshold for converting a bucket's linked list into a red-black tree (in the actual implementation, treeification also requires the table capacity to be at least MIN_TREEIFY_CAPACITY = 64; below that, the map resizes instead).
- HashEntry is renamed to Node. The core fields of Node are the same as HashEntry's in 1.7: key, value, hash, next, and so on.
Now look at the core method.
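Another real difference is buried in the hash step shown earlier: 1.8 spreads the high 16 bits of hashCode into the low 16 via `(h = key.hashCode()) ^ (h >>> 16)`, so keys that differ only in their high bits still land in different buckets. A quick demo of why that matters with a small table (constants here are our own example values):

```java
public class HashSpread {
    // JDK 1.8's spreading step: XOR the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int a = 0x10000;   // differs from b only above bit 16
        int b = 0x20000;
        int mask = 15;     // table length 16, as in indexFor

        // Without spreading, both mask to bucket 0 and collide.
        System.out.println((a & mask) + " vs " + (b & mask));
        // With spreading, they land in buckets 1 and 2.
        System.out.println((spread(a) & mask) + " vs " + (spread(b) & mask));
    }
}
```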
Put method
The source code (github.com/openjdk/jdk…) looks more complicated than 1.7's, so let's break it down step by step:
- Check whether the current table is empty; if so, initialize it (resize() decides whether initialization is needed).
- Locate the bucket from the hash of the current key and check whether it is empty; if empty, there is no hash conflict, so create a new Node at that position.
- If the current bucket has a value (hash conflict), compare the hash and key of the bucket's first node with the incoming key; if equal, assign it to `e` (the value assignment and return are handled uniformly in the second-to-last step).
- If the current bucket head is a red-black tree node, write the data red-black-tree style.
- If it is a linked list, wrap the current key and value in a new Node and append it to the end of the list.
- Then check whether the list length exceeds the preset threshold; if so, convert the list into a red-black tree.
- If the same key is found during the traversal, exit the traversal directly.
- If `e != null`, an entry with the same key already exists, so its value is overwritten.
- Finally, check whether expansion is needed.
The get method
Source: github.com/openjdk/jdk…
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
The get method looks much simpler:
- First hash the key and locate the bucket.
- If the bucket is empty, return null.
- Otherwise, check whether the first node in the bucket (list head or tree root) is the key being queried; if so, return its value directly.
- If the first node does not match, determine whether what follows is a red-black tree or a linked list.
- For a red-black tree, look the value up tree-style.
- Otherwise, iterate the linked list until a match is found.
From these two core methods (get/put) we can see that long linked lists are optimized in 1.8: after conversion to a red-black tree, query time drops to O(log n).
Concurrency is unsafe
HashMap's long-standing thread-safety problems still exist in both versions.
public static void main(String[] args) throws InterruptedException {
    int N = 100000;
    final HashMap<String, String> map = new HashMap<String, String>();
    // final HashMap<String, String> map = new HashMap<String, String>(N); // specify the capacity to avoid expansion
    Thread[] ts = new Thread[N];
    for (int i = 0; i < N; i++) {
        ts[i] = new Thread(new Runnable() {
            @Override
            public void run() {
                map.put(UUID.randomUUID().toString(), "");
            }
        });
    }
    for (int i = 0; i < ts.length; i++) {
        ts[i].start();
    }
    for (Thread t : ts) {
        t.join();
    }
    System.out.println("end");
}
Run the above code several times on JDK 1.7 and an infinite loop can occur. On JDK 1.8 and above:
- the infinite loop no longer occurs, but
- a `java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode` ClassCastException can.
Most write-ups only mention the infinite loop, not the ClassCastException.
How does the infinite loop happen
It is caused by expansion.
Source code analysis
The flow of the put method was shown above:
- Check whether the key already exists. If it exists, replace the value; if not, insert a new element.
- After inserting the new element, check `(size++ >= threshold)`; if true, expand, doubling the capacity.
- Expansion calls the resize() method, which creates a larger array and moves the elements over via the transfer method. The logic of the move is also very clear: traverse the linked list at each position of the old table, rehash each element, find its destination in the new newTable, and insert it there.
resize() method source code:
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) { // this traversal is where the infinite loop shows up
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i]; // head insertion into the new bucket
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
Case analysis
Suppose the HashMap has an initial size of 4 and we insert three nodes; unfortunately, all three hash to the same position. With the default load factor, inserting the third node triggers expansion (for ease of verification, assume the load factor is 1). Suppose two threads, Thread 1 and Thread 2, execute resize() simultaneously while inserting the third node, and each creates its own new array.
Suppose Thread 2 uses up its CPU time slice right after executing `Entry<K,V> next = e.next;`, with variable e pointing at node a and variable next pointing at node b.
Thread 1 carries on. Unfortunately, nodes a, b, and c rehash (that is, via the indexFor method) to the same position, index 7, in the new table. Step 1: move node a. Step 2: move node b. Step 3: move node c. At this point Thread 1's table field has not yet been set to its new newTable, and Thread 2's internal references look like this:
Now, in Thread 2, variable e points to node a and variable next points to node b, and the remaining loop body resumes. After one iteration, e points to node b; since e is not null, the loop body runs again, and the references leave variable e pointing back at node a, so the loop body executes once more. Analyze this iteration carefully:
1. `Entry<K,V> next = e.next;` : node a has no next, so variable next points to null.
2. `e.next = newTable[i];` : newTable[i] points to node b, so a's next now points to node b; a and b point at each other, forming a cycle.
3. `newTable[i] = e;` : node a is placed at index i of the array.
4. `e = next;` : variable e is assigned null (because next was set to null in step 1), so the loop exits.
The final references look like this:
Nodes a and b point at each other in a ring, and any traversal of that bucket's list will loop forever.
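The root mechanism is that 1.7's transfer uses head insertion, which reverses each bucket's list on every resize; two threads interleaving that reversal is what wires a and b into a cycle. A single-threaded sketch of the reversal (simplified node type of our own, not JDK code):

```java
public class Transfer17 {
    static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Head-insertion move, as in JDK 1.7's transfer(): each node is pushed
    // onto the front of the destination bucket, so a -> b -> c comes out
    // as c -> b -> a. A second thread holding stale e/next references into
    // this reversal is what produces the a <-> b cycle described above.
    static Node transfer(Node e) {
        Node newHead = null;
        while (e != null) {
            Node next = e.next;   // the step where Thread 2 was suspended
            e.next = newHead;     // head insertion
            newHead = e;
            e = next;
        }
        return newHead;
    }

    public static void main(String[] args) {
        Node list = new Node("a", new Node("b", new Node("c", null)));
        for (Node n = transfer(list); n != null; n = n.next)
            System.out.print(n.key + " ");
        // prints: c b a
    }
}
```

JDK 1.8's resize keeps the relative order of nodes (it splits each bucket into a "lo" and "hi" tail-linked list), which is why the infinite loop disappears there.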
In JDK 1.7, you can set the initial capacity up front to avoid expansion and thus the infinite loop. JDK 1.8 avoids the infinite loop altogether, but it cannot prevent the following ClassCastException.
How does the ClassCastException occur
JDK 1.8 source code:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Suppose threads 1 and 2 insert into the same bucket simultaneously:
- Thread 1 has decided to append its node to the bucket's linked list, but is suspended by the CPU before writing.
- Thread 2 meanwhile inserts the eighth element into that bucket, converting the linked list into a red-black tree, so the bucket head is now a TreeNode.
- Thread 1 is awakened and continues on the stale assumption that the bucket still holds plain Nodes; the type cast fails with the ClassCastException above.
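Neither version of HashMap is safe under concurrent writes; the standard fix is ConcurrentHashMap (or external synchronization via Collections.synchronizedMap). A reworking of the earlier stress test on that basis; the class and method names here are ours:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SafeConcurrentPut {
    // Fill a ConcurrentHashMap from many threads; no loops, no cast
    // exceptions, and no lost updates.
    static Map<String, String> fill(int n) throws InterruptedException {
        Map<String, String> map = new ConcurrentHashMap<>(n);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < n; i++)
            pool.execute(() -> map.put(UUID.randomUUID().toString(), ""));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        // Size equals n barring a (practically impossible) UUID collision.
        System.out.println(fill(100_000).size());
    }
}
```

A thread pool replaces the one-thread-per-put of the original test purely to keep the demo cheap; the thread-safety argument is unchanged.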
Reference:
crossoverjie.top/2018/07/23/… (archive)
juejin.cn/post/684490… (archive)