[TOC]
Add method in HashSet source code details
Introduction of HashSet
AbstractSet is an implementation class of Java container framework. AbstractSet is an implementation class of Java container framework, which is derived from AbstractSet. HashSet is used to store single-value non-repeating containers. But there can only be one NULL key value. Let’s start by creating an empty HashSet and then add elements to it, walking through the source code step by step to see how a HashSet is implemented.
Source code analysis
Before source code analysis, we first write a piece of code, and then according to this code step by step in-depth source code, analysis. The code is simple as follows:
Set<String> set = new HashSet<>();
set.add("110");
set.add("110");
set.add(new String("110"));
System.out.println(set.size())
Copy the code
1. Construction method
Set
Set = new HashSet(); In a literal sense, there are the following:
- This code creates a new HashSet object by calling the HashSet parameterless constructor
- Assigns a reference to an object to the interface Set that it implements
- Passing a generic String to the elements of a Set will allow the Set to hold only String data at compile time. (Strings are final and cannot be inherited.)
HashSet parameterless constructor
HashSet constructor:
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
public HashSet(a) {
map = new HashMap<>();
}
Copy the code
The first line of the code declares a HashMap member variable, map, of type
. The underlying implementation of a HashSet is implemented using a HashMap. A HashSet takes advantage of the fact that the key value in a HashMap does not repeat, and is used to store its own elements. The value in a Map is the static constant PRESENT on the second line. Is an Object Object. We can verify this later. As you can see from the above code, when the no-parameter constructor of a HashSet is called, it simply calls the no-parameter constructor of a HashMap to initialize the member variable map.
HashMap takes no arguments constructor
public HashMap(a) {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
Copy the code
As you can see, the no-parameter constructor of the HashMap is also very simple. It simply sets the loadFactor of the member variable loadFactor, which is a static constant of 0.75F and does not set its capacity. Then let’s look at how the member variables used to store the data in the HashMap are declared:
/** * The table, initialized on first use, and resized as * necessary. When allocated, length is always a power of two. * (We also tolerate length zero in some operations to allow * bootstrapping mechanics that are currently not needed.) */
transient Node<K,V>[] table;
Copy the code
As noted in the documentation, the table is initialized the first time it is used and must be resized the first time it is used. The length of the table must be a power of two. Let’s look again at the size property declaration:
/** * The number of key-value mappings contained in this map. */
transient int size;
Copy the code
Therefore, we can conclude that when a HashSet object and a HashMap object are created using a no-argument method construct, its initial capacity is 0 (not 16, as many people say by default) and it is initialized only when it is first used.
2. Add elements to the HashSet
Here’s the second line:
set.add("110");
Copy the code
If we call the add method of the set object, if we click add to access the source code, we will find that we are accessing the source code of the set interface. This is because of the nature of Java polymorphism. At compile time, we see the method of the parent class, but at run time, we will recognize the type of the run. It calls the subclass’s methods directly. Add (); add (); add ();
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
Copy the code
The value of a map is stored as a static constant and its value is an Object. Moving on to map’s PUT method:
public V put(K key, V value) {
return putVal(hash(key), key, value, false.true);
}
Copy the code
PutVal (); putVal (); putVal (); putVal (); putVal ()
- The element e to be added by the HashSet’s add method is passed in the key of the HashMap’s PUT method, and value is passed in a static constant
- The first argument passed to putVal() calls the hash() method. The argument passed is the key value (the element added in the original HashSet). The code is as follows:
static final int hash(Object key) {
int h;
return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code
As you can see, this is a static method. The parameter type received is Object, and the key passed in is upcast to Object. If the object passed in is a null value, it returns 0. Otherwise, the key Object’s hashCode() is evaluated first, which uses Java polymorphism. If the key Object overrides the hashCode() method, the overridden hashCode() method is called; otherwise, the Object’s hashCode() method is called. This calls the hashCode method of String. The resulting hashCode is then moved unsigned to the right, xor with the previous hashCode, and the calculated value is returned. Conclusion: The hashCode() method of an object evaluates to the same value, and the hash() method evaluates to the same value. Now let’s go back to the putVal method.
In the HashMap putVal ()
It is recommended to open the source code to follow. Let’s first look at the putVal method definition:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict)
Copy the code
The return value type is a generic V that is the type of value, and it is final and cannot be overridden. PutVal (hash(key), key, value, false, true); putVal(hash(key), key, value, false, true); One to one correspondence, convenient to view the subsequent code.
parameter | The arguments |
---|---|
hash | Hash (key) : The calculated hash value of the element to be added in the HashSet |
key | Key, which is the original element e, the string 110. |
value | Value The static constant PRESENT passed in the HashSet |
onlyIfAbsent | false |
evict | true |
Keep going:
Node<K,V>[] tab; Node<K,V> p; int n, i;
Copy the code
Declare local variables, TAB is an array of type Node<K,V>, p is a Node Node, is a singly linked list. N is the length of the array, and I is the subscript (as you’ll see later).
static class Node<K.V> implements Map.Entry<K.V> {
final int hash;
final K key;
V value;
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
Copy the code
As you can see, Enty is a static inner class whose structure is a single linked list that implements the nested interface Enty in the Map interface. You can see why people say HashMap and why the underlying implementation is array + singly linked list. Keep going:
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
Copy the code
Table is a member variable (as mentioned above), and this is the first time we’ve added an element to it, so the table is null. Then assign the table value to TAB and determine whether it is null, true, short-circuit, or execute the statement in if. Call the resize() method. This code is a HashMap expansion method. The implementation is very complicated.
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table; // Table and oldTable are both null
int oldCap = (oldTab == null)?0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {omit...else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY; // Static constant, default capacity, 16
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);// Calculate the capacity to be expanded based on the default loading factor 0.75} omit n lines...@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];// Create an array of type Node with the default size of 16
table = newTab;// Assign to the member variable table
if(oldTab ! =null) {//oldTable is null and will not be executedOmit... }return newTab;// return newTab. Note that the table and newTab members refer to the same object
}
Copy the code
Resize () creates a Node<>[] array of 16 length, assigns the value to the member variable table and returns a reference to the array. Now continue with the code:
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
Copy the code
The TAB already points to a new 16 length Node[] object (the same as table) with n of 16. Keep going:
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
Copy the code
The code above means:
- Compute the subscript in the TAB array based on the hash value and assign the value to p
- If the value of p is null, execute the statement, since we are adding it for the first time, it must be null.
- Execute the statement
tab[i] = newNode(hash, key, value, null);
, stored in a Node Node, according to the above Node construction method, its corresponding respectively are:
The Node properties | The value of the incoming |
---|---|
hash | The key of the hash |
key | The key value, which is the “110” passed in. |
value | Static constant PRESENT |
next | Next node, null |
tab[i = (n - 1) & hash]
, using the properties of bitwise and all 1 is 1, the subscript is obtained, and the value is never greater than n-1
Keep going:
else{do not execute, will be added later when the second time, omit... } ++modCount;if (++size > threshold) // Check capacity expansion
resize();
afterNodeInsertion(evict);
return null; // Add success, return null
Copy the code
Return to the put method, which also returns null, and return to the add method of HashSet:
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
Copy the code
True is returned after successful addition. At this point, our first add is parsed and we continue on the second line to add a duplicate object
Adding repeating elements
Without further ado, skip to the putVal() method:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0) // The if statement is no longer null, but is still assigned, TAB is the member variable table, and n is the length
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)// Hash the subscript according to the hash. Since the contents of String are the same, hashCode is the same, so I is the same as the first time, and p is the element in the corresponding subscript. But not empty, else block
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
e = p;
/* * The key to hashMap de-weighting is highlighted below
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
break; p = e; }}if(e ! =null) { // existing mapping for key
V oldValue = e.value;
if(! onlyIfAbsent || oldValue ==null)
e.value = value; // The value of the corresponding key already exists
afterNodeAccess(e);
return oldValue; // Return the value before the update
}
Copy the code
Key code analysis for HashMap de-duplication:
else {
Node<K,V> e; K k;// If the e value is Null, the key value is successfully added; otherwise, the key value already exists
if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
break; p = e; }}Copy the code
This code is executed only if the index of table[] calculated from the hash value is not empty when an element is stored in it. Else code block above is executed. Here’s a line-by-line analysis:
if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
e = p;
Copy the code
In this case, p stores the value of table[I], which is the first node of the singly linked list. Determines the hash value of the stored object and the hash value of the currently stored element. If they are different, execute the following code directly. After the same hash value is executed ((k = p.k ey) = = key) | | key! = null && key.equals(k))) = null && key.equals(k)) = null && key.equals(k))) = null && key.equals(k)) = null && key.equals(k)) Otherwise, the equals method is used to compare non-null and call the key value. If the result is true, the object is the same. Assign the p value to e (if (e! = null) {// existing mapping for key will check the value of e, null means that the key value is not the same, has been saved into the Map, non-null means that the value of e is the value of P, i.e. the existing Node with the same key value Node).
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
Copy the code
At this point, prove that the key value of the first node of table[I] is different, and judge whether it has been transformed into a red-black tree. If it has been transformed into a red-black tree, then transform into TreeNode to judge and add. TreeNode inherits Entry from the static inner class of LinkedHashMap, which in turn inherits Node from the static inner class of HashMap. Keep reading:
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {// Update e to the next node. If the value of e traversed is null, it means that the key of the element to be added is unique to all elements on the list
p.next = newNode(hash, key, value, null);// Add Node Node and store the element to be added
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st // Determine the length of the single-linked list. If the length is greater than or equal to 8, the list is stored as a red-black tree
treeifyBin(tab, hash);
break;
}
if (e.hash == hash && // If the key value of e is the same as that of the element in the next node, the loop is interrupted directly. In this case, e is still not empty, indicating that there is a key value((k = e.key) == key || (key ! =null && key.equals(k))))
break;
p = e; // Update the p value to point to the next node
}
Copy the code
This code block executes only if the first node in table[I] has a duplicate key and has not yet been converted to a red-black tree. A loop is a loop that starts iterating through a single linked list until the next node is null or a duplicate key is found. When the next node is null, it means that the key value of the element to be added is different from that of all the elements in the list. In this case, a new node will be added directly and the data will be saved. If (e! = null) {// existing mapping for key will determine the value of e, null means that the key value is not the same, has been saved into the Map, non-null means that the value of E is the value of P, i.e. the existing Node with the same key value Node. If there is a duplicate key, the value corresponding to the key is updated and the old value is returned. In a HashSet, value is a static constant, so it’s all the same. Otherwise, null is returned, indicating that the data is stored in the HashMap.