[TOC]

Add method in HashSet source code details

Introduction of HashSet

AbstractSet is an implementation class of Java container framework. AbstractSet is an implementation class of Java container framework, which is derived from AbstractSet. HashSet is used to store single-value non-repeating containers. But there can only be one NULL key value. Let’s start by creating an empty HashSet and then add elements to it, walking through the source code step by step to see how a HashSet is implemented.

Source code analysis

Before source code analysis, we first write a piece of code, and then according to this code step by step in-depth source code, analysis. The code is simple as follows:

    Set<String> set = new HashSet<>();
    set.add("110");
    set.add("110");
    set.add(new String("110"));
    System.out.println(set.size())
Copy the code

1. Construction method

Set

Set = new HashSet(); In a literal sense, there are the following:

  1. This code creates a new HashSet object by calling the HashSet parameterless constructor
  2. Assigns a reference to an object to the interface Set that it implements
  3. Passing a generic String to the elements of a Set will allow the Set to hold only String data at compile time. (Strings are final and cannot be inherited.)

HashSet parameterless constructor

HashSet constructor:

    private transient HashMap<E,Object> map;
        // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();
    public HashSet(a) {
            map = new HashMap<>();
        }
Copy the code

The first line of the code declares a HashMap member variable, map, of type

. The underlying implementation of a HashSet is implemented using a HashMap. A HashSet takes advantage of the fact that the key value in a HashMap does not repeat, and is used to store its own elements. The value in a Map is the static constant PRESENT on the second line. Is an Object Object. We can verify this later. As you can see from the above code, when the no-parameter constructor of a HashSet is called, it simply calls the no-parameter constructor of a HashMap to initialize the member variable map.
,>

HashMap takes no arguments constructor

public HashMap(a) {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
Copy the code

As you can see, the no-parameter constructor of the HashMap is also very simple. It simply sets the loadFactor of the member variable loadFactor, which is a static constant of 0.75F and does not set its capacity. Then let’s look at how the member variables used to store the data in the HashMap are declared:

/** * The table, initialized on first use, and resized as * necessary. When allocated, length is always a power of two. * (We also tolerate length zero in some operations to allow * bootstrapping mechanics that are currently not needed.) */
    transient Node<K,V>[] table;
Copy the code

As noted in the documentation, the table is initialized the first time it is used and must be resized the first time it is used. The length of the table must be a power of two. Let’s look again at the size property declaration:

/** * The number of key-value mappings contained in this map. */
    transient int size;
Copy the code

Therefore, we can conclude that when a HashSet object and a HashMap object are created using a no-argument method construct, its initial capacity is 0 (not 16, as many people say by default) and it is initialized only when it is first used.

2. Add elements to the HashSet

Here’s the second line:

set.add("110");
Copy the code

If we call the add method of the set object, if we click add to access the source code, we will find that we are accessing the source code of the set interface. This is because of the nature of Java polymorphism. At compile time, we see the method of the parent class, but at run time, we will recognize the type of the run. It calls the subclass’s methods directly. Add (); add (); add ();

public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }
Copy the code

The value of a map is stored as a static constant and its value is an Object. Moving on to map’s PUT method:

public V put(K key, V value) {
        return putVal(hash(key), key, value, false.true);
    }
Copy the code

PutVal (); putVal (); putVal (); putVal (); putVal ()

  1. The element e to be added by the HashSet’s add method is passed in the key of the HashMap’s PUT method, and value is passed in a static constant
  2. The first argument passed to putVal() calls the hash() method. The argument passed is the key value (the element added in the original HashSet). The code is as follows:
    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Copy the code

As you can see, this is a static method. The parameter type received is Object, and the key passed in is upcast to Object. If the object passed in is a null value, it returns 0. Otherwise, the key Object’s hashCode() is evaluated first, which uses Java polymorphism. If the key Object overrides the hashCode() method, the overridden hashCode() method is called; otherwise, the Object’s hashCode() method is called. This calls the hashCode method of String. The resulting hashCode is then moved unsigned to the right, xor with the previous hashCode, and the calculated value is returned. Conclusion: The hashCode() method of an object evaluates to the same value, and the hash() method evaluates to the same value. Now let’s go back to the putVal method.

In the HashMap putVal ()

It is recommended to open the source code to follow. Let’s first look at the putVal method definition:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict)
Copy the code

The return value type is a generic V that is the type of value, and it is final and cannot be overridden. PutVal (hash(key), key, value, false, true); putVal(hash(key), key, value, false, true); One to one correspondence, convenient to view the subsequent code.

parameter The arguments
hash Hash (key) : The calculated hash value of the element to be added in the HashSet
key Key, which is the original element e, the string 110.
value Value The static constant PRESENT passed in the HashSet
onlyIfAbsent false
evict true

Keep going:

Node<K,V>[] tab; Node<K,V> p; int n, i;
Copy the code

Declare local variables, TAB is an array of type Node<K,V>, p is a Node Node, is a singly linked list. N is the length of the array, and I is the subscript (as you’ll see later).

static class Node<K.V> implements Map.Entry<K.V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
Copy the code

As you can see, Enty is a static inner class whose structure is a single linked list that implements the nested interface Enty in the Map interface. You can see why people say HashMap and why the underlying implementation is array + singly linked list. Keep going:

if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
Copy the code

Table is a member variable (as mentioned above), and this is the first time we’ve added an element to it, so the table is null. Then assign the table value to TAB and determine whether it is null, true, short-circuit, or execute the statement in if. Call the resize() method. This code is a HashMap expansion method. The implementation is very complicated.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;    // Table and oldTable are both null
    int oldCap = (oldTab == null)?0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {omit...else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;    // Static constant, default capacity, 16
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);// Calculate the capacity to be expanded based on the default loading factor 0.75} omit n lines...@SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];// Create an array of type Node with the default size of 16
    table = newTab;// Assign to the member variable table
     if(oldTab ! =null) {//oldTable is null and will not be executedOmit... }return newTab;// return newTab. Note that the table and newTab members refer to the same object
 }
Copy the code

Resize () creates a Node<>[] array of 16 length, assigns the value to the member variable table and returns a reference to the array. Now continue with the code:

if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
Copy the code

The TAB already points to a new 16 length Node[] object (the same as table) with n of 16. Keep going:

 if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
Copy the code

The code above means:

  1. Compute the subscript in the TAB array based on the hash value and assign the value to p
  2. If the value of p is null, execute the statement, since we are adding it for the first time, it must be null.
  3. Execute the statementtab[i] = newNode(hash, key, value, null);, stored in a Node Node, according to the above Node construction method, its corresponding respectively are:
The Node properties The value of the incoming
hash The key of the hash
key The key value, which is the “110” passed in.
value Static constant PRESENT
next Next node, null
  1. tab[i = (n - 1) & hash], using the properties of bitwise and all 1 is 1, the subscript is obtained, and the value is never greater than n-1

Keep going:

else{do not execute, will be added later when the second time, omit... } ++modCount;if (++size > threshold)    // Check capacity expansion
        resize();
    afterNodeInsertion(evict);
    return null;    // Add success, return null
Copy the code

Return to the put method, which also returns null, and return to the add method of HashSet:

public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }
Copy the code

True is returned after successful addition. At this point, our first add is parsed and we continue on the second line to add a duplicate object

Adding repeating elements

Without further ado, skip to the putVal() method:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0) // The if statement is no longer null, but is still assigned, TAB is the member variable table, and n is the length
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)// Hash the subscript according to the hash. Since the contents of String are the same, hashCode is the same, so I is the same as the first time, and p is the element in the corresponding subscript. But not empty, else block
            tab[i] = newNode(hash, key, value, null);
        else {
        Node<K,V> e; K k;
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            e = p;
        /* * The key to hashMap de-weighting is highlighted below
        else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break; p = e; }}if(e ! =null) { // existing mapping for key
            V oldValue = e.value;
            if(! onlyIfAbsent || oldValue ==null)
                e.value = value;    // The value of the corresponding key already exists
            afterNodeAccess(e);
            return oldValue;    // Return the value before the update
        }
Copy the code

Key code analysis for HashMap de-duplication:

else {
        Node<K,V> e; K k;// If the e value is Null, the key value is successfully added; otherwise, the key value already exists
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break; p = e; }}Copy the code

This code is executed only if the index of table[] calculated from the hash value is not empty when an element is stored in it. Else code block above is executed. Here’s a line-by-line analysis:

if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            e = p;
Copy the code

In this case, p stores the value of table[I], which is the first node of the singly linked list. Determines the hash value of the stored object and the hash value of the currently stored element. If they are different, execute the following code directly. After the same hash value is executed ((k = p.k ey) = = key) | | key! = null && key.equals(k))) = null && key.equals(k)) = null && key.equals(k))) = null && key.equals(k)) = null && key.equals(k)) Otherwise, the equals method is used to compare non-null and call the key value. If the result is true, the object is the same. Assign the p value to e (if (e! = null) {// existing mapping for key will check the value of e, null means that the key value is not the same, has been saved into the Map, non-null means that the value of e is the value of P, i.e. the existing Node with the same key value Node).

else if (p instanceof TreeNode)
        e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
Copy the code

At this point, prove that the key value of the first node of table[I] is different, and judge whether it has been transformed into a red-black tree. If it has been transformed into a red-black tree, then transform into TreeNode to judge and add. TreeNode inherits Entry from the static inner class of LinkedHashMap, which in turn inherits Node from the static inner class of HashMap. Keep reading:

else {
        for (int binCount = 0; ; ++binCount) {
            if ((e = p.next) == null) {// Update e to the next node. If the value of e traversed is null, it means that the key of the element to be added is unique to all elements on the list
                p.next = newNode(hash, key, value, null);// Add Node Node and store the element to be added
                if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st // Determine the length of the single-linked list. If the length is greater than or equal to 8, the list is stored as a red-black tree
                    treeifyBin(tab, hash);
                break;
            }
            if (e.hash == hash &&    // If the key value of e is the same as that of the element in the next node, the loop is interrupted directly. In this case, e is still not empty, indicating that there is a key value((k = e.key) == key || (key ! =null && key.equals(k))))
                break;
            p = e;    // Update the p value to point to the next node
        }
Copy the code

This code block executes only if the first node in table[I] has a duplicate key and has not yet been converted to a red-black tree. A loop is a loop that starts iterating through a single linked list until the next node is null or a duplicate key is found. When the next node is null, it means that the key value of the element to be added is different from that of all the elements in the list. In this case, a new node will be added directly and the data will be saved. If (e! = null) {// existing mapping for key will determine the value of e, null means that the key value is not the same, has been saved into the Map, non-null means that the value of E is the value of P, i.e. the existing Node with the same key value Node. If there is a duplicate key, the value corresponding to the key is updated and the old value is returned. In a HashSet, value is a static constant, so it’s all the same. Otherwise, null is returned, indicating that the data is stored in the HashMap.