How does a HashSet guarantee that elements are not duplicated?

Git repository: gitee.com/mydb/interv…

HashSet implements the Set interface, supported by a hash table (actually a HashMap). A HashSet does not guarantee the iteration order of the collection, but does allow the insertion of null values. That is, a HashSet does not guarantee the same insertion order as the iteration order. A HashSet is de-duplicated, meaning that it automatically filters out duplicate elements from the collection, ensuring that the elements stored in the HashSet are unique.

1. Basic usage of HashSet

The basic operations on a HashSet include: add, remove, contains, and size. The performance of these methods is fixed operation time, if the hash function is the right place to scatter the elements in the bucket. The basic usage of HashSet is as follows:

// Create a HashSet
HashSet<String> strSet = new HashSet<>();
// Add data to the HashSet
strSet.add("Java");
strSet.add("MySQL");
strSet.add("Redis");
// Looping prints all elements in the HashSet
strSet.forEach(s -> System.out.println(s));
Copy the code

2. The HashSet disorder

A HashSet does not guarantee that elements are inserted in the same order as they are looping out. A HashSet is an unordered collection.

HashSet<String> mapSet = new HashSet<>();
mapSet.add("Shenzhen");
mapSet.add("Beijing");
mapSet.add("Xi 'an");
// Looping prints all elements in the HashSet
mapSet.forEach(m -> System.out.println(m));
Copy the code

The execution results of the above procedures are as follows:As you can see from the above code and execution results, the order of the HashSet inserts is:Shenzhen -> Beijing -> Xi ‘an, and the order of circular printing is:Xi ‘an -> Shenzhen -> Beijing, soA HashSet is unordered and cannot guarantee the same order of inserts and iterations.

PS: Use LinkedHashSet instead of HashSet to ensure the same insertion and iteration order.

Incorrect use of HashSet

HashSet can only guarantee that the underlying data types do not repeat, but that custom objects do not repeat. Is that right? We use the following example to illustrate this problem.

3.1 HashSet and basic data types

Use HashSet to store basic data types as follows:

HashSet<Long> longSet = new HashSet<>();
longSet.add(666l);
longSet.add(777l);
longSet.add(999l);
longSet.add(666l);
// Looping prints all elements in the HashSet
longSet.forEach(l -> System.out.println(l));
Copy the code

The execution results of the above procedures are as follows:As you can see from the above results, using HashSet ensures that the underlying data types are not duplicated.

3.2 HashSet and custom object types

Next, we store the custom object in a HashSet as follows:

public class HashSetExample {
    public static void main(String[] args) {
        HashSet<Person> personSet = new HashSet<>();
        personSet.add(new Person("Cao cao"."123"));
        personSet.add(new Person("Sun quan"."123"));
        personSet.add(new Person("Cao cao"."123"));
        // Looping prints all elements in the HashSetpersonSet.forEach(p -> System.out.println(p)); }}@Getter
@Setter
@ToString
class Person {
    private String name;
    private String password;

    public Person(String name, String password) {
        this.name = name;
        this.password = password; }}Copy the code

The execution results of the above procedures are as follows:We can see from the above results that custom object types are not deleveraged. No, the HashSet deduplicate function depends on the elements’ hashCode and equals methods, which return true for the same object, otherwise it’s a different object. {hashCode = equals ();} {hashCode = equals ();}

@Override
public int hashCode(a) {
    return Long.hashCode(value);
}
public boolean equals(Object obj) {
    if (obj instanceof Long) {
        return value == ((Long)obj).longValue();
    }
    return false;
}
// omit other sources......
Copy the code

More about the equals and hashCode, see: mp.weixin.qq.com/s/40zaEJEkQ…

To enable HashSet to support custom object de-duplicates, override the hashCode and equals methods on the custom object.

@Setter
@Getter
@ToString
class Person {
    private String name;
    private String password;
    public Person(String name, String password) {
        this.name = name;
        this.password = password;
    }
    @Override
    public boolean equals(Object o) {
        if (this == o) return true; // Reference equality returns true
        // Return false if equal to null, or if the object type is different
        if (o == null|| getClass() ! = o.getClass())return false;
        // Force a custom Person type
        Person persion = (Person) o;
        // Return true if name and password are equal
        return Objects.equals(name, persion.name) &&
                Objects.equals(password, persion.password);
    }
    @Override
    public int hashCode(a) {
        // Compare name and password to see if they are equal
        returnObjects.hash(name, password); }}Copy the code

Re-run the above code, and the execution result is as follows:As can be seen from the above results, the previous repeated term “Cao Cao” has been removed.

4. How does a HashSet ensure that elements are not repeated?

If we look at the process of adding elements to a HashSet, we can see why HashSet ensures that elements are not repeated. Adding elements to a HashSet goes like this: When an object is added to a HashSet, the HashSet evaluates the object’s Hashcode value to determine where the object is added to. It also compares the object’s Hashcode value to the hashcode value of other added objects. If there is no matching Hashcode, the HashSet assumes that the object is not repeated. Inserts the object into the appropriate location. But if objects with the same HashCode value are found, then the equals() method of the objects is called to check whether the objects are really the same. If they are, then the HashSet does not allow duplicate objects to be added to the HashSet, thus ensuring that the elements are not duplicated.

Add a HashSet (HashSet, HashSet, HashSet);

If put() returns null in hashMap, the operation succeeds
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
Copy the code

Add (); put (); put (); add ();

// Return value: null if there is no element at the insertion position, otherwise the previous element is returned
public V put(K key, V value) {
    return putVal(hash(key), key, value, false.true);
}
Copy the code

PutVal (); putVal(); putVal(); putVal();

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K, V>[] tab;
        Node<K, V> p;
        int n, i;
        // If the hash table is empty, call resize() to create a hash table and record the length of the hash table with the variable n
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        /** * If the specified hash parameter has no bucket in the table, that is, there is no collision with the * hash function, (n-1) & hash computes the slot where the key will be placed * (n-1) & hash essentially hash % n bits faster */
        if ((p = tab[i = (n - 1) & hash]) == null)
            // Insert key-value pairs directly into the map
            tab[i] = newNode(hash, key, value, null);
        else {// Elements already exist in the bucket
            Node<K, V> e;
            K k;
            // Compare the hash value of the first element (node in the array) and the key value
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                // Assign the first element to e, denoted by e
                e = p;
                // The bucket does not contain the key-value pair, and the bucket is a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K, V>) p).putTreeVal(this, tab, hash, key, value);
                // There is no such key-value pair in the bucket, and the bucket is a linked list structure
            else {
                for (int binCount = 0; ; ++binCount) {
                    // iterate to the end of the list
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // Check whether the length of the list reaches the threshold to convert the slot node into a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    
      
        and put 
       ,>
      ,>
                    // At the same time, do not repeat the operation, jump out of the loop
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}// Find or create a new key-value pair with key and hashCode equal to the inserted element, and perform the put operation
            if(e ! =null) { // existing mapping for key
                // Record the value of e
                V oldValue = e.value;
                /** * onlyIfAbsent If the value is false or null, you can replace the old value *; otherwise, you do not need to replace */
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                // Callback after access
                afterNodeAccess(e);
                // Return the old value
                returnoldValue; }}// Update structural modification information
        ++modCount;
        // Rehash is performed when the number of key-value pairs exceeds the threshold
        if (++size > threshold)
            resize();
        // Callback after insertion
        afterNodeInsertion(evict);
        return null;
    }
Copy the code

As you can see from the source code above, when putting a key-value pair into a HashMap, the location of the Entry is first determined based on the return value of the key’s hashCode(). If two keys have the same hash value, check whether their equals() is the same. If they are the same, return true. Add () will return false. Failed to add elements to the HashSet. Therefore, if you add an existing element to a HashSet, the newly added collection element will not overwrite the existing element, ensuring that the element is not duplicated. If the element is not repeated, the PUT method will eventually return NULL, and the add method passed to the HashSet will be deemed successful.

conclusion

The underlying HashSet is implemented by HashMap, which allows for the de-duplication of duplicate elements. If you store custom objects, you must override the hashCode and equals methods. A HashSet ensures that elements do not duplicate by using the HashMap put method. If a key exists, it is checked by hashCode and equals before storing it.

When he comes, he is not surprised; when he adds without cause, he is not angry.

Blogger: Programmer born in the 1980s. Hobbies: reading, writing and jogging.

Public number: Java interview analysis

How does a HashSet guarantee that elements are not duplicated?

1. Basic usage of HashSet

2. The HashSet disorder

Incorrect use of HashSet

3.1 HashSet and basic data types

3.2 HashSet and custom object types

4. How does a HashSet ensure that elements are not repeated?

conclusion

Related Posts

Set up Lighttpd server in raspberry PI

Actual combat! Development and principle analysis of single sign-on based on Security+JWT

Storing Snapshot Notes