The problem

(1) What is the difference between a Collection and a Set?

(2) How to ensure that the elements added to the HashSet are not repeated?

(3) Does HashSet allow null elements?

(4) Is HashSet ordered?

(5) Is HashSet synchronous?

(6) What is fail-fast?

Introduction to the

Set, the concept is a little fuzzy.

Broadly speaking, collections in Java refer to the container classes under the java.util package, including all classes related to Collections and maps.

In terms of meaning, we generally refer to Collection classes in Java collections, not Map classes.

In a narrow sense, a mathematical Set refers to a container that does not contain duplicate elements, that is, there are no two identical elements in the Set, which correspond to a Set in Java.

Specific how to understand or to look at the context.

For example, if you’re being asked to talk about collections in Java, you’re probably speaking broadly.

For example, adding all elements from another Set to a Set is neutral.

A HashSet is an implementation of a Set, and the underlying use of a HashMap is to ensure that elements are not duplicated.

Source code analysis

attribute

    // Use HashMap internally
    private transient HashMap<E,Object> map;

    // Virtual object, used as value in map
    private static final Object PRESENT = new Object();
Copy the code

A constructor

public HashSet(a) {
    map = new HashMap<>();
}

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1.16));
    addAll(c);
}

public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}

public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

// Non-public, mainly used for LinkedHashSet
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}
Copy the code

Constructors call the corresponding constructors of the HashMap.

The last constructor is a bit special in that it is not public, meaning it can only be called by the same package or subclass, which is unique to LinkedHashSet.

Add elements

Call the put() method of the HashMap directly, using the element itself as the key and PRESENT as the value, meaning that all values in the map are the same.

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
Copy the code

Remove elements

Call the remove() method of the HashMap directly. Note that the map remove returns a value for the removed element, while the Set remov returns a Boolean.

And just to check, if it’s null it means there’s no element, but if it’s not null it equals PRESENT.

public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}
Copy the code

Query element

Set doesn’t have a get() method because get doesn’t seem to make sense, unlike a List where you can get elements by index.

There’s only one method that checks for the presence of the element, contains(), which directly calls the map’s containsKey() method.

public boolean contains(Object o) {
    return map.containsKey(o);
}
Copy the code

Traverse elements

The iterator to the map’s keySet is called directly.

public Iterator<E> iterator(a) {
    return map.keySet().iterator();
}
Copy the code

All the source code

package java.util;

import java.io.InvalidObjectException;
import sun.misc.SharedSecrets;


public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable.java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    // The inner elements are stored in the HashMap
    private transient HashMap<E,Object> map;

    // The virtual element is stored in the value of the map element
    private static final Object PRESENT = new Object();

    // Empty constructor
    public HashSet(a) {
        map = new HashMap<>();
    }

    // Add all the elements of another Set to the current Set
    // Note that the map is initialized with its initial capacity calculated
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1.16));
        addAll(c);
    }

    // Specify the initial capacity and load factor
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }

    // Specify only the initial capacity
    public HashSet(int initialCapacity) {
        map = new HashMap<>(initialCapacity);
    }

    // LinkedHashSet special method
    // dummy has no real meaning, just to keep up with the above method signature
    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<>(initialCapacity, loadFactor);
    }
    
    / / the iterator
    public Iterator<E> iterator(a) {
        return map.keySet().iterator();
    }

    // Number of elements
    public int size(a) {
        return map.size();
    }

    // Check if it is null
    public boolean isEmpty(a) {
        return map.isEmpty();
    }

    // Check whether an element is included
    public boolean contains(Object o) {
        return map.containsKey(o);
    }
    
    // Add elements
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

    // Delete elements
    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }
    
    // Clear all elements
    public void clear(a) {
        map.clear();
    }

    // Clone method
    @SuppressWarnings("unchecked")
    public Object clone(a) {
        try {
            HashSet<E> newSet = (HashSet<E>) super.clone();
            newSet.map = (HashMap<E, Object>) map.clone();
            return newSet;
        } catch (CloneNotSupportedException e) {
            throw newInternalError(e); }}// serialize the write method
    private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException {
        // Write a non-static, non-transient property
        s.defaultWriteObject();

        // Write out the map's capacity and load factor
        s.writeInt(map.capacity());
        s.writeFloat(map.loadFactor());

        // Write the number of elements
        s.writeInt(map.size());

        // Iterate over all the elements
        for (E e : map.keySet())
            s.writeObject(e);
    }

    // serialize the read in method
    private void readObject(java.io.ObjectInputStream s)
        throws java.io.IOException, ClassNotFoundException {
        // Read a non-static, non-transient property
        s.defaultReadObject();

        // Read in the capacity and check that it cannot be less than 0
        int capacity = s.readInt();
        if (capacity < 0) {
            throw new InvalidObjectException("Illegal capacity: " +
                                             capacity);
        }

        // Read the load factor and check that it cannot be less than or equal to 0 or NaN(Not a Number)
        // java.lang.Float.NaN = 0.0f / 0.0f;
        float loadFactor = s.readFloat();
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new InvalidObjectException("Illegal load factor: " +
                                             loadFactor);
        }

        // Read in the number of elements and check that it cannot be less than 0
        int size = s.readInt();
        if (size < 0) {
            throw new InvalidObjectException("Illegal size: " +
                                             size);
        }
        // Reset the capacity based on the number of elements
        // This is to ensure that the map has enough capacity to accommodate all elements and prevent unnecessary expansion
        capacity = (int) Math.min(size * Math.min(1 / loadFactor, 4.0 f),
                HashMap.MAXIMUM_CAPACITY);

        // Double check something, ignore unimportant code
        SharedSecrets.getJavaOISAccess()
                     .checkArray(s, Map.Entry[].class, HashMap.tableSizeFor(capacity));

        // Create a map and check if it is of type LinkedHashSetmap = (((HashSet<? >)this) instanceof LinkedHashSet ?
               new LinkedHashMap<E,Object>(capacity, loadFactor) :
               new HashMap<E,Object>(capacity, loadFactor));

        // Read in all the elements and place them in the map
        for (int i=0; i<size; i++) {
            @SuppressWarnings("unchecked") E e = (E) s.readObject(); map.put(e, PRESENT); }}// Divisible iterator, mainly used for multi-threaded parallel iteration processing
    public Spliterator<E> spliterator(a) {
        return new HashMap.KeySpliterator<E,Object>(map, 0, -1.0.0); }}Copy the code

conclusion

(1) Inside the HashSet, the key of the HashMap is used to store elements, so as to ensure that elements are not repeated;

(2) HashSet is unordered because the key of a HashMap is unordered;

(3) A null element is allowed in a HashSet because a HashMap allows a null key;

(4) HashSet is not thread-safe;

(5) HashSet does not have a get() method;

eggs

(1) Ali manual says, when using Java collection to specify the size of their own collection, through this source analysis, you know how to initialize HashMap initial capacity?

We find that the following constructor tells us very clearly how to specify capacity.

If we expect a HashMap to store n elements, its capacity should be specified as (n/0.75f) + 1. If this value is less than 16, we use 16 instead.

The specified capacity during initialization reduces capacity expansion times and improves efficiency.

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1.16));
    addAll(c);
}
Copy the code

(2) What is fail-fast?

The Fail-fast mechanism is a bug mechanism in Java collections.

When using iterators iteration, if found changes, set the failure to respond quickly, throw ConcurrentModificationException.

Such changes can be made by other threads or by the current thread itself, such as by calling remove() to remove elements during iteration.

Also, not all collections in Java have fail-fast mechanisms. For example, final-consistent ConcurrentHashMap, CopyOnWriterArrayList, etc., do not have fast-fail.

So how does Fail-Fast come about?

If you are careful, you may notice that there is an attribute called “modCount” in both ArrayList and HashMap. The value is incremented by 1 each time the set is modified, and this value is recorded in the expectedModCount before traversal to check whether the two are consistent during the traversal. If there is any inconsistency, it indicates that there is modification. Throw ConcurrentModificationException.


Welcome to pay attention to my public number “Tong Elder brother read source code”, view more source code series articles, with Tong elder brother tour the ocean of source code.