Overview
Java's collections framework is powerful because it encapsulates commonly used data structures and algorithms, so Java developers can use the framework's API without having to implement the underlying structures themselves. Arrays, linked lists, queues, stacks, binary trees, red-black trees: the JDK wraps all of them to varying degrees, which greatly improves development efficiency. But even though we rarely touch this low-level logic directly in a project, we still need to be familiar with it. In extreme business scenarios we may have to debug or even tune the underlying data structure, so mastering it is a necessary skill.
Below is a Collection interface diagram.
As we can see from the figure, the Collection interface has three main sub-interfaces: Set, List, and Queue.
In detail – List
Let’s start with the List interface.
The characteristics of the List
- Elements are ordered: they are retrieved in the same order they were inserted.
- Duplicate elements are allowed.
- Elements can be accessed by index.
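Here is a quick sketch of my own (not from the JDK source) that shows all three properties at once:

import java.util.ArrayList;
import java.util.List;

public class ListBasics {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("a");                    // duplicates are allowed

        System.out.println(list);         // [a, b, a]  (insertion order preserved)
        System.out.println(list.get(1));  // b  (access by index)
        list.set(2, "c");                 // replace the element at index 2
        System.out.println(list);         // [a, b, c]
    }
}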
The List interface is implemented by three main classes: ArrayList, Vector, and LinkedList.
Let’s discuss the differences in more detail.
We classify them into two categories according to the underlying data structure:
- Array-backed: fast lookups, slow insertions and deletions.
- Linked-list-backed: slow lookups, fast insertions and deletions.
ArrayList and Vector both fall into the array-backed category.
What’s the difference between ArrayList and Vector?
- ArrayList: not thread-safe, but efficient; well suited to frequent lookups.
- Vector: thread-safe, but inefficient. It was also one of Java's earliest thread-safe dynamic arrays and is not recommended in real production environments.
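If you really do need a thread-safe list, a common alternative nowadays (my suggestion, not something prescribed by the article or the Vector source) is to wrap an ArrayList with Collections.synchronizedList, or to use CopyOnWriteArrayList for read-heavy workloads. A minimal sketch:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ThreadSafeLists {
    public static void main(String[] args) {
        // Wrap an ArrayList: every call goes through a synchronized wrapper
        List<String> syncList = Collections.synchronizedList(new ArrayList<>());
        syncList.add("a");

        // Copy-on-write list: writes copy the backing array, reads are lock-free
        List<String> cowList = new CopyOnWriteArrayList<>();
        cowList.add("b");

        // Iterating the synchronized wrapper still requires manual locking
        synchronized (syncList) {
            for (String s : syncList) {
                System.out.println(s);
            }
        }
        System.out.println(cowList);
    }
}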
Source code analysis
ArrayList.java
transient Object[] elementData; // non-private to simplify nested class access
private int size;

public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    }
}
As the source shows, ArrayList stores its elements in an Object[] array. Scanning the rest of the class, there is no synchronized keyword and no other lock, so ArrayList is not thread-safe.
Vector.java

public synchronized void copyInto(Object[] anArray) {
    System.arraycopy(elementData, 0, anArray, 0, elementCount);
}

public synchronized void trimToSize() {
    modCount++;
    int oldCapacity = elementData.length;
    if (elementCount < oldCapacity) {
        elementData = Arrays.copyOf(elementData, elementCount);
    }
}

public synchronized void ensureCapacity(int minCapacity) {
    if (minCapacity > 0) {
        modCount++;
        ensureCapacityHelper(minCapacity);
    }
}
Vector uses the same array-backed structure, but its methods are marked synchronized, which makes Vector inherently thread-safe. However, under high concurrency that heavy use of synchronization leaves large numbers of threads blocked and waiting, so it is inefficient.
LinkedList
Now let's take a look at LinkedList. Its underlying structure is a linked list; it is not thread-safe but is efficient.
Let’s look at the difference between ArrayList and LinkedList.
Difference between ArrayList and LinkedList
ArrayList is backed by an Object array, while LinkedList is backed by a doubly linked list.

Note: JDK 1.6 (inclusive) and earlier used a doubly circular linked list; JDK 1.7 dropped the cycle.
Reference – doubly linked list: each node holds two pointers, prev pointing to the previous node and next pointing to the next node.
Reference – doubly circular linked list: the next of the last node points to the head, and the prev of the head points to the last node, forming a cycle.

For ArrayList, the cost of insertion and deletion depends on the element's position. Appending to the end is O(1). Inserting or deleting at index i is O(n - i) (n is the array length), because the i-th and the following (n - i) elements must be shifted one slot backward or forward.

For LinkedList, the cost of insertion and deletion does not depend on the element's position. Since its data structure is a doubly linked list, add(E e), addFirst(E e), addLast(E e), removeFirst() and removeLast() all run in O(1), while inserting or deleting at index i is O(n), because the list pointer must first be moved to that position.

ArrayList supports RandomAccess (efficient random access), while LinkedList does not.
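To make the RandomAccess point concrete, here is a small sketch of my own that picks a traversal strategy depending on whether the list implements RandomAccess: indexed access for ArrayList, an iterator for LinkedList:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class TraversalDemo {
    // Indexed loops are cheap on RandomAccess lists; on a linked list each
    // get(i) walks node by node, so an iterator is the better choice there.
    static <E> void printAll(List<E> list) {
        if (list instanceof RandomAccess) {
            for (int i = 0; i < list.size(); i++) {
                System.out.println(list.get(i)); // O(1) per access on ArrayList
            }
        } else {
            for (E e : list) {                   // iterator: O(1) per step
                System.out.println(e);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linkedList = new LinkedList<>(Arrays.asList(4, 5, 6));
        printAll(arrayList);   // takes the indexed branch
        printAll(linkedList);  // takes the iterator branch
    }
}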
Source code analysis
/**
 * Pointer to first node.
 * Invariant: (first == null && last == null) ||
 *            (first.prev == null && first.item != null)
 */
transient Node<E> first;

/**
 * Pointer to last node.
 * Invariant: (first == null && last == null) ||
 *            (last.next == null && last.item != null)
 */
transient Node<E> last;
Above is the source code for LinkedList. The LinkedList class defines two nodes, first and last, pointing to the first and last elements, respectively. It is a typical linked-list structure.
The LinkedList class does not use locks, so it is also thread-unsafe.
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{ ... }
ArrayList implements RandomAccess, which, like Cloneable, is only a marker interface: it signals that ArrayList supports efficient random access. Implementing Cloneable indicates that ArrayList supports cloning, and implementing Serializable indicates that the class can be serialized.
public class LinkedList<E>
        extends AbstractSequentialList<E>
        implements List<E>, Deque<E>, Cloneable, java.io.Serializable
{ ... }
You can see that LinkedList does not implement RandomAccess, so it does not support efficient random access. It does implement the double-ended queue interface Deque; the rest matches ArrayList.
Expansion mechanism for ArrayList
An ArrayList is a dynamic array: unlike a traditional array, it can grow as elements are added. In the source code, the core of the expansion logic is the grow() method, which we analyze in detail here; for the full mechanism you can consult the source code or other references.
Source code analysis
// minCapacity: the minimum required capacity
private void grow(int minCapacity) {
    // oldCapacity: old capacity; newCapacity: new capacity (old * 1.5)
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // The MAX_ARRAY_SIZE constant is Integer.MAX_VALUE - 8
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
The key line is int newCapacity = oldCapacity + (oldCapacity >> 1);. Shifting oldCapacity right by one bit is equivalent to oldCapacity / 2, so the new capacity is roughly 1.5 times the old one. A bit shift is used instead of division because it is faster, which improves efficiency and saves resources.
The logic of grow() is therefore: compute a new capacity of 1.5 times the old one, then check whether that is still smaller than the minimum required capacity; if it is (newCapacity < minCapacity), use the minimum required capacity as the new capacity.
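To see the 1.5x rule in action, here is a small sketch of my own that traces how the capacity would evolve starting from the default capacity of 10:

public class GrowthTrace {
    public static void main(String[] args) {
        int capacity = 10; // DEFAULT_CAPACITY of ArrayList
        System.out.print(capacity);
        for (int i = 0; i < 6; i++) {
            // same formula as grow(): new = old + old/2, i.e. old + (old >> 1)
            capacity = capacity + (capacity >> 1);
            System.out.print(" -> " + capacity);
        }
        System.out.println(); // 10 -> 15 -> 22 -> 33 -> 49 -> 73 -> 109
    }
}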
If the new capacity is greater than MAX_ARRAY_SIZE, the hugeCapacity() method is called to compare minCapacity with MAX_ARRAY_SIZE. The hugeCapacity code looks like this:
private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}
That is, if minCapacity is greater than MAX_ARRAY_SIZE, the new capacity is Integer.MAX_VALUE; otherwise the new capacity is MAX_ARRAY_SIZE, i.e. Integer.MAX_VALUE - 8.
Finally, Arrays.copyOf() creates a new, larger array and assigns it back to the original elementData reference. This completes the expansion of the array.
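For reference, a tiny sketch of my own showing what Arrays.copyOf does: it allocates a new array of the requested length, copies the old contents, and pads the rest with default values:

import java.util.Arrays;

public class CopyOfDemo {
    public static void main(String[] args) {
        Object[] old = {"a", "b", "c"};
        Object[] grown = Arrays.copyOf(old, 5);    // ["a", "b", "c", null, null]
        System.out.println(Arrays.toString(grown));
        System.out.println(old == grown);          // false: a brand-new array
    }
}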
In addition, ArrayList provides a public expansion method, ensureCapacity, that can be invoked from outside the class.
public void ensureCapacity(int minCapacity) {
    int minExpand = (elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
        // any size if not default element table
        ? 0
        // larger than default for default empty table. It's already
        // supposed to be at default size.
        : DEFAULT_CAPACITY;

    if (minCapacity > minExpand) {
        ensureExplicitCapacity(minCapacity);
    }
}
As the source shows, ensureCapacity is declared public and can be called by developers directly. Calling ensureCapacity before adding a large number of elements lets you grow the array once up front and avoid the performance cost of repeated expansions. This is not the focus of this article; check the source code if you are interested.
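A quick usage sketch of my own: pre-sizing with ensureCapacity before a large batch of add calls, so the backing array is expanded only once:

import java.util.ArrayList;

public class EnsureCapacityDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        ArrayList<Integer> list = new ArrayList<>();
        // Grow the backing array once up front instead of a few dozen times
        list.ensureCapacity(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        System.out.println(list.size()); // 1000000
    }
}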
In detail – Set
Set is another important sub-interface of Collection. As the diagram at the beginning of the article shows, its implementations include HashSet, LinkedHashSet, TreeSet and others.
The characteristics of the Set
- Unordered: insertion order is not preserved, so elements may come out in a different order than they went in.
- Uniqueness: All elements are unique.
Set is suitable for scenarios where deduplication is required.
Sets, too, can be divided into two categories by their underlying data structure:
- Backed by a hash table, which is in fact a HashMap, as the source code below shows. The actual implementation of HashMap will be covered in the next article.
- Backed by a binary tree.
HashSet and LinkedHashSet fall into the first category, where the underlying structure is a hash table (for LinkedHashSet, a hash table plus a linked list). TreeSet falls into the binary-tree category.
HashSet uses hashCode() and equals() to ensure element uniqueness.
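To see hashCode() and equals() at work, here is a small sketch of my own with a hypothetical User class: two logically equal objects end up as a single element because both methods are overridden consistently:

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashSetUniqueness {
    static final class User {
        final String name;
        User(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof User && ((User) o).name.equals(this.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    public static void main(String[] args) {
        Set<User> users = new HashSet<>();
        users.add(new User("alice"));
        users.add(new User("alice")); // equal by equals()/hashCode(), so ignored
        System.out.println(users.size()); // 1
    }
}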
Source code analysis
Let’s look at two source code snippets for HashSet.
public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
As you can see from the source, a HashSet creates a HashMap object when it is initialized. When we add an element, it is actually stored in that HashMap as a key, mapped to the constant PRESENT, which is simply a final Object. Since the keys of a HashMap are unique, this explains how a HashSet guarantees element uniqueness.
HashSet is the usual choice when you only need to guarantee element uniqueness, while TreeSet is used when the elements also need to be sorted. When sorting is not required, HashSet gives better performance.
Talking about sorting, let’s talk specifically about the Comparable and Comparator interfaces.
The Comparable interface lives in the java.lang package and has a compareTo(Object obj) method used for sorting. The Comparator interface lives in the java.util package and has a compare(Object obj1, Object obj2) method used for sorting.
Comparable and Comparator interfaces
The Comparable interface is the natural-ordering interface: it is used to implement natural ordering, and compareTo is therefore also called the natural comparison method. When a developer calls the sort method in Collections, an object can be sorted by its natural order as long as it implements the Comparable interface. The Comparator interface can be understood as a custom-ordering interface: developers can use it to sort instances of a class that does not implement Comparable, in a custom order, without modifying the class's source code.
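Here is a short sketch of my own showing both styles: a hypothetical Person class with a natural order by age, plus a Comparator for an ad-hoc order by name that does not touch the class itself:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortingDemo {
    static class Person implements Comparable<Person> {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }

        @Override
        public int compareTo(Person other) {     // natural ordering: by age
            return Integer.compare(this.age, other.age);
        }

        @Override
        public String toString() { return name + "(" + age + ")"; }
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("alice", 30));
        people.add(new Person("bob", 25));

        Collections.sort(people);                 // uses compareTo (by age)
        System.out.println(people);               // [bob(25), alice(30)]

        // Custom ordering by name, without modifying the Person class
        Collections.sort(people, Comparator.comparing((Person p) -> p.name));
        System.out.println(people);               // [alice(30), bob(25)]
    }
}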
TreeSet source code analysis
The TreeSet class supports both Comparable and Comparator ordering. Let's trace through the source code to see how.
public TreeSet() {
    this(new TreeMap<E,Object>());
}

public TreeSet(Comparator<? super E> comparator) {
    this(new TreeMap<>(comparator));
}
As the source shows, TreeSet is implemented on top of TreeMap, so we have to go into TreeMap to find the actual sorting logic. TreeSet provides a no-argument constructor and a constructor that takes a Comparator: the no-argument constructor makes TreeMap use Comparable natural ordering, while the other passes a Comparator object in.
Now let’s take a look inside TreeMap.
final Entry<K,V> getEntry(Object key) {
    // Offload comparator-based version for sake of performance
    if (comparator != null)
        return getEntryUsingComparator(key);
    if (key == null)
        throw new NullPointerException();
    @SuppressWarnings("unchecked")
    Comparable<? super K> k = (Comparable<? super K>) key;
    Entry<K,V> p = root;
    while (p != null) {
        int cmp = k.compareTo(p.key);
        if (cmp < 0)
            p = p.left;
        else if (cmp > 0)
            p = p.right;
        else
            return p;
    }
    return null;
}
When the comparator is not null, getEntry delegates to getEntryUsingComparator, which applies our custom ordering; otherwise it falls back to natural ordering through the Comparable interface.
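Tying this back to TreeSet, a brief sketch of my own showing both constructors in use, natural ordering versus a custom Comparator:

import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetOrdering {
    public static void main(String[] args) {
        // No-arg constructor: natural ordering via Comparable (String.compareTo)
        TreeSet<String> natural = new TreeSet<>();
        natural.add("banana");
        natural.add("apple");
        System.out.println(natural);            // [apple, banana]

        // Comparator constructor: custom ordering, here by descending length
        TreeSet<String> byLengthDesc =
            new TreeSet<>(Comparator.comparingInt(String::length).reversed());
        byLengthDesc.add("kiwi");
        byLengthDesc.add("banana");
        System.out.println(byLengthDesc);       // [banana, kiwi]
    }
}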
HashSet, LinkedHashSet, TreeSet comparison
HashSet, LinkedHashSet and TreeSet are all implementation classes of the Set interface. All of them guarantee element uniqueness, and none of them is thread-safe.

- HashSet: the underlying data structure is a hash table, implemented on top of HashMap.
- LinkedHashSet: the underlying data structure is a hash table plus a linked list, so elements are iterated in the order they were inserted (first in, first out).
- TreeSet: the underlying data structure is a red-black tree (a self-balancing binary search tree).

HashSet fits scenarios where you neither need to preserve insertion order nor care about sorting; LinkedHashSet fits scenarios where elements must come out in first-in, first-out insertion order; TreeSet fits scenarios that require custom ordering rules.
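A sketch of my own that makes the ordering differences visible by feeding the same elements to each implementation:

import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("banana", "apple", "cherry", "apple");

        Set<String> hash = new HashSet<>(input);
        Set<String> linked = new LinkedHashSet<>(input);
        Set<String> tree = new TreeSet<>(input);

        System.out.println(hash);   // unique elements, but no guaranteed order
        System.out.println(linked); // [banana, apple, cherry] -- insertion order
        System.out.println(tree);   // [apple, banana, cherry] -- sorted order
    }
}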
Supplement – Queue
A Queue is a single-ended queue: elements can only be inserted at one end and removed from the other, following the FIFO (first-in, first-out) rule. It extends the Collection interface.
A Deque is a double-ended queue: elements can be inserted or removed at both ends. It extends the Queue interface and adds methods such as push and pop, which can be used to emulate a stack.
The LinkedList class described above is the most commonly used implementation of the Deque interface.
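To close this section, a sketch of my own that uses LinkedList both as a FIFO queue and as a LIFO stack through the Deque interface:

import java.util.Deque;
import java.util.LinkedList;
import java.util.Queue;

public class QueueAndStackDemo {
    public static void main(String[] args) {
        // FIFO queue: offer at the tail, poll from the head
        Queue<String> queue = new LinkedList<>();
        queue.offer("first");
        queue.offer("second");
        System.out.println(queue.poll()); // first

        // LIFO stack emulated through Deque: push/pop work on the head
        Deque<String> stack = new LinkedList<>();
        stack.push("bottom");
        stack.push("top");
        System.out.println(stack.pop());  // top
    }
}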
Conclusion
At the end of this article, I summarize a table that compares the characteristics of the objects described above.
Object | Underlying data structure | Thread-safe? |
---|---|---|
ArrayList | Array | No |
Vector | Array | Yes |
LinkedList | Doubly linked list | No |
HashSet | Hash table | No |
LinkedHashSet | Hash table + linked list | No |
TreeSet | Red-black tree | No |
The Collection framework is a frequent interview topic and is used constantly in day-to-day development, so mastering it is essential.