This article was originally intended to write an overview of the Java Collections framework, but it did not work well, perhaps because the entire Java collections framework has not been fully understood. So first to the source code analysis, write all the collection class analysis, and then to the overall overview. Let’s start with the most commonly used ArrayList.

An overview of the

An ArrayList is a linear table data structure that can grow and shrink dynamically, allowing duplicate elements, and null values. Implemented based on dynamic arrays, it is contiguous in memory, unlike linked lists. In addition, it is not thread-safe; its counterpart, Vector, an ordered sequence that is also implemented based on dynamic arrays, is thread-safe.

Because arrays occupy contiguous memory space in memory, an ArrayList has random access capability, with a time complexity of O(1) for random access by subscript. Similarly, in order to ensure the continuity of memory, its insert and delete operations are much less efficient. To insert data at a specified location, move all data after that location to make room. To delete data in a specified location, all data after the location must be moved forward to ensure spatial continuity. Their average time complexity is O(n).

ArrayList is relatively simple to use, but there are two problems with the source code:

What is the initial size of an ArrayList? How does it dynamically expand?

The source code parsing

Class declaration

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess.Cloneable.java.io.Serializable {}
Copy the code
  • CollectionIs the root interface for all collections, defining some generic behavior, abstract classesAbstractCollectionProvides a generic implementation of partial collection type independence.ListThe interface has been extended for ordered collectionsCollectionInterfaces, abstract classesAbstractListPartial default implementations are provided, of courseArrayListRather than taking everything as it is, the rewrite provides its own implementation.
  • To achieve theRandomAccessIt supports fast random access, and does not actually implement any methods. It should only be used as a tag.
  • implementationCloneableInterface to provide shallow copy.
  • To achieve theSerializableInterface, which provides serialization capability, was rewrittenreadObject()writeObject()Methods.

Member variables

private static final int DEFAULT_CAPACITY = 10; // Default initial capacity
private static final Object[] EMPTY_ELEMENTDATA = {}; // Share an empty array
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {}; // Empty arrays are shared by default
transient Object[] elementData; // An array of real data
private int size; // The current number of elements
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8; // Maximum array capacity
Copy the code

ElementData is the array that actually holds the data. It’s easy to get confused about its default size. Seeing that DEFAULT_CAPACITY is 10, it’s tempting to think that once I create a new ArrayList, its default size is 10. This is not the case, as you will understand later when you see the constructor.

The maximum size of the array is integer.max_value – 8, which should be familiar to you. The AbstractStringBuilder class is used to store char[], the maximum size of a character. Considering that some virtual machine implementations retain array object headers, anything larger than this value may result in OOM, note that this is only possible. But if it is greater than integer. MAX_VALUE, OOM will be thrown.

The constructor

public ArrayList(a) {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
Copy the code

DEFAULTCAPACITY_EMPTY_ELEMENTDATA is an empty array, so when you call List List = new ArrayList(), you actually create an empty array, not an array of capacity 10.

public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+ initialCapacity); }}Copy the code

When we can estimate the number of elements that the ArrayList needs to hold, we can directly specify the array size initialCapacity to avoid performance loss and space waste caused by subsequent automatic expansion. InitialCapacity size follows the following rules:

  • If greater than 0, create an array of the specified size
  • Equal to 0, use a member variableEMPTY_ELEMENTDATA, it is an empty array
  • If the value is less than 0, an exception is thrown
public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    if((size = elementData.length) ! =0) {
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if(elementData.getClass() ! = Object[].class) elementData = Arrays.copyOf(elementData, size, Object[].class); }else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA; }}Copy the code

We can also initialize an ArrayList with a collection. The toArray() method calling the collection is converted to an array and assigned to elementData. If the collection passed in has a length of 0, the empty array EMPTY_ELEMENTDATA is assigned to elementData.

methods

ArrayList provides basic collection operations such as insert, delete, empty, find, and traverse. Let’s take a closer look at the ArrayList implementation from the source code, starting with add().

add()

public void add(int index, E element) {
    rangeCheckForAdd(index); // boundary detection

    ensureCapacityInternal(size + 1);  // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1,
                         size - index); // Move all elements after index
    elementData[index] = element;
    size++;
}
Copy the code

RangeCheckForAdd () this method will be used many times in the back, mainly for edge detection, when the index is greater than the size or less than zero, will throw IndexOutOfBoundsException anomalies.

The second step, ensureCapacityInternal(), ensures that the collection has enough space to continue adding elements, and automatically expands when it runs out of space. This method is so important that it’s at the heart of ArrayList. So let’s see how it expands.

   private void ensureCapacityInternal(int minCapacity) {
        ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
    }

    private void ensureExplicitCapacity(int minCapacity) {
        modCount++;

        // overflow-conscious code
        if (minCapacity - elementData.length > 0)
            grow(minCapacity); / / capacity
    }
Copy the code

Calculate the minimum appropriate space with the calculateCapacity() method:

private static int calculateCapacity(Object[] elementData, int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity); // If the current array is empty, take the larger values of minCapacity and 10
    }
    return minCapacity;
}
Copy the code

If the collection size is not specified at initialization, the larger values of DEFAULT_CAPACITY (equal to 10) and minCapacity are taken. So, if we build an empty ArrayList, when we add the first element, it defaults to 10.

If minCapacity is larger than the current size of the array, it needs to be expanded.

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length; // The size of the array
    int newCapacity = oldCapacity + (oldCapacity >> 1); // Expand by 1.5 times
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity); // The maximum capacity can only be integer.max_value
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
Copy the code

Each expansion is 1.5 times the original capacity, so when we can estimate the number of elements, we can specify it directly in the constructor to save space. If the new capacity is larger than MAX_ARRAY_SIZE, integer.max_value – 8, call the hugeCapacity() method again.

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError(); // If the value is less than 0, an overflow occurs and OOM is thrown
    return (minCapacity > MAX_ARRAY_SIZE) ? // The maximum value can be integer.max_value
        Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }
Copy the code

Finally create a new array using the arrays.copyof () method:

public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
    @SuppressWarnings("unchecked")
    T[] copy = ((Object)newType == (Object)Object[].class)
        ? (T[]) new Object[newLength]
        : (T[]) Array.newInstance(newType.getComponentType(), newLength);
    System.arraycopy(original, 0, copy, 0,
                    Math.min(original.length, newLength));
    return copy;
}
Copy the code

Once the expansion is complete, you can happily add elements by assigning elementData[size++] directly.

In addition to the one-parameter add(E Element) method, which adds elements to the end of the array, ArrayList also supports adding elements at specified positions, add(int index, E element):

public void add(int index, E element) {
    rangeCheckForAdd(index); // boundary detection

    ensureCapacityInternal(size + 1);  // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1,
                    size - index); // Move all elements after index
    elementData[index] = element;
    size++;
}
Copy the code

Inserting an element at index requires that all elements following the index be moved back to make room for the element to be added, so ArrayList inserts are not very efficient.

remove()

The remove() method also has two, the first of which removes the element at the specified position:

public E remove(int index) {
    rangeCheck(index); // boundary detection

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0) // Move all elements after index
        System.arraycopy(elementData, index+1, elementData, index,
                        numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}
Copy the code

The logic is simple: move all elements after index forward. Note that after the move, the end element of the collection is empty for GC collection. Like inserts, ArrayList deletions are not very efficient, with O(n) time complexity.

The second is to remove the specified element:

public boolean remove(Object o) { // If there are more than one, remove the first one
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true; }}else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true; }}return false;
}
Copy the code

One thing to note here is that when there are duplicate elements in the collection, whether null or other objects, the remove() method only removes the first of them. Remove fastRemove(); remove fastRemove(); remove fastRemove(); remove fastRemove();

/* * Private remove method that skips bounds checking and does not return the value removed. And does not return the removed value */
private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                        numMoved);
    elementData[--size] = null; // clear to let GC do its work
}
Copy the code

removeAll() && retainAll()

public boolean removeAll(Collection
        c) {
    Objects.requireNonNull(c);
    return batchRemove(c, false);
}
Copy the code

The removeAll() method removes all elements contained in collection C by calling the batchRemove() implementation.

public boolean retainAll(Collection
        c) {
    Objects.requireNonNull(c);
    return batchRemove(c, true);
}
Copy the code

The retainAll() method is the exact opposite of removeAll(), keeping all elements contained in collection C and removing others, which is also implemented by calling batchRemove().

The batchRemove() method is probably one of the more complex methods in ArrayList, but it’s definitely worth a closer look.

/ * * * *@paramCollection of c *@paramWhen complement is true, the value in the specified set is retained; when false, the value * in the specified set is deleted@returnAll duplicate elements in the array are removed, and whenever a deletion occurs true */ is returned
private boolean batchRemove(Collection<? > c,boolean complement) {
    final Object[] elementData = this.elementData;
    int r = 0, w = 0;
    boolean modified = false;
    try {
        // Walk through the array and check if the set contains the corresponding value, move the value to the front of the array, w final value is the number of elements to keep
        That is, if it is retainAll(), move the same element to the front of the array.
        // If removeAll(), move the different elements to the front of the array
        for (; r < size; r++)
            if (c.contains(elementData[r]) == complement)
                elementData[w++] = elementData[r];
    } finally {
        // Preserve behavioral compatibility with AbstractCollection,
        // even if c.contains() throws.
        if(r ! = size) {// r ! = size, indicating that an exception occurs and the loop is not completed
            System.arraycopy(elementData, r,
                            elementData, w,
                            size - r); // Move the element after r over
            w += size - r;
        }
        // w == size Indicates that all elements are retained. Modified returns false
        if(w ! = size) {// clear to let GC do its work
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w; / / update the modCount
            size = w; // w is the number of elements to save
            modified = true; }}return modified;
}
Copy the code

In general, whether you removeAll() or retainAll(), I batchRemove() always move the elements to the front, remove the elements to the back, and record the number of elements to keep below.

other

The following methods are straightforward, just skim it.

// Get the collection size
public int size(a) {
    return size;
}

// Check whether the set is empty
public boolean isEmpty(a) {
    return size == 0;
}

// Get the element subscript
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

// Sets the element at the specified position
public E set(int index, E element) {
    rangeCheck(index);

    E oldValue = elementData(index);
    elementData[index] = element;
    return oldValue;
}

// Gets the element at the specified position
public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}

// Clear the collection
public void clear(a) {
    modCount++;

    // clear to let GC do its work
    for (int i = 0; i < size; i++)
        elementData[i] = null;

    size = 0;
}
Copy the code

conclusion

A quick summary of ArrayList:

  • Based on the dynamic array implementation, the automatic expansion is 1.5 times the original
  • It is contiguous in memory and has random access capability
  • The time complexity of getting elements by subscript isO(1)
  • The average time complexity of adding and removing elements isO(n)
  • Allow duplicate elements, allow null values, thread unsafe

Since the title is into the JDK’s ArrayList (1), there must be two more. If you take a close look at the ArrayList source code, you’ll see a field that pops up a lot: modCount, which literally means number of changes. Almost any method that involves modifying a collection will perform modCount++, such as the clear() method:

public void clear(a) {
    modCount++;
    // clear to let GC do its work
    for (int i = 0; i < size; i++)
        elementData[i] = null;
    size = 0;
}
Copy the code

So what does modCount do? That’s what ArrayList (2) in the JDK is all about.

Article first published wechat public account: Bingxin said, focus on Java, Android original knowledge sharing, LeetCode problem solving.

More JDK source code analysis, scan code to pay attention to me!