This article was originally intended to write an overview of the Java Collections framework, but it did not work well, perhaps because the entire Java collections framework has not been fully understood. So first to the source code analysis, write all the collection class analysis, and then to the overall overview. Let’s start with the most commonly used ArrayList.
An overview of the
An ArrayList is a linear table data structure that can grow and shrink dynamically, allowing duplicate elements, and null values. Implemented based on dynamic arrays, it is contiguous in memory, unlike linked lists. In addition, it is not thread-safe; its counterpart, Vector, an ordered sequence that is also implemented based on dynamic arrays, is thread-safe.
Because arrays occupy contiguous memory space in memory, an ArrayList has random access capability, with a time complexity of O(1) for random access by subscript. Similarly, in order to ensure the continuity of memory, its insert and delete operations are much less efficient. To insert data at a specified location, move all data after that location to make room. To delete data in a specified location, all data after the location must be moved forward to ensure spatial continuity. Their average time complexity is O(n).
ArrayList is relatively simple to use, but there are two problems with the source code:
What is the initial size of an ArrayList? How does it dynamically expand?
The source code parsing
Class declaration
public class ArrayList<E> extends AbstractList<E>
implements List<E>, RandomAccess.Cloneable.java.io.Serializable {}
Copy the code
Collection
Is the root interface for all collections, defining some generic behavior, abstract classesAbstractCollection
Provides a generic implementation of partial collection type independence.List
The interface has been extended for ordered collectionsCollection
Interfaces, abstract classesAbstractList
Partial default implementations are provided, of courseArrayList
Rather than taking everything as it is, the rewrite provides its own implementation.- To achieve the
RandomAccess
It supports fast random access, and does not actually implement any methods. It should only be used as a tag. - implementation
Cloneable
Interface to provide shallow copy. - To achieve the
Serializable
Interface, which provides serialization capability, was rewrittenreadObject()
和writeObject()
Methods.
Member variables
private static final int DEFAULT_CAPACITY = 10; // Default initial capacity
private static final Object[] EMPTY_ELEMENTDATA = {}; // Share an empty array
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {}; // Empty arrays are shared by default
transient Object[] elementData; // An array of real data
private int size; // The current number of elements
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8; // Maximum array capacity
Copy the code
ElementData is the array that actually holds the data. It’s easy to get confused about its default size. Seeing that DEFAULT_CAPACITY is 10, it’s tempting to think that once I create a new ArrayList, its default size is 10. This is not the case, as you will understand later when you see the constructor.
The maximum size of the array is integer.max_value – 8, which should be familiar to you. The AbstractStringBuilder class is used to store char[], the maximum size of a character. Considering that some virtual machine implementations retain array object headers, anything larger than this value may result in OOM, note that this is only possible. But if it is greater than integer. MAX_VALUE, OOM will be thrown.
The constructor
public ArrayList(a) {
this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
Copy the code
DEFAULTCAPACITY_EMPTY_ELEMENTDATA is an empty array, so when you call List List = new ArrayList(), you actually create an empty array, not an array of capacity 10.
public ArrayList(int initialCapacity) {
if (initialCapacity > 0) {
this.elementData = new Object[initialCapacity];
} else if (initialCapacity == 0) {
this.elementData = EMPTY_ELEMENTDATA;
} else {
throw new IllegalArgumentException("Illegal Capacity: "+ initialCapacity); }}Copy the code
When we can estimate the number of elements that the ArrayList needs to hold, we can directly specify the array size initialCapacity to avoid performance loss and space waste caused by subsequent automatic expansion. InitialCapacity size follows the following rules:
- If greater than 0, create an array of the specified size
- Equal to 0, use a member variable
EMPTY_ELEMENTDATA
, it is an empty array - If the value is less than 0, an exception is thrown
public ArrayList(Collection<? extends E> c) {
elementData = c.toArray();
if((size = elementData.length) ! =0) {
// c.toArray might (incorrectly) not return Object[] (see 6260652)
if(elementData.getClass() ! = Object[].class) elementData = Arrays.copyOf(elementData, size, Object[].class); }else {
// replace with empty array.
this.elementData = EMPTY_ELEMENTDATA; }}Copy the code
We can also initialize an ArrayList with a collection. The toArray() method calling the collection is converted to an array and assigned to elementData. If the collection passed in has a length of 0, the empty array EMPTY_ELEMENTDATA is assigned to elementData.
methods
ArrayList provides basic collection operations such as insert, delete, empty, find, and traverse. Let’s take a closer look at the ArrayList implementation from the source code, starting with add().
add()
public void add(int index, E element) {
rangeCheckForAdd(index); // boundary detection
ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index); // Move all elements after index
elementData[index] = element;
size++;
}
Copy the code
RangeCheckForAdd () this method will be used many times in the back, mainly for edge detection, when the index is greater than the size or less than zero, will throw IndexOutOfBoundsException anomalies.
The second step, ensureCapacityInternal(), ensures that the collection has enough space to continue adding elements, and automatically expands when it runs out of space. This method is so important that it’s at the heart of ArrayList. So let’s see how it expands.
private void ensureCapacityInternal(int minCapacity) {
ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}
private void ensureExplicitCapacity(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity); / / capacity
}
Copy the code
Calculate the minimum appropriate space with the calculateCapacity() method:
private static int calculateCapacity(Object[] elementData, int minCapacity) {
if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
return Math.max(DEFAULT_CAPACITY, minCapacity); // If the current array is empty, take the larger values of minCapacity and 10
}
return minCapacity;
}
Copy the code
If the collection size is not specified at initialization, the larger values of DEFAULT_CAPACITY (equal to 10) and minCapacity are taken. So, if we build an empty ArrayList, when we add the first element, it defaults to 10.
If minCapacity is larger than the current size of the array, it needs to be expanded.
private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length; // The size of the array
int newCapacity = oldCapacity + (oldCapacity >> 1); // Expand by 1.5 times
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity); // The maximum capacity can only be integer.max_value
// minCapacity is usually close to size, so this is a win:
elementData = Arrays.copyOf(elementData, newCapacity);
}
Copy the code
Each expansion is 1.5 times the original capacity, so when we can estimate the number of elements, we can specify it directly in the constructor to save space. If the new capacity is larger than MAX_ARRAY_SIZE, integer.max_value – 8, call the hugeCapacity() method again.
private static int hugeCapacity(int minCapacity) {
if (minCapacity < 0) // overflow
throw new OutOfMemoryError(); // If the value is less than 0, an overflow occurs and OOM is thrown
return (minCapacity > MAX_ARRAY_SIZE) ? // The maximum value can be integer.max_value
Integer.MAX_VALUE : MAX_ARRAY_SIZE;
}
Copy the code
Finally create a new array using the arrays.copyof () method:
public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
@SuppressWarnings("unchecked")
T[] copy = ((Object)newType == (Object)Object[].class)
? (T[]) new Object[newLength]
: (T[]) Array.newInstance(newType.getComponentType(), newLength);
System.arraycopy(original, 0, copy, 0,
Math.min(original.length, newLength));
return copy;
}
Copy the code
Once the expansion is complete, you can happily add elements by assigning elementData[size++] directly.
In addition to the one-parameter add(E Element) method, which adds elements to the end of the array, ArrayList also supports adding elements at specified positions, add(int index, E element):
public void add(int index, E element) {
rangeCheckForAdd(index); // boundary detection
ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index); // Move all elements after index
elementData[index] = element;
size++;
}
Copy the code
Inserting an element at index requires that all elements following the index be moved back to make room for the element to be added, so ArrayList inserts are not very efficient.
remove()
The remove() method also has two, the first of which removes the element at the specified position:
public E remove(int index) {
rangeCheck(index); // boundary detection
modCount++;
E oldValue = elementData(index);
int numMoved = size - index - 1;
if (numMoved > 0) // Move all elements after index
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // clear to let GC do its work
return oldValue;
}
Copy the code
The logic is simple: move all elements after index forward. Note that after the move, the end element of the collection is empty for GC collection. Like inserts, ArrayList deletions are not very efficient, with O(n) time complexity.
The second is to remove the specified element:
public boolean remove(Object o) { // If there are more than one, remove the first one
if (o == null) {
for (int index = 0; index < size; index++)
if (elementData[index] == null) {
fastRemove(index);
return true; }}else {
for (int index = 0; index < size; index++)
if (o.equals(elementData[index])) {
fastRemove(index);
return true; }}return false;
}
Copy the code
One thing to note here is that when there are duplicate elements in the collection, whether null or other objects, the remove() method only removes the first of them. Remove fastRemove(); remove fastRemove(); remove fastRemove(); remove fastRemove();
/* * Private remove method that skips bounds checking and does not return the value removed. And does not return the removed value */
private void fastRemove(int index) {
modCount++;
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // clear to let GC do its work
}
Copy the code
removeAll() && retainAll()
public boolean removeAll(Collection
c) {
Objects.requireNonNull(c);
return batchRemove(c, false);
}
Copy the code
The removeAll() method removes all elements contained in collection C by calling the batchRemove() implementation.
public boolean retainAll(Collection
c) {
Objects.requireNonNull(c);
return batchRemove(c, true);
}
Copy the code
The retainAll() method is the exact opposite of removeAll(), keeping all elements contained in collection C and removing others, which is also implemented by calling batchRemove().
The batchRemove() method is probably one of the more complex methods in ArrayList, but it’s definitely worth a closer look.
/ * * * *@paramCollection of c *@paramWhen complement is true, the value in the specified set is retained; when false, the value * in the specified set is deleted@returnAll duplicate elements in the array are removed, and whenever a deletion occurs true */ is returned
private boolean batchRemove(Collection<? > c,boolean complement) {
final Object[] elementData = this.elementData;
int r = 0, w = 0;
boolean modified = false;
try {
// Walk through the array and check if the set contains the corresponding value, move the value to the front of the array, w final value is the number of elements to keep
That is, if it is retainAll(), move the same element to the front of the array.
// If removeAll(), move the different elements to the front of the array
for (; r < size; r++)
if (c.contains(elementData[r]) == complement)
elementData[w++] = elementData[r];
} finally {
// Preserve behavioral compatibility with AbstractCollection,
// even if c.contains() throws.
if(r ! = size) {// r ! = size, indicating that an exception occurs and the loop is not completed
System.arraycopy(elementData, r,
elementData, w,
size - r); // Move the element after r over
w += size - r;
}
// w == size Indicates that all elements are retained. Modified returns false
if(w ! = size) {// clear to let GC do its work
for (int i = w; i < size; i++)
elementData[i] = null;
modCount += size - w; / / update the modCount
size = w; // w is the number of elements to save
modified = true; }}return modified;
}
Copy the code
In general, whether you removeAll() or retainAll(), I batchRemove() always move the elements to the front, remove the elements to the back, and record the number of elements to keep below.
other
The following methods are straightforward, just skim it.
// Get the collection size
public int size(a) {
return size;
}
// Check whether the set is empty
public boolean isEmpty(a) {
return size == 0;
}
// Get the element subscript
public int indexOf(Object o) {
if (o == null) {
for (int i = 0; i < size; i++)
if (elementData[i]==null)
return i;
} else {
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
}
return -1;
}
// Sets the element at the specified position
public E set(int index, E element) {
rangeCheck(index);
E oldValue = elementData(index);
elementData[index] = element;
return oldValue;
}
// Gets the element at the specified position
public E get(int index) {
rangeCheck(index);
return elementData(index);
}
// Clear the collection
public void clear(a) {
modCount++;
// clear to let GC do its work
for (int i = 0; i < size; i++)
elementData[i] = null;
size = 0;
}
Copy the code
conclusion
A quick summary of ArrayList:
- Based on the dynamic array implementation, the automatic expansion is 1.5 times the original
- It is contiguous in memory and has random access capability
- The time complexity of getting elements by subscript is
O(1)
- The average time complexity of adding and removing elements is
O(n)
- Allow duplicate elements, allow null values, thread unsafe
Since the title is into the JDK’s ArrayList (1), there must be two more. If you take a close look at the ArrayList source code, you’ll see a field that pops up a lot: modCount, which literally means number of changes. Almost any method that involves modifying a collection will perform modCount++, such as the clear() method:
public void clear(a) {
modCount++;
// clear to let GC do its work
for (int i = 0; i < size; i++)
elementData[i] = null;
size = 0;
}
Copy the code
So what does modCount do? That’s what ArrayList (2) in the JDK is all about.
Article first published wechat public account: Bingxin said, focus on Java, Android original knowledge sharing, LeetCode problem solving.
More JDK source code analysis, scan code to pay attention to me!