ArrayList dissects the underlying principles from a source code perspective

This article has been put into the Github github.com/sh-blog warehouse, where all my articles have been classified for easier reading. At the same time, some job information will be released, and continue to update, welcome Star

For ArrayList, we usually use the most methods should be add and remove, this article mainly through these two basic methods, through the source code to see the underlying principle of ArrayList.

add

Add elements by default

This is probably the most commonly used method and is used as follows.

Let’s take a look at the underlying source code for the Add method.

EnsureCapacityInternal is used to make sure that the array does not cross bounds while constantly inserting data into the ArrayList and automatically expands.

The value of minCapacaity is actually the number of elements in the array after the current add operation is called

Notice that the current add operation is complete. For example, if the ArrayList had three elements before add was called, the value of minCapacity would be 4

In addition, you can see that the return value of function calculateCapacity is used as input to ensureExplicitCapacity.

What the calculateCapacity does, in plain English, is return the default array length (10) and the maximum minCapacity if the current array is empty, or minCapacity if not.

EnsureExplicitCapacity ensureExplicitCapacity

ModCount indicates how many times the ArrayList has been changed, where a change is not just a new addition, but a deletion is also a change. If the number of elements in the current array is less than the length of the array, the value of minCapacity must be less than elementData.length.

On the other hand, if the number of elements in the array is already equal to the length of the array, 0>0 must be false, and the function grow is called to expand the array.

The core logic is simple: the new array length = the old array length + the old array length /2.

This shift right here, this is the same thing as dividing by 2

So using the source code here we verify that ArrayList is multiplied by a factor of 1.5. As mentioned above, the value of minCapacity and the length of the array should be equal, so the length of the new array – minCapacity should always be greater than 0. Why is minCapacity less than 0?

The answer is addAll.

As you can see, both Add and addAll are calling EnrecapityInternal underneath. Add adds a single element to an array, while addAll adds the entire array. Let’s say that addAll we pass in a very large value, so for example, the default array size of an ArrayList is 10, and then after one enlargement it’s 15, and let’s say that we pass in 10 elements, then even one enlargement is not enough to fit all of them, because 15 is less than 20.

Therefore, newCapacity (the size of the array after the expansion) is less than minCapacity (the number of elements in the array after the current operation). When this happens, the value of minCapacity is directly assigned to newCapacity.

What if addAll inserts an Integer.MAX_VALUE into it? This is the logic that the hugeCapacity function handles. If an overflow occurs, an OOM exception is raised, and integer.max_value is not exceeded.

Finally, we actually perform the capacity expansion by calling the array.copyof method from the java.util package. As you can see from the figure above, this method takes two parameters: the array to hold the elements and the new array length. For example:

int[] elementData = {1.2.3.4.5}; 
int[] newElementData = Arrays.copyOf(elementData, 10);
System.out.println(newElementData); // [1 2 3 4 5 0 0 0 0 0 0]
Copy the code

ElementData [size++] = e. ElementData [size++] = e.

Let’s use a flowchart to summarize the core flow of the ADD operation.

Specifies where to add elements

Now that we know about add and addAll, let’s strike while the iron is hot. Add, which can specify the location of an element, takes two arguments:

The index of the new element in the array
The new element itself

Add (int index, E Element) inserts the element at the specified location in the array. Let’s look at the source code.

First of all, as a result of this method allows the user to the incoming array subscript, so the first thing to do is to check the incoming array subscript is legal, if not rule will directly sell IndexOutOfBoundsException anomalies.

The index cannot be less than 0 and cannot exceed the number of elements in the array.

Like above this kind of situation, call the add (int index, E element), before an array with three elements, even if the underlying the length of the array is 10, but if the array subscript to 5, is thrown IndexOutOfBoundsException abnormal. In this case, index must be at most 3, because index(subscript 3) > size(three elements) must not be true.

Once we’re done, we’ll call the ensureCapacityInternal method that we talked about above to expand the underlying array as needed. Then I call the System. arrayCopy method, which is a bit more critical and difficult to understand, so I’ll summarize in one sentence what it does – move all the subscripts one bit back.

System. Arraycopy accepts four parameters:

The original array
The starting index in the original array
The target array
The starting index in the target array

It’s kind of hard to understand just by looking at the parameters without using an example, but LET me do a simple example here.

Let’s say I have elements [1, 2, 3] in the array, and I call add(1, 4), indicating that I want to insert element 4 into the array at index 1, then index is 1 and size is 3.

The System. Arraycopy method becomes System. Arraycopy (elementData, 1, elementData, 2, 2), which is the two elements of elementData starting with subscript 1 (2 and 3). Copy to elementData from subscript 2.

After the implementation, elementData becomes [1, 2, 2, 3], and then we assign the element subscript 1 to the element that we need to add this time. One more sentence and it becomes [1, 4, 2, 3].

So there’s no dark magic, and the main thing to understand are two key functions: Array. copy and System. arrayCopy. We need to remember what these two encapsulated functions do.

Arrays. Copy Array expansion
Arraycopy moves all elements in an array one bit back after a subscript

So from here you can see the benefits of looking at the source code, there are two main aspects. Arraycopy is a function that you might not know you can use, even if you don’t understand the parameters. Second, you know system.arrayCopy, but feel that these functions have no application scenarios at all.

remove

To understand where data comes from, let’s take a look at how data is removed. First, let’s look at the two most commonly used:

Remove by subscript
Remove by element

Remove by subscript

The first is to remove by subscript

There is also a check to see if the passed index is valid, but there is a slight difference between the index and the rangeCheck call in add. RangeCheckForAdd in add determines if index is negative, while rangeCheck in Remove only determines if index >= the number of elements in the array.

As you can see from the name of the function, rangeCheckForAdd is specifically for the add method

What if the index passed in is actually negative? Is actually throws an ArrayIndexOutOfBoundsException because the remove method upon the Range note.

Once done, the modCount value is updated, which verifies that the modCount changes we mentioned above also include deletions.

A numMoved is then calculated to represent the number of elements that need to be moved. If you add an element, all elements in the subscript must be moved backwards one bit. If you remove an element, all elements in the subscript must be moved backwards one bit, like this:

Once we know how many elements we need to move, our System.ArrayCopy is ready to go again. After the element is moved, an element must be left empty at the end of the array. Set it to NULL and pass it to the GC for collection. The removed value is returned.

Remove by value

For example, removal by value looks something like this.

Nonsense not to say, look directly at the core source code

What the hell is removing a null? Have to cycle to remove it?

In fact, ArrayList allows us to pass in null values, for example.

After looking at this example, you should be able to see why o == null is used. If null is passed in, the ArrayList iterates through the underlying array and removes the element whose first value is NULL.

The same is true if the value is not null. If there are multiple identical values in the array, ArrayList will iterate over them and remove the first matched value. As you can see from the source code, whether or not the value is null, it calls the real remove element method fastRemove, which does almost the same thing as remove.

The only difference is that removing by subscript returns the removed element; Removing by value only returns whether the removal was successful.

conclusion

So, after reading part of the source code for ArrayList, we can see that the underlying data structure of ArrayList is an array. Although the ArrayList is a dynamic array to the user, the underlying array is in fact a fixed-length array, and the underlying array is expanded by a factor of 1.5 if necessary. However, the source code also shows that expansion and deletion are costly, especially in extreme cases where a large number of elements will need to be shifted.

Therefore, we come to the conclusion that the frequent random insertion and frequent deletion of ArrayList will greatly affect its performance. In conclusion, ArrayList is suitable for scenarios with more reads and less writes.

Another very, very important point, just to mention, is that An ArrayList is not thread-safe. Multithreaded situations appear inconsistent data or will throw ConcurrentModificationException, have a chance to talk about this behind

Once you know how to access and remove data from a data structure, you can easily understand a lot about it.

For example, the indexOf method is used to return the indexOf a specified element in an array. Knowing about traversal matches in remove, or even intuitively, it should be obvious that indexOf is not a for loop match. Isn’t lastIndexOf just a reverse for loop match? So Posting the source code here makes no sense except to make the article longer. If you are interested in this, you can find the source code to have a look.

This post has been posted on my Github github.com/sh-blog. Welcome Star. Wechat search follow [SH full stack notes], reply [queue] to get MQ learning materials, including basic concept analysis and RocketMQ detailed source code analysis, continue to update.

If you find this article helpful, please give it a thumbs up, a comment, a share and a comment.

ArrayList dissects the underlying principles from a source code perspective

add

Add elements by default

Specifies where to add elements

remove

Remove by subscript

Remove by value

conclusion

Related Posts

Spring Cloud Gateway five Sentinel integration

Force buckle 55 – jump game

Explain annotations in Java