Preface

Not long ago, while reviewing a slow-running job with my colleagues, I noticed that many of them did not pay enough attention to details when writing code, which prompted this article.

The ArrayList pitfall

```java
List<String> temp = new ArrayList<>();
List<String> all = getData();
for (String str : all) {
    temp.add(str);
}
```

First of all, what’s wrong with this code?

In most cases this is fine: it just writes data into the ArrayList in a loop.

But in special cases, such as when getData() returns a very large amount of data, the subsequent temp.add(str) becomes problematic.

For example, during a code review we found that the returned data could sometimes reach 20 million records, and at that scale the cost of writing into the ArrayList becomes obvious.

How to avoid the pitfall

As you know, an ArrayList is backed by an array, which has a fixed length, so the array has to be expanded at the right time.

Let's start with appending to the tail, add(E e).

```java
ArrayList<String> temp = new ArrayList<>(2);
temp.add("1");
temp.add("2");
temp.add("3");
```

When we initialize an ArrayList with a capacity of 2 and then write three elements into it, the ArrayList has to expand, that is, copy the old data into a new array of length 3.

It becomes 3 because the new capacity equals the old capacity times 1.5 (2 + 2/2 = 3).

The default capacity of an ArrayList is 10.

However, the backing array of DEFAULT_CAPACITY = 10 is not created at construction time.

Instead, it is only allocated, with capacity 10, when the first element is added.

Now that we know the default capacity is 10, the list is expanded to 10 * 1.5 = 15 when the 11th element is added. This expansion is an array copy: a new block of memory is allocated for an array of length 15 and the old elements are copied into it.
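
For reference, the growth behavior looks roughly like the following. This is a minimal sketch based on the behavior described above, not the actual JDK source:

```java
import java.util.Arrays;

// A simplified sketch of ArrayList-style growth (assumption: modeled on the
// behavior described above, not the real java.util.ArrayList implementation).
public class GrowthSketch {
    private Object[] elementData = new Object[0]; // lazily allocated, like the JDK's ArrayList
    private int size;

    public void add(Object e) {
        if (size == elementData.length) {
            // First add: allocate the default capacity of 10.
            // Afterwards: grow to oldCapacity + oldCapacity / 2 (i.e. 1.5x).
            int newCapacity = elementData.length == 0
                    ? 10
                    : elementData.length + (elementData.length >> 1);
            elementData = Arrays.copyOf(elementData, newCapacity); // the costly array copy
        }
        elementData[size++] = e;
    }

    public static void main(String[] args) {
        GrowthSketch list = new GrowthSketch();
        for (int i = 0; i < 16; i++) {
            list.add(i); // capacity goes 0 -> 10 on the 1st add, 10 -> 15 on the 11th
        }
    }
}
```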

When we write frequently and in large volumes, we trigger a lot of these array copies, which is extremely inefficient.

But we can avoid this problem in advance if we anticipate how many pieces of data are likely to be written.

For example, when writing 10 million records, there is a huge performance difference between specifying the array size at initialization and starting from the default capacity of 10.

I validated this with a JMH benchmark:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
public class CollectionsTest {

    private static final int TEN_MILLION = 10000000;

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void arrayList() {
        // default capacity: grows repeatedly on the way to 10 million elements
        List<String> array = new ArrayList<>();
        for (int i = 0; i < TEN_MILLION; i++) {
            array.add("123");
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void arrayListSize() {
        // preset capacity: no expansion is ever needed
        List<String> array = new ArrayList<>(TEN_MILLION);
        for (int i = 0; i < TEN_MILLION; i++) {
            array.add("123");
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(CollectionsTest.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
```

The results show that presetting the length is far more efficient than using the default capacity (the Score column in the JMH output is the time taken to complete the method).

So when a lot of data will be written to an ArrayList, it is highly recommended to specify its initial capacity.

add(int index, E element), which writes data to a specified position, should also be used with care.

As we can see from the source code, each such write shifts every element after index one position back, which again means copying the array.

Unlike appending to the end, this copy happens on every single call, which is extremely inefficient.
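
A quick illustration of the difference (not a rigorous benchmark; the printed timings are only indicative and depend on your machine, so use JMH as above for a proper comparison):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of why add(index, e) is slow: every insertion at the head
// shifts all existing elements back by one slot via an array copy.
public class InsertAtIndexDemo {
    public static void main(String[] args) {
        int n = 100_000;

        List<String> tail = new ArrayList<>(n);
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            tail.add("123");            // append to the end: amortized O(1)
        }
        System.out.println("append took " + (System.nanoTime() - start) / 1_000_000 + " ms");

        List<String> head = new ArrayList<>(n);
        start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            head.add(0, "123");         // insert at index 0: O(n) copy on every call
        }
        System.out.println("insert at head took " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```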

LinkedList

LinkedList is usually mentioned in the same breath as ArrayList; both are List containers, but their underlying implementations are completely different.

LinkedList is built on linked nodes. Each node holds two references, one to the previous node and one to the next, so it is a doubly linked list.

So in theory writes are very efficient: there is none of the expensive array copying of an ArrayList; you only move pointers on each write.

I’m not going to draw it because I’m lazy here, but you can imagine it.
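
Instead of a diagram, here is a rough sketch of the node structure and a tail append. It is modeled on what java.util.LinkedList does internally (Node plus a linkLast-style append), simplified and not the exact source:

```java
// A simplified sketch of a doubly linked list node and a tail append,
// modeled on java.util.LinkedList internals (assumption: not the exact source).
public class LinkedSketch<E> {
    private static class Node<E> {
        E item;
        Node<E> prev;   // reference to the previous node
        Node<E> next;   // reference to the next node

        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    private Node<E> first;
    private Node<E> last;
    private int size;

    // Appending only creates a new node and rewires two pointers: no array copy.
    public void add(E e) {
        Node<E> l = last;
        Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null) {
            first = newNode;   // the list was empty
        } else {
            l.next = newNode;
        }
        size++;
    }
}
```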

Comparison test

Rumor has it:

LinkedList is more efficient to write to than ArrayList, so it works well for write-heavy, read-light scenarios.

```java
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void linkedList() {
    List<String> array = new LinkedList<>();
    for (int i = 0; i < TEN_MILLION; i++) {
        array.add("123");
    }
}
```

Let's test whether that holds. The LinkedList is likewise written 10 million times, and the ArrayList with a preset length is clearly more efficient than the LinkedList.

The caveat is that the ArrayList's length is preset so that no expansion ever happens, which makes its writes very fast, whereas the LinkedList, while it never copies memory, has to create node objects and rewire pointers on every write.

Not to mention that ArrayList supports random access by index, which is very efficient.

LinkedList cannot jump directly to an index because it is not backed by an array; instead, it has to traverse from the head or the tail depending on where the requested index lies.

Either way you have to walk the pointers one by one, and if the index is near the middle this is very slow.
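
A rough sketch of that lookup, continuing the LinkedSketch class above (modeled on LinkedList's internal node(int) method, simplified):

```java
// Index lookup has to walk the chain, starting from whichever end is closer
// (this mirrors how java.util.LinkedList.node(int) works, in simplified form).
public E get(int index) {
    Node<E> x;
    if (index < (size >> 1)) {
        x = first;                       // closer to the head: walk forward
        for (int i = 0; i < index; i++) {
            x = x.next;
        }
    } else {
        x = last;                        // closer to the tail: walk backward
        for (int i = size - 1; i > index; i--) {
            x = x.prev;
        }
    }
    return x.item;                       // worst case: about size / 2 hops
}
```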

Conclusion

High-performance applications are built up bit by bit, and pitfalls like the ArrayList one discussed here are a good example: in everyday use they are not a big problem, but once the data volume grows, all the small issues become big ones.

So to sum up:

When using ArrayList, if you can predict the size of the data in advance and it is large, specify the initial capacity.

If possible, avoid the add(index, e) API, which triggers array copies and reduces efficiency.

As an additional note, for HashMap, another commonly used container, it is also recommended to specify an initial capacity to avoid resizing and rehashing.
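
For example, to hold an expected number of entries without resizing, the capacity is usually sized against the default load factor of 0.75. The helper below is a sketch of that rule of thumb (the method name is my own, similar in spirit to Guava's Maps.newHashMapWithExpectedSize):

```java
import java.util.HashMap;
import java.util.Map;

// Sizing a HashMap so the expected number of entries fits without rehashing.
// The default load factor is 0.75, so capacity should be at least expected / 0.75.
public class MapSizing {
    static <K, V> Map<K, V> newHashMapWithExpectedSize(int expectedSize) {
        int initialCapacity = (int) (expectedSize / 0.75f) + 1; // hypothetical helper for illustration
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        Map<String, String> map = newHashMapWithExpectedSize(10_000);
        for (int i = 0; i < 10_000; i++) {
            map.put("key" + i, "value" + i);   // no resize along the way
        }
        System.out.println(map.size());
    }
}
```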
