This article examines thread safety through an ArrayList-related thread-safety issue discovered during code review.

Case analysis

A few days ago, during a code review, I saw a colleague write something like the following:

List<String> resultList = new ArrayList<>();

paramList.parallelStream().forEach(v -> {
    String value = doSomething(v);
    resultList.add(value);
});

As far as I knew, ArrayList is not thread-safe, yet here the same ArrayList object is modified by multiple threads. This looked wrong to me, so I read the ArrayList implementation to confirm the problem and to review the relevant background.

First, a definition:

Thread safety is a term used in programming. When a function or function library is called in a multi-threaded environment, it can correctly handle the shared variables between multiple threads, so that the program functions can be completed correctly. — Wikipedia

Let’s take a look at the ArrayList source code for key information relevant to this topic:

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    // ...
    
    /** The array buffer into which the elements of the ArrayList are stored. The capacity of the ArrayList is the length of this array buffer... */
    transient Object[] elementData; // non-private to simplify nested class access

    /** The size of the ArrayList (the number of elements it contains). */
    private int size;

    // ...

    /** Appends the specified element to the end of this list... */
    public boolean add(E e) {
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }

    // ...
}

Here are a few things to note about ArrayList:

  1. It stores its data in an array, elementData.
  2. It uses an int member variable, size, to record the actual number of elements.
  3. The add method runs these steps in order:
    • Call ensureCapacityInternal(size + 1) to check whether elementData has enough capacity. If not, grow it by half: allocate a new, larger array, copy the contents of elementData into it, and assign the new array to elementData.
    • Execute elementData[size] = e;
    • Execute size++
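
The three steps above can be sketched as a tiny stand-alone class. This is a simplified illustration, not the JDK source: the class name SimpleArrayList, the fixed initial capacity of 8, and the inlined growth logic are assumptions made for the example. It makes visible that add is several separate reads and writes rather than one atomic operation.

```java
import java.util.Arrays;

// Simplified illustration only; NOT the JDK source. SimpleArrayList is a hypothetical name.
class SimpleArrayList {
    Object[] elementData = new Object[8]; // assumed initial capacity for the example
    int size = 0;

    boolean add(Object e) {
        // Step 1: check capacity; if the array is full, grow it by about half.
        if (size + 1 > elementData.length) {
            int newCapacity = elementData.length + (elementData.length >> 1);
            // Allocate a larger array, copy the old contents, and reassign the field.
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
        // Step 2: store the element at the index given by the current size.
        elementData[size] = e;
        // Step 3: increment size, itself a read followed by a write.
        size = size + 1;
        return true;
    }
}
```

Nothing in this sketch stops another thread from running between any two of these steps, which is where the trouble described next comes from.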

To make the thread-safety issue discussed here easier to follow, let's take the simplest execution path. Suppose two threads, A and B, call ArrayList.add at the same time on a list whose elementData has a capacity of 8 and whose size is 7, so there is room for exactly one new element. What can happen?

One possible order of execution is:

  • Threads A and B both execute ensureCapacityInternal(size + 1) at the same time. Since 7 + 1 does not exceed the capacity of elementData, which is 8, no expansion takes place.
  • Thread A executes elementData[size++] = e; first, after which size becomes 8.
  • Thread B then executes elementData[size++] = e;. The length of the elementData array is 8, but the access is to elementData[8], so the array index is out of bounds.
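
Because real thread scheduling is unpredictable, it can help to replay this exact ordering by hand in a single thread. The sketch below does that with a plain array and a local size variable. The class and method names are hypothetical, and the "threads" are only comments marking whose step is being replayed.

```java
// Hypothetical single-threaded replay of the interleaving described above.
class InterleavingReplay {
    // Returns true if "thread B"'s write fails with an out-of-bounds index.
    static boolean secondWriteFails() {
        Object[] elementData = new Object[8]; // capacity 8
        int size = 7;                         // 7 elements already counted

        // Both "threads" run the capacity check before either one writes.
        boolean aNeedsGrow = size + 1 > elementData.length; // 8 > 8 is false
        boolean bNeedsGrow = size + 1 > elementData.length; // 8 > 8 is false
        if (aNeedsGrow || bNeedsGrow) {
            throw new IllegalStateException("unexpected: a capacity check asked to grow");
        }

        // Thread A: elementData[size++] = e; writes index 7, size becomes 8.
        elementData[size++] = "A";

        try {
            // Thread B: elementData[size++] = e; now targets index 8 of a length-8 array.
            elementData[size++] = "B";
            return false;
        } catch (ArrayIndexOutOfBoundsException ex) {
            return true;
        }
    }
}
```

Calling InterleavingReplay.secondWriteFails() returns true: each capacity check was valid when it ran, but the second write relied on a check that was already stale.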

The program throws an exception and fails to run correctly, which by the definition of thread safety given above is clearly not thread-safe.

Verifying with sample code

With that in mind, let's write a simple example to verify that the problem above can really occur:

List<Integer> resultList = new ArrayList<>();
List<Integer> paramList = new ArrayList<>();
int length = 10000;
for (int i = 0; i < length; i++) {
    paramList.add(i);
}
paramList.parallelStream().forEach(resultList::add);

The code above may run normally, but it is more likely to hit the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
	at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
	at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
	at concurrent.ConcurrentTest.main(ConcurrentTest.java:18)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1234
	at java.util.ArrayList.add(ArrayList.java:465)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172)

In my tests, when length is small the problem is hard to reproduce, because the list reaches a capacity boundary and has to grow only a few times. When length is large, the exception is thrown very often.

Besides throwing this exception, the scenario above can also cause elements to be overwritten or lost, leaving the actual number of elements in the ArrayList inconsistent with the value of size.
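
The overwrite case can be replayed by hand in the same way. The sketch below is a hypothetical single-threaded replay of one possible interleaving, in which both "threads" read the same value of size before either writes it back. The class and method names are assumptions made for the example.

```java
// Hypothetical single-threaded replay of a lost update on elementData and size.
class LostUpdateReplay {
    // Returns {size, number of non-null slots} after two replayed add calls.
    static int[] replay() {
        Object[] elementData = new Object[8];
        int size = 0;

        // Both "threads" read size before either one writes it back.
        int indexSeenByA = size; // 0
        int indexSeenByB = size; // 0

        // Both store into the same slot, so B overwrites A's element.
        elementData[indexSeenByA] = "A";
        elementData[indexSeenByB] = "B";

        // Each thread writes back its own incremented copy of size.
        size = indexSeenByA + 1;
        size = indexSeenByB + 1; // size is 1 although add was called twice

        int nonNull = 0;
        for (Object o : elementData) {
            if (o != null) {
                nonNull++;
            }
        }
        return new int[] {size, nonNull};
    }
}
```

After two add calls the list holds a single element and reports a size of 1, with no exception to signal that anything went wrong, which makes this failure harder to notice than the out-of-bounds case.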

The solution

A common and effective solution to this kind of problem is to lock access to the shared resource.

After I raised this in the code review, my colleague changed this line from the first snippet:

List<String> resultList = new ArrayList<>();

to:

List<String> resultList = Collections.synchronizedList(new ArrayList<>());

This actually ends up using a SynchronizedRandomAccessList. Looking at its implementation, it also locks internally: it wraps a List and uses the synchronized keyword to guard read and write access to it. This is one approach: use a thread-safe collection. Vector and other similar classes would also solve the problem.
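
As a quick check of this fix, the earlier reproduction can be rerun with the synchronized wrapper. This is a minimal sketch: the class name SynchronizedListDemo and the helper method fill are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: the reproduction above, using a synchronized wrapper.
class SynchronizedListDemo {
    // Adds the values 0..length-1 from a parallel stream and returns the resulting size.
    static int fill(int length) {
        List<Integer> paramList = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            paramList.add(i);
        }

        // Every call on the wrapper is guarded by a lock held inside the wrapper.
        List<Integer> resultList = Collections.synchronizedList(new ArrayList<>());
        paramList.parallelStream().forEach(resultList::add);

        return resultList.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(10000));
    }
}
```

With the wrapper in place, every element is added exactly once, so the size equals length. The order of the elements in resultList is still not guaranteed, because forEach on a parallel stream does not preserve encounter order.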

Another approach is to lock the critical section manually. For example, we can change

resultList.add(value);

to:

synchronized (mutex) {
    resultList.add(value);
}
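
For completeness, here is a minimal self-contained sketch of the manual-locking variant. The snippet above does not show where mutex comes from. Declaring it as a dedicated final Object shared by all threads, as done here, is an assumption, as are the class name ManualLockDemo and the helper method fill.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: guarding a plain ArrayList with an explicit lock object.
class ManualLockDemo {
    // Adds the values 0..length-1 from a parallel stream and returns the resulting size.
    static int fill(int length) {
        List<Integer> paramList = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            paramList.add(i);
        }

        List<Integer> resultList = new ArrayList<>();
        // Assumed lock object: every thread must synchronize on this same instance.
        final Object mutex = new Object();

        paramList.parallelStream().forEach(v -> {
            // Only one thread at a time may run the whole add call.
            synchronized (mutex) {
                resultList.add(v);
            }
        });

        return resultList.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(10000));
    }
}
```

The lock must cover every access that can run concurrently with the add calls. If some other code path touched resultList without synchronizing on the same mutex, the race would come back.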

Summary

Java 8 parallel streams make parallel processing very convenient and make programs quicker to write. When coding, though, we should stay alert whenever multithreading is involved and consciously guard against problems like this one.

Likewise, when reviewing code we should stay on guard wherever multiple threads are involved, so that hidden hazards are caught before the code is merged.

Reference

  • Thread safety – Wikipedia