Preface
I used to think CopyOnWriteArrayList was too simple to write about, since the name alone tells you how it works internally. Recently, though, I read its source code carefully. The overall logic held no surprises, but I still found quite a few interesting details, so I am writing this article to share them.
After reading this article, you will learn:
- CopyOnWriteArrayList implementation principle, capacity expansion mechanism.
- CopyOnWriteArrayList read/write separation, weak consistency.
- How does CopyOnWriteArrayList perform?
- Why CopyOnWriteArrayList writes the same array back even when set() does not change the element, and the good reason Doug Lea had for it.
- How CopyOnWriteArrayList is implemented differently in newer JDK versions, and why.
Thread-safe List
In Java, there is more than one thread-safe List. In addition to today’s hero CopyOnWriteArrayList, there are also Vector and SynchronizedList classes, which are thread-safe lists. Before I introduce CopyOnWriteArrayList, let me briefly introduce the other two.
If you look at their source code, you will notice something odd. Concurrent collections live in the java.util.concurrent package, so why are Vector and SynchronizedList (a nested class of Collections) in the java.util package?
They are like the thread-safe Hashtable: thread safety is implemented in a simple, brute-force way by putting the synchronized keyword on the methods, whether or not a given method really needs it. Even the get method is no exception. That’s how crude it is.
The get method on Vector:
public synchronized E get(int index) {
    if (index >= elementCount)
        throw new ArrayIndexOutOfBoundsException(index);

    return elementData(index);
}
SynchronizedList get method:
public E get(int index) {
    synchronized (mutex) { return list.get(index); }
}
In fact, there is a reason to synchronize the get method. Although it reduces efficiency, data that has been written can be read back immediately, which guarantees strong consistency. Calling synchronized crude is also not entirely fair, because in newer JDK versions synchronized automatically adjusts its locking strategy based on runtime conditions, as discussed later in this article.
CopyOnWriteArrayList
The java.util.concurrent package currently contains only one concurrent List implementation: CopyOnWriteArrayList. The crude implementations of Vector and SynchronizedList were described above. As the only concurrent List, CopyOnWriteArrayList must do something different.
Before we explore the implementation of CopyOnWriteArrayList, let’s think about how you would implement a thread-safe List.
- How do you keep threads safe when reading and writing concurrently?
- Does the data need to be strongly consistent? Must a write be visible to readers immediately?
- What should the initial capacity be, and how should it grow?
- Do you want to ensure consistency of data during traversal? Need to introduce fail-fast?
The name of the CopyOnWriteArrayList class gives us a pretty good idea of how it is implemented. CopyOnWrite means it uses a copy-on-write policy, and the ArrayList at the end indicates that the data is stored in an array. To add, delete, or modify an element, it first copies the existing data array, then applies the change to the copy. After the operation is complete, the original data array is replaced with the new array. This completes the update.
This copy-on-write approach has an inherent problem: each update replaces the old array with a new one, so a thread that happens to be reading during the update is still reading the old array. This is the idea of read-write separation: giving up strong data consistency in exchange for better read performance.
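The idea can be sketched in a few lines. This is a minimal illustration, not the JDK implementation: the class name SimpleCowList is hypothetical, and using a synchronized method to serialize writers is an assumption made for brevity.

```java
import java.util.Arrays;

// Hypothetical minimal copy-on-write list, for illustration only.
class SimpleCowList<E> {
    // volatile so that readers see the most recently published array
    private volatile Object[] array = new Object[0];

    // Writers are serialized; each write copies, modifies the copy, then publishes it.
    public synchronized void add(E e) {
        Object[] old = array;
        Object[] copy = Arrays.copyOf(old, old.length + 1);
        copy[old.length] = e;
        array = copy; // replace the old array with the new one
    }

    // Readers take no lock; they read whichever array is current at that moment.
    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) array[index];
    }

    public int size() {
        return array.length;
    }

    public static void main(String[] args) {
        SimpleCowList<String> list = new SimpleCowList<>();
        list.add("a");
        list.add("b");
        System.out.println(list.get(1) + " " + list.size()); // prints "b 2"
    }
}
```

Because a published array is never modified afterwards, a reader that already holds a reference to it can keep using it safely while writers work on a copy.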
Analyzing source code (JDK8)
As mentioned above, the idea of CopyOnWriteArrayList is copy-on-write, read-write separation, and it maintains a volatile array inside that holds element data.
/** The array, accessed only via getArray/setArray. */
private transient volatile Object[] array;
CopyOnWriteArrayList has many methods, and they will not all be covered here. The following sections analyze several commonly used ones. Once you understand these, you have essentially mastered how CopyOnWriteArrayList works.
The constructor
There are three constructors for CopyOnWriteArrayList. One is a no-argument constructor that directly initializes the array with length 0. The other two pass in a collection or array as an argument, and then extract the elements of the collection or array directly and assign values to the array that CopyOnWriteArrayList maintains internally.
// No-argument constructor: start with an empty array
public CopyOnWriteArrayList() {
    setArray(new Object[0]);
}

// Pass in a collection and copy its elements
public CopyOnWriteArrayList(Collection<? extends E> c) {
    Object[] es;
    if (c.getClass() == CopyOnWriteArrayList.class)
        es = ((CopyOnWriteArrayList<?>)c).getArray();
    else {
        es = c.toArray();
        if (c.getClass() != java.util.ArrayList.class)
            es = Arrays.copyOf(es, es.length, Object[].class);
    }
    setArray(es);
}

// Pass in an array and copy its elements
public CopyOnWriteArrayList(E[] toCopyIn) {
    setArray(Arrays.copyOf(toCopyIn, toCopyIn.length, Object[].class));
}
Constructors run while the instance is being created, before other threads can reach it, so there is no thread-safety concern at this point. That is why the constructors are simple assignments with no special logic.
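A small usage sketch of the two copying constructors follows. The class name ConstructorDemo and its method names are hypothetical. It shows that the list works on its own copy, so later changes to the source array or collection do not show up in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, for illustration only.
class ConstructorDemo {
    // Build a list from an array, then change the source array.
    static List<String> fromArray() {
        String[] source = {"a", "b"};
        List<String> list = new CopyOnWriteArrayList<>(source);
        source[0] = "changed"; // the list holds its own copy, so it is unaffected
        return list;
    }

    // Build a list from a collection, then change the source collection.
    static List<String> fromCollection() {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> list = new CopyOnWriteArrayList<>(source);
        source.add("c"); // not visible in the list
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fromArray());      // prints [a, b]
        System.out.println(fromCollection()); // prints [a, b]
    }
}
```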
Adding elements
There are several methods for adding elements, which differ in their parameters, but the principle is the same. Only the implementation of add(E e) is shown below. It is made thread-safe with a ReentrantLock.
/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return {@code true} (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock(); // acquire the lock
    try {
        Object[] elements = getArray(); // get the current array
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1); // copy into a new array that is one longer
        newElements[len] = e; // add the new element
        setArray(newElements); // replace the old array with the new array
        return true;
    } finally {
        lock.unlock(); // release the lock
    }
}
Specific steps:
- Acquire the lock, so that only one thread at a time can add, remove, or modify elements.
- Copy the current data array into a new array whose length is one greater.
- Put the new element into the new array.
- Replace the old array with the new one.
- Release the lock in the finally block.
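The effect of these steps can be observed from the outside. The sketch below (the class name ConcurrentAddDemo and its method are hypothetical) has several threads add to one CopyOnWriteArrayList at the same time. Because writers are serialized by the lock, no addition is lost.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, for illustration only.
class ConcurrentAddDemo {
    // Starts `threads` threads that each add `perThread` elements, then returns the final size.
    static int addFromThreads(int threads, int perThread) {
        List<Integer> list = new CopyOnWriteArrayList<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each add copies the array while holding the lock
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for all writers to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(addFromThreads(4, 1000)); // prints 4000
    }
}
```

Running the same test against a plain ArrayList can lose elements or throw, because its add is not synchronized.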
Because each add grows the capacity by only 1, every add must create a new array and copy all existing data into it before replacing the old array. This inevitably hurts write performance. Here is a simple example that tests the add and get performance of CopyOnWriteArrayList, Vector, and ArrayList.
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListPerformanceTest {

    public static void main(String[] args) {
        List<Object> copyOnWriteArrayList = new CopyOnWriteArrayList<>();
        List<Object> vector = new Vector<>();
        List<Object> arrayList = new ArrayList<>();
        add(copyOnWriteArrayList);
        add(vector);
        add(arrayList);

        get(copyOnWriteArrayList);
        get(vector);
        get(arrayList);
    }

    // Time 100,000 single-threaded add calls
    public static void add(List<Object> list) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            list.add(i);
        }
        long end = System.currentTimeMillis();
        System.out.println(list.getClass().getName() + ".size=" + list.size() + ",add time :" + (end - start) + "ms");
    }

    // Time a single-threaded get of every element
    public static void get(List<Object> list) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < list.size(); i++) {
            Object object = list.get(i);
        }
        long end = System.currentTimeMillis();
        System.out.println(list.getClass().getName() + ".size=" + list.size() + ",get time :" + (end - start) + "ms");
    }
}
From the results below we can see that CopyOnWriteArrayList is by far the slowest at adding, followed by the locked Vector (Vector doubles its capacity by default when it grows). For reads, the thread-unsafe ArrayList is fastest, followed by CopyOnWriteArrayList, and Vector is slowest because its get method takes a lock.
java.util.concurrent.CopyOnWriteArrayList.size=100000,add time :2756ms
java.util.Vector.size=100000,add time :4ms
java.util.ArrayList.size=100000,add time :3ms
java.util.concurrent.CopyOnWriteArrayList.size=100000,get time :4ms
java.util.Vector.size=100000,get time :5ms
java.util.ArrayList.size=100000,get time :2ms
Modifying elements
Modifying an element works on the same idea as adding one: a ReentrantLock guarantees thread safety. The implementation is fairly simple, and I did not originally plan to include it, but while reading the source I found something very interesting. Look at the following code.
public E set(int index, E element) {
    final ReentrantLock lock = this.lock;
    lock.lock(); // acquire the lock
    try {
        Object[] elements = getArray(); // get the current array
        E oldValue = get(elements, index); // get the old value

        if (oldValue != element) { // the new element differs from the old one
            int len = elements.length;
            Object[] newElements = Arrays.copyOf(elements, len); // copy the array
            newElements[index] = element; // assign the new value
            setArray(newElements); // replace the old array
        } else {
            // Not quite a no-op; ensures volatile write semantics
            setArray(elements);
        }
        return oldValue;
    } finally {
        lock.unlock(); // release the lock
    }
}
Notice that when the new element is the same as the old one, the code still calls setArray(elements), writing the very same array back. That’s interesting. Why do that? To understand the reason, we need to look at a special property of volatile, illustrated by the following code example.
// initial conditions
int nonVolatileField = 0;
CopyOnWriteArrayList<String> list = /* a single String */

// Thread 1
nonVolatileField = 1;                 // (1)
list.set(0, "x");                     // (2)

// Thread 2
String s = list.get(0);               // (3)
if (s == "x") {
    int localVar = nonVolatileField;  // (4)
}
// Example from: https://stackoverflow.com/questions/28772539/why-setarray-method-call-required-in-copyonwritearraylist
To understand what is special about this example, you need to know that volatile prevents instruction reordering and that it establishes happens-before relationships. Simply put, these guarantee the order in which one thread’s writes become visible to another thread.
In the example above, program order guarantees that (1) happens-before (2) and that (3) happens-before (4). In addition, a write to a volatile field happens-before every subsequent read of that field, so (2) happens-before (3), because set() writes the volatile array and get() reads it. Happens-before is also transitive, so (1) happens-before (4). The read at (4) is therefore guaranteed to see the value assigned to nonVolatileField at (1). If the set method in CopyOnWriteArrayList skipped setArray when the element is unchanged, there would be no volatile write at (2), and this guarantee would be lost.
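The same publication pattern can be reduced to a standalone sketch. The class VolatilePublishDemo and its fields are hypothetical. It uses a volatile boolean instead of the list's volatile array, but the reasoning about steps (1) to (4) is the same.

```java
// Hypothetical demo class, for illustration only.
class VolatilePublishDemo {
    private int data = 0;                   // plain, non-volatile field
    private volatile boolean ready = false; // volatile flag used for publication

    void writer() {
        data = 42;    // (1) ordinary write
        ready = true; // (2) volatile write, which publishes (1)
    }

    int reader() {
        while (!ready) {    // (3) volatile read, repeated until it observes (2)
            Thread.yield();
        }
        return data;        // (4) guaranteed to see 42 because (1) happens-before (4)
    }

    // Runs one writer and one reader thread and returns what the reader saw.
    static int run() {
        VolatilePublishDemo demo = new VolatilePublishDemo();
        int[] result = new int[1];
        Thread readerThread = new Thread(() -> result[0] = demo.reader());
        Thread writerThread = new Thread(demo::writer);
        readerThread.start();
        writerThread.start();
        try {
            readerThread.join(); // join makes result[0] visible to this thread
            writerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

If ready were not volatile, the reader would have no guarantee of ever seeing the flag change, nor of seeing data as 42 once it did.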
Removing elements
Removing elements follows the same approach. Here is the implementation of public E remove(int index):
public E remove(int index) {
    final ReentrantLock lock = this.lock;
    lock.lock(); // acquire the lock
    try {
        Object[] elements = getArray(); // get the current array
        int len = elements.length;
        E oldValue = get(elements, index); // get the old value
        int numMoved = len - index - 1; // number of elements after the removed one
        if (numMoved == 0) // removing the last element
            setArray(Arrays.copyOf(elements, len - 1));
        else {
            Object[] newElements = new Object[len - 1];
            // copy the elements before index
            System.arraycopy(elements, 0, newElements, 0, index);
            // copy the elements after index
            System.arraycopy(elements, index + 1, newElements, index, numMoved);
            setArray(newElements);
        }
        return oldValue;
    } finally {
        lock.unlock(); // release the lock
    }
}
The code is simple: use the exclusive ReentrantLock to keep the operation thread-safe, copy the remaining array elements into a new array, replace the old array with the new array, then release the lock and return.
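The array-copying part of remove can be tried in isolation. The helper below is a sketch with hypothetical names (ArrayRemoveDemo, removeAt). It mirrors the two System.arraycopy calls but is not the JDK code, and it adds an explicit bounds check of its own.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class ArrayRemoveDemo {
    // Returns a new array without the element at index; the input array is left untouched.
    static Object[] removeAt(Object[] elements, int index) {
        int len = elements.length;
        if (index < 0 || index >= len) {
            throw new IndexOutOfBoundsException("index: " + index + ", length: " + len);
        }
        int numMoved = len - index - 1; // how many elements follow the removed one
        if (numMoved == 0) {
            return Arrays.copyOf(elements, len - 1); // removing the last element
        }
        Object[] result = new Object[len - 1];
        System.arraycopy(elements, 0, result, 0, index);                // elements before index
        System.arraycopy(elements, index + 1, result, index, numMoved); // elements after index
        return result;
    }

    public static void main(String[] args) {
        Object[] source = {"a", "b", "c", "d"};
        System.out.println(Arrays.toString(removeAt(source, 1))); // prints [a, c, d]
        System.out.println(Arrays.toString(removeAt(source, 3))); // prints [a, b, c]
    }
}
```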
Getting elements
get(int index) returns the element at the given index. If the index is out of range, it throws IndexOutOfBoundsException.
public E get(int index) {
    return get(getArray(), index);
}
final Object[] getArray() {
    return array;
}
private E get(Object[] a, int index) {
    return (E) a[index];
}
There is no lock operation, and there are two steps to get the element at the specified position:
- getArray() gets the data array.
- get(Object[] a, int index) returns the element at the specified position.
It is entirely possible that another thread updates the list after the first step and before the second. As analyzed above, an update creates a new array, but we already obtained the old array in the first step, so our get still runs against the old array. In other words, the other thread’s update has no effect on our current get. This is the weak consistency mentioned above.
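The two-step read can be simulated deterministically in a single thread. This is only a sketch that mimics the internals with a hypothetical class SnapshotReadDemo, since the real getArray() is not accessible from outside the package.

```java
// Hypothetical demo class that mimics the two-step read, for illustration only.
class SnapshotReadDemo {
    private volatile Object[] array = {"old"};

    // Mimics a writer: publish a brand-new array instead of mutating the current one.
    void replaceWith(Object value) {
        array = new Object[] {value};
    }

    // Returns {value seen by the in-flight reader, value seen by a fresh read}.
    static String[] run() {
        SnapshotReadDemo demo = new SnapshotReadDemo();
        Object[] snapshot = demo.array;          // step 1: the reader obtains the array
        demo.replaceWith("new");                 // a writer swaps in a new array in between
        String seen = (String) snapshot[0];      // step 2: the reader indexes the old array
        String current = (String) demo.array[0]; // a read that starts now sees the new array
        return new String[] {seen, current};
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println(result[0] + " " + result[1]); // prints "old new"
    }
}
```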
Weak consistency of iterators
List<String> list = new CopyOnWriteArrayList<>();
list.add("www.wdbyte.com");
list.add("unread code");

Iterator<String> iterator = list.iterator();
list.add("java");
while (iterator.hasNext()) {
    String next = iterator.next();
    System.out.println(next);
}
Here we first add the elements www.wdbyte.com and unread code to the list. After obtaining the iterator, we add a new element, java. Running the code shows that the iteration raises no error and does not print java. In other words, once the iterator has been obtained, later updates to the list are invisible to it.
Why is that? Start with the implementation of the iterator() method of CopyOnWriteArrayList.
public Iterator<E> iterator() {
    return new COWIterator<E>(getArray(), 0);
}
static final class COWIterator<E> implements ListIterator<E> {
    /** Snapshot of the array */
    private final Object[] snapshot;
    /** Index of element to be returned by subsequent call to next. */
    private int cursor;

    private COWIterator(Object[] elements, int initialCursor) {
        cursor = initialCursor;
        snapshot = elements;
    }
    ...
iterator() calls getArray() to obtain the current array and passes it to the COWIterator constructor, which stores it in the snapshot field. All subsequent iteration reads from that snapshot. Because writers replace the array instead of modifying it, the iterator keeps using the old array and never sees later updates. This is the weak consistency mentioned several times above.
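For contrast, the sketch below runs the same add-during-iteration pattern against both CopyOnWriteArrayList and a plain ArrayList. The class and method names are hypothetical. The snapshot iterator finishes quietly with the old contents, while ArrayList's fail-fast iterator throws ConcurrentModificationException.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, for illustration only.
class IteratorSnapshotDemo {
    // Adds "c" after the iterator is created, then collects what the iterator returns.
    static List<String> iterateWhileAdding(List<String> list) {
        list.add("a");
        list.add("b");
        Iterator<String> iterator = list.iterator();
        list.add("c"); // structural change after the iterator was obtained
        List<String> seen = new ArrayList<>();
        while (iterator.hasNext()) {
            seen.add(iterator.next());
        }
        return seen;
    }

    // Returns true if the same pattern fails fast on a plain ArrayList.
    static boolean arrayListFailsFast() {
        try {
            iterateWhileAdding(new ArrayList<>());
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileAdding(new CopyOnWriteArrayList<>())); // prints [a, b]
        System.out.println(arrayListFailsFast());                             // prints true
    }
}
```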
Changes in newer JDK versions
The source code analysis above is based on JDK 8. While writing this article, I also looked at the implementation in a newer JDK version and found a significant change, mainly in how locking is done. The ReentrantLock has been replaced with synchronized, perhaps because the JVM later introduced the synchronized lock upgrade strategy, which improved synchronized performance considerably.
Adding an element:
public boolean add(E e) {
    synchronized (lock) {
        Object[] es = getArray();
        int len = es.length;
        es = Arrays.copyOf(es, len + 1);
        es[len] = e;
        setArray(es);
        return true;
    }
}
Modifying an element:
public E set(int index, E element) {
    synchronized (lock) {
        Object[] es = getArray();
        E oldValue = elementAt(es, index);
        if (oldValue != element) {
            es = es.clone();
            es[index] = element;
        }
        // Ensure volatile write semantics even when oldvalue == element
        setArray(es);
        return oldValue;
    }
}
Conclusion
From the above analysis, the following points are summarized about CopyOnWriteArrayList.
- CopyOnWriteArrayList uses read-write separation and copy-on-write to achieve thread safety, at the cost of only weak consistency.
- CopyOnWriteArrayList has poor write performance because every write copies the entire array.
- When modifying an element, CopyOnWriteArrayList calls setArray even if nothing has changed, to preserve volatile write semantics.
- In newer JDK versions, CopyOnWriteArrayList uses synchronized instead of ReentrantLock, benefiting from the synchronized lock upgrade strategy.
Reference:
- Why setArray() method call required in CopyOnWriteArrayList.
stackoverflow.com/questions/2…
- What does volatile do?
www.cs.umd.edu/~pugh/java/…
Author: Unread code Link: juejin.cn/post/688512… The copyright belongs to the author. Commercial reprint please contact the author for authorization, non-commercial reprint please indicate the source.