Thread-safe List (a) : CopyOnWriteArrayList source code parsing

preface

List interface is very common in Java, common List interface implementation classes such as ArrayList, LinkedList, they have a wide range of applications. However, both lists are thread-unsafe. This article will introduce a thread-safe ArrayList: CopyOnWriteArrayList, which will be analyzed and explained in depth. Initialize, expand, add, delete, change, and check the basic operation and principle of ArrayList.

Basic structure and initialization

The basic structure

public class CopyOnWriteArrayList<E>
    implements List<E>, RandomAccess.Cloneable.java.io.Serializable {
    private static final long serialVersionUID = 8673264195747942595L;

    /** * The lock protecting all mutators. (We have a mild preference * for builtin monitors over ReentrantLock when either  will do.) */
     // Writers' idiosyncrasies
    final transient Object lock = new Object(a);// An array of data
    private transient volatile Object[] array;
}
Copy the code

First look at the basic structure of CopyOnWriteArrayList, as shown above. An Object lock is used as a Synchronized lock, and an Object array is used to store data, named array. It is volatile and ensures visibility to different threads.

As noted in the comments, the authors of the source code prefer Synchronized locks when both Java’s built-in Synchronized locks and ReentrantLocks are available.

Initialize the

// No arguments
public CopyOnWriteArrayList() {
    setArray(new Object[0]);
}
// Use set to construct
public CopyOnWriteArrayList(Collection<? extends E> c) {
    Object[] es;
    if(c.getClass() == CopyOnWriteArrayList.class) es = ((CopyOnWriteArrayList<? >)c).getArray();else {
        es = c.toArray();
        if(c.getClass() ! = java.util.ArrayList.class) es = Arrays.copyOf(es, es.length,Object[].class);
    }
    setArray(es);
}
// Use an array
public CopyOnWriteArrayList(E[] toCopyIn) {
    setArray(Arrays.copyOf(toCopyIn, toCopyIn.length, Object[].class));
}
/ / setArray method
final void setArray(Object[] a) {
    array = a;
}
/ / getArray method
final Object[] getArray() {
    return array;
}
Copy the code

Three initialization methods:

The one without parameters is the simplest, needless to say;
Construct with collections: If the collection is also CopyOnWriteArrayList, then simply make es point to the array that the other array points to, otherwise, run ArrayList if it ises = c.toArray();If not ArrayList, executees = Arrays.copyOf(es, es.length, Object[].class);Somebody asked, this array is es, and then callArrays.copyOf(es, es.length, Object[].class)What’s the point? This three-parameter version of Array. copyOf just gets a new array of es elements with es. Length, but Object type, so it still makes sense. For some collections, the toArray method may return an array of type other than Object. I also talked about this in the ArrayList blog.
The three-parameter arrays. copyOf method is already covered and not explained.

The write operation

So let’s look at write operations first, so in CopyOnWriteArrayList, if you make any changes to the original array, it’s a write operation, it’s a lock, even if it’s just a set method. (Recall that ArrayList has a modCount property that indicates if the array has changed structurally, but ArrayList’s set method does not change modCount, so CopyOnWriteArrayList is very strict for photo.)

Set method

public E set(int index, E element) {
    / / lock
    synchronized (lock) {
        Object[] es = getArray();
        E oldValue = elementAt(es, index);
        // It does need to be modified
        if(oldValue ! = element) {// Make a copy
            es = es.clone();
            / / modify
            es[index] = element;
        }
        setArray(es);
        returnoldValue; }}Copy the code

As you can see, a set needs to grab a lock. After all, it is a thread-safe collection, which MAKES sense to me. If you do need to modify, then es = es.clone(); Make es point to the cloned array. The name is es, but now there are two sets of numbers. Finally, run setArray(es); Make array point to the new array.

It is not hard to see that just a simple set operation, but to copy the entire array, if the array is large, the efficiency will be less.

The add operation

Add an element

There are several versions of this method, but let’s start with the simplest two:

public boolean add(E e) {
    synchronized (lock) {
        Object[] es = getArray();
        int len = es.length;
        // Clone a new array of length + 1
        es = Arrays.copyOf(es, len + 1);
        // End assignment
        es[len] = e;
        setArray(es);
        return true;
    }
}  

public void add(int index, E element) {
    synchronized (lock) {
        Object[] es = getArray();
        int len = es.length;
        if (index > len || index < 0)
            throw new IndexOutOfBoundsException(outOfBounds(index, len));
        Object[] newElements;
        // The number of elements to move
        int numMoved = len - index;
        // Do not move, direct tail assignment
        if (numMoved == 0)
            newElements = Arrays.copyOf(es, len + 1);
        // Otherwise, move
        else {
            newElements = new Object[len + 1];
            System.arraycopy(es, 0, newElements, 0, index);
            System.arraycopy(es, index, newElements, index + 1,
                             numMoved);
        }
        / / assignmentnewElements[index] = element; setArray(newElements); }}Copy the code

The first add method appends an element to the end, cloning an array of length + 1 and assigning it. The second add method is similar to the add method of an ArrayList, which looks at how many elements need to be moved back, moves them accordingly, and assigns values at index. But there are differences:

System.arraycopy(elementData, index,
                 elementData, index + 1,
                 s - index);
elementData[index] = element;
Copy the code

An ArrayList just moves the elements after index back, not before. CopyOnWriteArrayList does all of that, which makes sense, because CopyOnWriteArrayList declares a new array, and of course everything gets copied.

You can also see that CopyOnWriteArrayList doesn’t need to be expanded. Why? ArrayList needs to be expanded because, at least until the next expansion, the add operation is still assigned to the elemantData array. It does not create a new array every time it is added, so it can expand the capacity a bit more and avoid frequent expansion and linear time complexity of copying elements. CopyOnWriteArrayList copies a new array every time you add it, so why expand it? Even if you expand it, you’ll copy a new array next time. So CopyOnWriteArrayList makes a new array of length plus 1 every time I add it.

And then addIfAbsent, which is a little harder:

public boolean addIfAbsent(E e) {
    Object[] snapshot = getArray();
    // use the logic expression short circuit property
    return indexOfRange(e, snapshot, 0, snapshot.length) < 0
        && addIfAbsent(e, snapshot);
}
private boolean addIfAbsent(E e, Object[] snapshot) {
    synchronized (lock) {
        // Actually add elements, a bunch of code, and look at it later}}Copy the code

IndexOfRange (e, snapshot, 0, snapshot.length);

private static int indexOfRange(Object o, Object[] es, int from, int to) {
    if (o == null) {
        for (int i = from; i < to; i++)
            if (es[i] == null)
                return i;
    } else {
        for (int i = from; i < to; i++)
            if (o.equals(es[i]))
                return i;
    }
    return -1;
}
Copy the code

There’s nothing to explain. Look for elements from beginning to end!

AddIfAbsent (e, snapshot, 0, snapshot. Length) < 0 && addIfAbsent(e, snapshot); When the front part is true, that is, there is no element, the real Add will be called further, otherwise it won’t go down at all. This kind of short-circuiting is all too common in Java source code… Let’s look at the real add:

private boolean addIfAbsent(E e, Object[] snapshot) {
    synchronized (lock) {
        // The array to which the new array points
        Object[] current = getArray();
        int len = current.length;
        // The array has changed
        if(snapshot ! = current) { int common =Math.min(snapshot.length, len);
            for (int i = 0; i < common; i++)
                if(current[i] ! = snapshot[i] && Objects.equals(e, current[i]))// This element appears in the first half of current, return false
                    return false;
            The element // // appears in the second half of current, return false
            if (indexOfRange(e, current, common, len) >= 0)
                    return false;
        }
        Object[] newElements = Arrays.copyOf(current, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true; }}Copy the code

Since this element is not in the array, we can just append it to the end. Why do we need all this code? Object[] snapshot = getArray(); But this is only a snapshot, the content of this snapshot is determined at the moment of generation, will not change. As we know from the above analysis of some writes, CopyOnWriteArrayList changes the array reference whenever there are writes. So maybe you took a little too long to do an indexOfRange on snapshot. Although you didn’t find the element in snopShot, maybe during that time some other write changed the array reference of CopyOnWriteArrayList. When you’re done looking for snopshot and the element doesn’t exist, some other operation causes the element to be in the reference array of CopyOnWriteArrayList. Synchronized (lock) {synchronized (lock) {synchronized (lock) {synchronized (lock) {synchronized (lock) {synchronized (lock) {

That explains why we have all this code, so let’s take a closer look at this code. Get current snapshot of CopyOnWriteArrayList and len of its length. Then determine if (snapshot! = current), if the two snapshots are the same, then great, don’t worry about what I said above, just append elements!

Otherwise, it goes into the if block. int common = Math.min(snapshot.length, len); For (int I = 0; i < common; I++) loops through the search. The code looked up here is also interesting:

if(current[i] ! = snapshot[i] && Objects.equals(e, current[i]))// This element appears in the first half of current, return false
    return false;  
Copy the code

This is a short circuit operation again. The first part of the logical expression: current[I]! = snapshot[I]; Current [I] == snapshot[I]; current[I] == snapshot[I]; Only when current[I]! Current [I] = true; objects.equals (e, current[I]); Return false (T_T). The == operation is faster than objects.equals ().

If (indexOfRange(e, current, common, len) >= 0) Int common = math.min (snapshot.length, len); There are so many elements, maybe not all of them.

After the comparison, if the element is still missing in the new snapshot, append it!

No other write will change the array reference during the current snapshot. The answer is no, notice that the private Boolean addIfAbsent function is essentially a lock grabber, so if you get the current snapshot, no other thread can write to change the array reference until you leave the code.

Add a bunch of elements

Let’s start with the simple version:

public boolean addAll(Collection<? extends E> c) {
    // Get the corresponding elements
    Object[] cs = (c.getClass() == CopyOnWriteArrayList.class) ? ((CopyOnWriteArrayList<? >)c).getArray() : c.toArray();// Nothing to add
    if (cs.length == 0)
        return false;
    synchronized (lock) {
        Object[] es = getArray();
        int len = es.length;
        Object[] newElements;
        // If the array length is 0 and the collection is ArrayList or CopyOnWriteArrayList, assign the value directly
        if (len == 0 && (c.getClass() == CopyOnWriteArrayList.class ||
                         c.getClass() == ArrayList.class)) {
            newElements = cs;
        // Otherwise, append these elements to the tail
        } else {
            newElements = Arrays.copyOf(es, len + cs.length);
            System.arraycopy(cs, 0, newElements, len, cs.length);
        }
        setArray(newElements);
        return true; }}Copy the code

This code is also fairly simple: after some judgment (for example, if the set to be added is empty, return false), append the element to the end of the array. Public Boolean addAll(int index, Collection
c) extends E> c) extends E> c) extends E> c) extends E> c) extends E> c) extends E> c) extends E> c) extends E> c) extends E> c) extends E> c Look at this one, which is a little harder:

public int addAllAbsent(Collection<? extends E> c) {
    Object[] cs = c.toArray();
    if(c.getClass() ! = ArrayList.class) { cs = cs.clone(); }// Set is empty, no need to add, return directly
    if (cs.length == 0)
        return 0;
    synchronized (lock) {
        // A bunch of complex code, explained later}}Copy the code

This function adds elements to the array if there are no elements in the array. And then we get an array of the elements of that set. This one has this judgment:

if(c.getClass() ! = ArrayList.class) { cs = cs.clone(); }Copy the code

The toArray of some collections returns its own array, for fear of contaminate the original array of collections. Then, if the set is empty, it simply returns, and then looks at the code in the synchronized block, which actually adds:

synchronized (lock) {
    // Get your own snapshot, after which no thread will change the snapshot
    Object[] es = getArray();
    int len = es.length;
    int added = 0;
    // Add
    for (int i = 0; i < cs.length; ++i) {
        Object e = cs[i];
        // e cannot appear in CopyOnWriteArrayList
        // Also cannot add duplicate e
        if (indexOfRange(e, es, 0, len) < 0 &&
            indexOfRange(e, cs, 0, added) < 0)
            cs[added++] = e;
    }
    // There are qualified elements to add
    if (added > 0) {
        // append the tail
        Object[] newElements = Arrays.copyOf(es, len + added);
        System.arraycopy(cs, 0, newElements, len, added);
        setArray(newElements);
    }
    return added;
}
Copy the code

Object e = cs[I]; And then we have an if, the first condition indexOfRange(e, es, 0, len) < 0 and it makes sense that this element, e, really doesn’t exist in our array, but what about the second condition? IndexOfRange (e, cs, 0, Added) < 0, note that added is initialized to 0, so if the first condition does not exist in CopyOnWriteArrayList, the second condition must be true, and added becomes 1. If a second e appears that satisfies the first condition, go through the second condition, meaning that not only can this E not appear in CopyOnWriteArrayList, but also cannot appear in cs[0:added], that is, cannot repeatedly add elements. This is also consistent with the semantics of addAllAbsent, so if you add the same element twice, it might conflict a little bit with this Absent.

Finally, the element cs[0:added] is added to the tail.

The remove operation

Remove an element

The easiest way to delete is by subscript:

public E remove(int index) {
    synchronized (lock) {
        Object[] es = getArray();
        int len = es.length;
        E oldValue = elementAt(es, index);
        int numMoved = len - index - 1;
        Object[] newElements;
        // Delete the last element
        if (numMoved == 0)
            newElements = Arrays.copyOf(es, len - 1);
        // Otherwise, copy the elements and move
        else {
            newElements = new Object[len - 1];
            System.arraycopy(es, 0, newElements, 0, index);
            System.arraycopy(es, index + 1, newElements, index,
                             numMoved);
        }
        setArray(newElements);
        returnoldValue; }}Copy the code

Int numMoved = len-index – 1; NewElements = array.copyof (es, len-1); newElements = array.copyof (es, len-1); Otherwise, copy the element at 0-index to newElements, and move the element after index forward to newElements.

Delete by element:

// Check if the element is available
public boolean remove(Object o) {
    Object[] snapshot = getArray();
    int index = indexOfRange(o, snapshot, 0, snapshot.length);
    // Remove from snapshot
    return index >= 0 && remove(o, snapshot, index);
}
private boolean remove(Object o, Object[] snapshot, int index) {
    synchronized (lock) {
        // A bunch of operations, more on that later}}Copy the code

Return index >= 0 && remove(o, snapshot, index); , still short circuit operation. Now look at the synchronized code block:

synchronized (lock) {
    // Get the current snapshot
    Object[] current = getArray();
    int len = current.length;
    if(snapshot ! = current) findIndex: { int prefix =Math.min(index, len);
        // The first half
        for (int i = 0; i < prefix; i++) {
            if(current[i] ! = snapshot[i] && Objects.equals(o, current[i])) { index = i;breakfindIndex; }}// Some boundary judgments
        if (index >= len)
            return false;
        if (current[index] == o)
            break findIndex;
        // Find the second part (if any)
        index = indexOfRange(o, current, index, len);
        if (index < 0)
            return false;
    }
    // The general operation to delete the element at index
    Object[] newElements = new Object[len - 1];
    System.arraycopy(current, 0, newElements, 0, index);
    System.arraycopy(current, index + 1,
                     newElements, index,
                     len - index - 1);
    setArray(newElements);
    return true;
}
Copy the code

if (snapshot ! = current) if it is false, delete it directly.

Otherwise, you have to go through the if code. Int prefix = math.min (index, len); If (current[I]! = if (current[I]! = if (current[I]! = if (current[I]! = if (current[I]! = snapshot[I] && objects.equals (o, current[I])) { Only the current [I]! = snapshot[I] = objects.equals (o, current[I]). If I is found that satisfies objects.equals (o, current[I]), assign it to index, and break pops out;

If index >= len is not found, return false; if index >= len is not found, return false. If (current[index] == o); if (current[index] == o); Index = indexOfRange(o, current, len); If (index < 0), return false.

If you find the element, run the code that deletes the element at index.

Remove a bunch of elements

Let’s start with the simple version:

void removeRange(int fromIndex, int toIndex) {
    synchronized (lock) {
        Object[] es = getArray();
        int len = es.length;
        // subscript check
        if (fromIndex < 0 || toIndex > len || toIndex < fromIndex)
            throw new IndexOutOfBoundsException();
        / / the new length
        int newlen = len - (toIndex - fromIndex);
        // How many elements move forward
        int numMoved = len - toIndex;
        // No elements need to be moved
        if (numMoved == 0)
            setArray(Arrays.copyOf(es, newlen));
        // Perform a series of moves
        else {
            Object[] newElements = new Object[newlen];
            System.arraycopy(es, 0, newElements, 0, fromIndex); System.arraycopy(es, toIndex, newElements, fromIndex, numMoved); setArray(newElements); }}}Copy the code

Array.copyof (es, newlen)) setArray(Array.copyof (es, newlen)); Otherwise, copy the elements from 0 to fromIndex and then toIndex to Len. It’s just shuffle plus copy. A little harder:

// Delete the element in the array in collection C
public boolean removeAll(Collection
        c) {
    Objects.requireNonNull(c);
    return bulkRemove(e -> c.contains(e));
}
// Keep the elements in the array in c, delete the elements not in C
public boolean retainAll(Collection
        c) {
    Objects.requireNonNull(c);
    returnbulkRemove(e -> ! c.contains(e)); } private booleanbulkRemove(Predicate<? super E> filter) {
    synchronized (lock) {
        return bulkRemove(filter, 0, getArray().length); }}Copy the code

The conditions for deletion are represented by lambda expressions, and bulkRemove is a functional interface to the Predicate. Take a look at the three-argument version of bulkRemove.

This is a clever way to delete a long array and a bit operation. It is not very easy to understand. I have described this method in detail in this blog post. RemoveIf, I’m not going to copy and paste it here, but it’s an interesting idea.

So let’s stop here, and what about replace methods? They’re too simple to make sense. To summarize, any change to the array must trigger a copy of all elements of the array, as well as the setArray method, even the simplest set method.

A read operation

Read operation is much easier, here are a few common read operation code:

/ / get methods
public E get(int index) {
    return elementAt(getArray(), index);
}
static <E> E elementAt(Object[] a, int index) {
    return (E) a[index];
}
// find the subscript according to the element
public int indexOf(Object o) {
    Object[] es = getArray();
    return indexOfRange(o, es, 0, es.length);
}
private static int indexOfRange(Object o, Object[] es, int from, int to) {
    if (o == null) {
        for (int i = from; i < to; i++)
            if (es[i] == null)
                return i;
    } else {
        for (int i = from; i < to; i++)
            if (o.equals(es[i]))
                return i;
    }
    return -1;
}
Copy the code

As you can see, reading doesn’t require locking or copying array elements, it calls a getArray method, takes the snapshot, and looks it up.

CopyOnWriteArrayList

After reading most of the CopyOnWriteArrayList read and write operation code, we can summarize its read and write characteristics:

Write operations lock, lock, only one thread can write, and write to copy the entire array of elements
Read operations do not lock and call the getArray method to get a snapshot, which is searched
Writing is mutually exclusive (because locks are stolen), reading is not mutually exclusive, reading is not mutually exclusive
The result of the read operation may not be the latest because it is only the contents of a snapshot. The result of the read operation is determined as soon as the snapshot is created
Write operations do not modify the existing array/snapshot directly, so read operations do not have to worry about reading while snapshot is being modified by other write operations
After that, setArray will change the array array of the current CopyOnWriteArrayList

The iterator

Initialization and basic structure

This is a function of several iterators that pass the COWIterator method a snapshot of getArray and the starting index.

public Iterator<E> iterator() {
    return new COWIterator<E>(getArray(), 0);
}  
public ListIterator<E> listIterator() {
    return new COWIterator<E>(getArray(), 0);
}
public ListIterator<E> listIterator(int index) {
    Object[] es = getArray();
    int len = es.length;
    if (index < 0 || index > len)
        throw new IndexOutOfBoundsException(outOfBounds(index, len));

    return new COWIterator<E>(es, index);
}
Copy the code

Let’s take a look at its basic structure and initialization:

static final class COWIterator<E> implements ListIterator<E> {
    // Snapshot of the iterator
    private final Object[] snapshot;
    // The array index to be accessed
    private int cursor;
    / / initialization
    COWIterator(Object[] es, int initialCursor) {
        cursor = initialCursor;
        snapshot = es;
    }
// Other operation codes
}
Copy the code

As you can see, the iterator itself maintains a snapshot of CopyOnWriteArrayList with cursor as the next array index to access.

Next and Previous methods

public boolean hasNext() {
    return cursor < snapshot.length;
}

public boolean hasPrevious() {
    return cursor > 0;
}

@SuppressWarnings("unchecked")
public E next() {
    if (! hasNext())
        throw new NoSuchElementException();
    return (E) snapshot[cursor++];
}

@SuppressWarnings("unchecked")
public E previous() {
    if (! hasPrevious())
        throw new NoSuchElementException();
    return (E) snapshot[--cursor];
}
Copy the code

Look at if (! HasNext ()) or if (! HasPrevious ()), the code level is really simple, not to say.

The write operation

public void remove() {
    throw new UnsupportedOperationException();
}
public void set(E e) {
    throw new UnsupportedOperationException();
}
public void add(E e) {
    throw new UnsupportedOperationException();
}
Copy the code

Doug Lea doesn’t want the iterator of CopyOnWriteArrayList to have the ability to modify…

Iterator summary

Iterators also do this by taking a snapshot of CopyOnWriteArrayList at some point in time.

Copy ideas as you write

Copy on write in Linux

When I was in the operating system class, I remember the teacher assigned this kind of parent-child process creation and communication homework. In Linux applications, the fork () will produce a child the same and the parent process, for the sake of efficiency, Linux is no one by copying the data of the parent for the child, but the introduction of the “write copy” technology, which is only the content of the process space of paragraphs to change, will be a copy of the contents of the parent for the child process.

The application of the idea of copying while writing

Redis RDB persistence

At my current level, in addition to the Java CopyOnWriteArrayList and CopyOnWriteArraySet (based on CopyOnWriteArrayList, very little code), The application also knows about Redis’ RDB persistence.

There are two kinds of RDB persistence. One is to call save and persist by the main process, which will cause Redis blocking of single process for a long time. It is generally not recommended. The other option is to call bgSave and fork a child process to persist. RDB persistence is read-only and does not modify the original data, so copy-on-write is also a good idea. The snapshots available to the RDB are determined as soon as the child process is created.

conclusion

This article mainly on CopyOnWriteArrayList of the main source code for the analysis, due to the workload is relatively large, some source code such as subList did not speak, you can be interested in their own to see, relatively simple.

Thread-safe List (a) : CopyOnWriteArrayList source code parsing

preface

Basic structure and initialization

The basic structure

Initialize the

The write operation

Set method

The add operation

Add an element

Add a bunch of elements

The remove operation

Remove an element

Remove a bunch of elements

A read operation

CopyOnWriteArrayList

The iterator

Initialization and basic structure

Next and Previous methods

The write operation

Iterator summary

Copy ideas as you write

Copy on write in Linux

The application of the idea of copying while writing

Redis RDB persistence

conclusion

Related Posts

WordPress Markdown plugin WP editor.md

Application of TiDB in Xishanju real-time public opinion Monitoring system

What is synchronous/asynchronous/blocking/non-blocking