Get to the bottom of Java concurrency CAS

A little progress every day adds up to something!

preface

In back-end development, you must have encountered the need to implement a thread-safe counter. From experience, you should know that we need to implement atomicity and visibility of “shared variables” in multi-threading, so locking becomes an inevitable topic. Today we will discuss the equivalent of lockless CAS. This article will bring you a real understanding of CAS from different angles such as how it came, what it is, how to use it, principle analysis and problems encountered.

Why unlocked

When we think of the way to ensure safety in multi-threading, the first thing we must lift out is the lock, no matter from the hardware, operating system level are more or less in the use of lock. Are there any disadvantages to locks? Of course there are. The reason there are so many different locks in the JDK is because each lock has its advantages and disadvantages.

Use lock will need to get the lock, the lock is released, the CPU needs through a context switch and scheduling management to proceed with this operation, for a “an exclusive lock” a thread at the end of the execution after the lock is held other elder brothers will have to wait outside, wait until the complete CPU in front of the elder brothers eldest brother will lock out other threads to robbed (not fair). The concept of locking is based on a pessimistic mechanism that assumes that data will be modified, so you put a lock on a block of code before you operate on it, release it when you’re done, and you’re safe. You can do this with synchronized in JDK1.5.

However, operations like the one above can cause CPU switching in multiple threads, which is very expensive, and we know that we can avoid some of these problems by using a specific type of lock. Is there anything else besides the lock? Of course, some people have proposed the lock-free algorithm, the famous one is CAS (Compare and Swap). Different from locking, CAS is an optimistic mechanism. It believes that when others take data, they will not modify it, but when they modify data, they will judge the current state of the data. In this way, the CPU will not switch and performance will be greatly improved in the case of heavy reads. Most of the cpus we use now have CAS instructions, from the hardware level support lock-free, so that development time to call it.

Both locking and no locking have their advantages and disadvantages. We will also illustrate the CAS problem through examples later.

What is a CAS?

What is CAS? I can’t wait to see what Wikipedia has to say

Compare and swap (CAS) is a kind of atomic operation, which can be used in multithreaded programming to achieve uninterrupted data exchange operation, so as to avoid the data inconsistency caused by the uncertainty of execution order and the unpredictability of interruption when multithreading simultaneously rewriting a certain data. This operation compares the value in memory with the specified data, and replaces the data in memory with the new value if the value is the same.

CAS gives us an idea of how to achieve atomicity by “comparing” and “replacing”. Let’s look at a piece of code:

'int cas(long *addr, long old, long new) {' /* atom execute */' if(*addr! = old)` `return 0; ` `*addr = new; ` `return 1; ` ` `}Copy the code

This is a code from c language, you can see that there are three parameters, respectively:

*addr: value to be compared
Old: indicates the current memory value
New: The new value to be modified is written to memory

As long as the current value passed in for comparison is equal to the value in memory, the new value is modified successfully, otherwise 0 is returned to indicate that the comparison failed. Those of you who have studied databases know pessimistic locks and optimistic locks, and optimistic locks always assume that data cannot be modified. Based on this assumption, CAS operations also assume that the value in memory is equal to the current value, so the operation will always succeed. We can achieve atomic operations in multiple threads without locking.

When you update a variable using CAS at the same time in multiple threads, only one of the threads can update the value of the variable, and all the other threads fail, the thread that fails will not be blocked and suspended, but will be told that the change failed and you can try again, so you can write code like this.

`while (! cas(&addr, old, newValue)) {` `}` `// success` `printf("new value = %ld", addr); `Copy the code

However, you can probably see something wrong with this code, which we will examine later. Let’s look at how CAS is used in Java.

Java in the CAS

Again, if you’re using a Java API, you might think of two ways to do this: locking (synchronized or some other kind of locking) and using atomic classes such as AtomicInteger. This is when JDK1.5 to a series of classes, in our common Java. Util. Concurrent. The atomic package, let’s look at an example:

`ExecutorService executorService = Executors.newCachedThreadPool(); ` `AtomicInteger atomicInteger = new AtomicInteger(0); ` `for (int i = 0; i < 5000; i++) {` `executorService.execute(atomicInteger::incrementAndGet); ` `}` `System.out.println(atomicInteger.get()); ` `executorService.shutdown(); `Copy the code

This example starts 5000 threads to do the sum, and the answer is 5000 no matter how many times you do it. How does this amazing operation work? This is done using CAS technology, so let’s peel back AtomicInteger and look at its code:

`public class AtomicInteger extends Number implements java.io.Serializable {` `private static final long serialVersionUID = 6214790243416807050L; ` `// setup to use Unsafe.compareAndSwapInt for updates` `private static final Unsafe unsafe = Unsafe.getUnsafe(); ` `private static final long valueOffset; ` `static {` `try {` `valueOffset = unsafe.objectFieldOffset` `(AtomicInteger.class.getDeclaredField("value")); ` `} catch (Exception ex) { throw new Error(ex); }` `}` `private volatile int value; ` `/**` `* Creates a new AtomicInteger with the given initial value.` `* @param initialValue the initial value` `*/` `public AtomicInteger(int initialValue) {` `value = initialValue; ` `}` `/**` `* Gets the current value.` `* @return the current value` `*/` `public final int get() {` `return value; ` `}` `/**` `* Atomically increments by one the current value.` `* @return the updated value` `*/` `public final int incrementAndGet() {` `return unsafe.getAndAddInt(this, valueOffset, 1) + 1; ` ` `} ` `}Copy the code

I’ve only posted the code related to our previous example here, but everything else is similar, as you can see incrementAndGet calls the broadening. GetAndAddInt method. Unsafe is a low-level class that the JDK provides, and it doesn’t allow programmers to use it directly, for fear of breaking the machine. (You can actually get an instance of this class by reflection.) You’ll see this in various places in the JDK source code, so let’s talk about what it does:

Memory management: Allocates and releases memory
Manipulate classes, objects, variables: Modify data directly by fetching objects and variable offsets
Suspend and Resume: To block or restore a blocked thread
CAS: invokes the CAS instruction of the CPU for comparison and exchange
Memory barriers: Define memory barriers to avoid instruction reordering

This is just an overview of the common operations, which can be found in the reference links at the end of this article. Let’s continue looking at what the unsafe getAndAddInt is doing.

`public final int getAndAddInt(Object var1, long var2, int var4) {` `int var5; ` `do {` `var5 = this.getIntVolatile(var1, var2); ` `} while(! this.compareAndSwapInt(var1, var2, var5, var5 + var4)); ` `return var5; ` `}` `public native int getIntVolatile(Object var1, long var2); ` `public final native boolean compareAndSwapInt(Object var1, long var2, int var4, int var5); `Copy the code

Get the current value of memory using getIntVolatile, and then compare it with compareAndSwapInt.

Var1: The object currently being manipulated (essentially an AtomicInteger instance)
Var2: The offset of the variable currently being manipulated (understood as the current value of memory in CAS)
Var4: expected memory value
Var5: The new value to be modified

This.com pareAndSwapInt(var1, VAR2, VAR5, var5 + VAR4) If they are equal then I change the memory value var5 to var5 + var4 (var4 is 1, but it could be anything else).

Here we also need to explain what is offset? You may have seen this in the previous code:

`// setup to use Unsafe.compareAndSwapInt for updates` `private static final Unsafe unsafe = Unsafe.getUnsafe(); ` `private static final long valueOffset; ` `static {` `try {` `valueOffset = unsafe.objectFieldOffset` `(AtomicInteger.class.getDeclaredField("value")); ` `} catch (Exception ex) { throw new Error(ex); }` `}` `private volatile int value; `Copy the code

As you can see, the offset of the value field of the AtomicInteger class is retrieved when the static block is executed. What is the use of this long data? In many areas of the Unsafe class, you need to pass in obj and offsets, which, combined with the ability to address Unsafe, directly changes the object fields in memory in a lower-level manner.

Using the above way can be a good solution to the problem of atomicity and visibility under multithreading. Because the code uses the do while loop structure, so the CPU will not be suspended, the comparison fails after retry, there is no context switch, achieve lockless concurrent programming

Problems with CAS

Disadvantages of spin

If you look at the code above, you’ll notice a problem. What if the while loop always fails at worst? This causes the CPU to be constantly processing. Like this while (! The operation of compareAndSwapInt is called spin. CAS is optimistic, thinking that not all people are going to modify the data. In reality, there may be a lot of threads coming to modify the data. So let’s be clear, if you’re reading more than you’re writing, you’re optimistic, and the performance is pretty good.

ABA problem

If the value of A memory is changed by one thread to B, and another thread changes it to A, then the CAS operation is successful. The thread that changed data to B should read B instead of A. If you have done optimistic locking for databases, you might think that we should use A version number to make A comparison. The JDK provides an AtomicStampedReference class to solve this problem. Here’s an example:

`int stamp = 10001; ` `AtomicStampedReference<Integer> stampedReference = new AtomicStampedReference<>(0, stamp); ` `stampedReference.compareAndSet(0, 10, stamp, stamp + 1); ` `System.out.println("value: " + stampedReference.getReference()); ` `System.out.println("stamp: " + stampedReference.getStamp()); `Copy the code

Its constructor takes two arguments and passes in an initial timestamp, which is used to append a version of the data so that multiple threads can modify if the stamps provided are different. In addition to providing a new value when modifying the data, a new stamp is also provided, so in multi-threaded situations whenever the data is modified, the stamp must change, and the other thread gets the old stamp, so the modification fails.

Try to apply

Since CAS provides such a nice API, we might as well use it to implement a simple version of the exclusive lock. When a thread enters the lock method, it compares whether the memory value of the lock object is false. If so, it means that the lock object can be acquired. After obtaining the lock, it changes the memory value to true. Unlock; change memory value to false;

`public class SpinLock {` `private AtomicBoolean mutex = new AtomicBoolean(false); ` `public void lock() {` `while (! mutex.compareAndSet(false, true)) {` `// System.out.println(Thread.currentThread().getName()+ " wait lock release"); ` `}` `}` `public void unlock() {` `while (! mutex.compareAndSet(true, false)) {` `// System.out.println(Thread.currentThread().getName()+ " wait lock release"); ` `} ` `} ` ` `}Copy the code

We’re using AtomicBoolean, but AtomicInteger is fine because we’re only saving one state Boolean and we’re using it in a small way. The implementation of this lock is relatively simple, but its disadvantages are very obvious. The spin caused by the while loop will make other threads occupy the CPU, but it can also be used. I will improve and explain the implementation of the optimized version of the lock in the following articles.

CAS source

After looking at the above codes and explanations, I believe you have already understood CAS. The following principle is what the C++ code in the previous native method writes. In its/hotspot/SRC/share/vm/prims directory has an Unsafe. CPP file has a piece of code like this:

Note: The hotspot implementation is used as an example

`UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))` `UnsafeWrapper("Unsafe_CompareAndSwapInt"); ` `oop p = JNIHandles::resolve(obj); Jint * addr = (jint *) index_oop_from_field_offset_long(p, offset); '// Perform an atomic operation' // If the result is different, it returns directly because someone else changed it; Otherwise you keep trying to fix it. Until you succeed. ` `return (jint)(Atomic::cmpxchg(x, addr, e)) == e; ` `UNSAFE_ENDC`Copy the code

Ok, that’s all about CAS today. I hope it will be helpful to you.