An atom means “the smallest particle that cannot be further divided”, while an atomic operation means “an operation or series of operations that cannot be interrupted”.

Nonblocking synchronization

Non-blocking synchronization replaces locks with low-level atomic machine instructions (such as compare-and-swap) to keep data consistent under concurrent access. This optimistic concurrency strategy never needs to block and suspend threads, which is why such synchronization is called non-blocking.

Non-blocking algorithm

An algorithm in which the failure or suspension of one thread cannot cause other threads to fail or be suspended is called a non-blocking algorithm. An algorithm is called lock-free if, at every step, some thread is able to make progress. From one JVM version to the next, most of the gains in concurrency performance come from the use of non-blocking algorithms, both inside the JVM and in the platform libraries.

In recent years most research on concurrent algorithms has focused on non-blocking algorithms, which replace locks with low-level atomic machine instructions (such as compare-and-swap) to keep data consistent under concurrent access. Non-blocking algorithms are widely used in operating systems and JVMs to implement thread/process scheduling, garbage collection, and locks and other concurrent data structures. They are considerably more complex to design and implement than lock-based schemes, but they offer big advantages in scalability and liveness. Because non-blocking algorithms let multiple threads contend for the same data without blocking, they coordinate at a much finer granularity and greatly reduce scheduling overhead; they are also immune to deadlock and other liveness problems. In Java you can use the atomic variable classes, such as AtomicInteger and AtomicReference, to build efficient non-blocking algorithms.
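As a concrete illustration, here is a minimal sketch of a classic non-blocking (Treiber) stack built on AtomicReference; the class and method names are illustrative, not from any particular library. Push and pop never block: a thread that loses the CAS race simply retries.

    import java.util.concurrent.atomic.AtomicReference;

    // Minimal Treiber stack: push/pop retry with CAS instead of locking.
    public class ConcurrentStack<E> {
        private static class Node<E> {
            final E item;
            Node<E> next;
            Node(E item) { this.item = item; }
        }

        private final AtomicReference<Node<E>> top = new AtomicReference<>();

        public void push(E item) {
            Node<E> newHead = new Node<>(item);
            Node<E> oldHead;
            do {
                oldHead = top.get();
                newHead.next = oldHead;
            } while (!top.compareAndSet(oldHead, newHead)); // retry if another thread won the race
        }

        public E pop() {
            Node<E> oldHead;
            Node<E> newHead;
            do {
                oldHead = top.get();
                if (oldHead == null) {
                    return null;              // stack is empty
                }
                newHead = oldHead.next;
            } while (!top.compareAndSet(oldHead, newHead));
            return oldHead.item;
        }
    }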

Hardware support for concurrency

Processors designed for multiprocessor operation provide special instructions for managing concurrent access to shared data:

  • Test and Set (test-and-set)

  • Fetch and Increment

  • Swap

  • Compare and Swap (CAS)

  • Load-linked/store-conditional

The first three are processor instructions that have existed in most instruction sets since the 20th century; they are enough to implement mutexes of all kinds, which in turn can implement more complex concurrent objects. The last two are newer additions to modern processors, and the two instructions serve a similar purpose. CAS is provided by the cmpxchg instruction in the IA64 and x86 instruction sets and by the casa instruction in SPARC-TSO, while on ARM and PowerPC a pair of ldrex/strex instructions is needed to achieve the LL/SC function. Operating systems and JVMs use these instructions to implement locks and concurrent data structures, but prior to Java 5.0 they were not directly available to Java classes.

Compare and swap (CAS instructions)

The typical usage pattern of CAS: the CAS instruction takes three operands: the memory location V (in Java, simply the memory address of a variable), the expected old value A, and the new value B to be written. When the CAS instruction executes, the processor updates V to B if and only if the value at V equals A; otherwise no update is performed. In either case the old value of V is returned, and the whole operation is atomic and cannot be interrupted by other threads. Because CAS can detect interference from other threads, it makes it possible to implement atomic read-modify-write sequences without locks.

When multiple threads use CAS to update the same variable at the same time, only one of them succeeds; all the others fail. A failing thread is not suspended, however; it is simply told that it lost the race and may try again. Because a thread that loses a CAS race does not block, it can decide whether to retry, perform some recovery action, or do nothing at all. This flexibility greatly reduces the liveness risks associated with locking.
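A minimal sketch of this usage pattern with AtomicInteger.compareAndSet (the CasCounter class is illustrative): read the expected value A, compute the new value B, and retry until the CAS on location V succeeds.

    import java.util.concurrent.atomic.AtomicInteger;

    // Lock-free counter: read the old value, compute the new value,
    // and retry the CAS until no other thread interferes.
    public class CasCounter {
        private final AtomicInteger value = new AtomicInteger(0);

        public int increment() {
            int oldValue;
            int newValue;
            do {
                oldValue = value.get();        // expected value A
                newValue = oldValue + 1;       // new value B
            } while (!value.compareAndSet(oldValue, newValue)); // CAS on location V
            return newValue;
        }
    }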

How Java uses CAS

On platforms that support CAS, the JVM compiles CAS operations into the corresponding machine instruction (or instruction sequence). In the worst case, if the CAS instruction is not available, the JVM falls back to a spin lock. The atomic variable classes (java.util.concurrent.atomic.AtomicXxx) use this underlying JVM support to provide highly efficient CAS operations on numeric and reference types, and most classes in java.util.concurrent use these atomic variable classes directly or indirectly in their implementation.

In the Java class library, CAS operations are wrapped by several methods of the sun.misc.Unsafe class, such as compareAndSwapInt() and compareAndSwapLong(). The JVM treats these methods specially: just-in-time compilation turns them into the platform-specific processor CAS instruction, with no actual method call. Unsafe, however, is not designed to be called by user code (Unsafe::getUnsafe() restricts access to classes loaded by the bootstrap class loader), so prior to JDK 9 only the Java class library could use CAS directly. For example, the integer atomic classes in the java.util.concurrent package implement methods such as compareAndSet() and getAndIncrement() with Unsafe's CAS operations. If an application needed CAS, it had either to use reflection to circumvent Unsafe's access restriction or to use it indirectly through the class library API. Only since JDK 9 has the class library exposed CAS to user applications, through the VarHandle class.

Unsafe provides three CAS methods:

  • compareAndSwapObject()
  • compareAndSwapInt()
  • compareAndSwapLong()
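For completeness, here is a sketch of what the JDK 9+ VarHandle route mentioned above looks like for application code; the class and field names are illustrative.

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    // CAS on an ordinary volatile field via VarHandle (JDK 9+),
    // without touching sun.misc.Unsafe.
    public class VarHandleCounter {
        private volatile int count = 0;

        private static final VarHandle COUNT;
        static {
            try {
                COUNT = MethodHandles.lookup()
                        .findVarHandle(VarHandleCounter.class, "count", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        public void increment() {
            int oldValue;
            do {
                oldValue = count;
            } while (!COUNT.compareAndSet(this, oldValue, oldValue + 1));
        }
    }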

Atomic variable

Atomic base type class

The Atomic package provides the following three classes:

  • AtomicBoolean: atomically updates a boolean value.
  • AtomicInteger: atomically updates an int value.
  • AtomicLong: atomically updates a long value.

The methods provided by these three classes are almost identical. Taking AtomicInteger as an example, its common methods are listed below (a usage sketch follows the list):

  • int addAndGet(int delta) : atomically adds the given value to the current value (the value held in the AtomicInteger) and returns the result.
  • boolean compareAndSet(int expect, int update) : if the current value equals the expected value, atomically sets it to the update value.
  • int getAndIncrement() : atomically increments the current value by 1 and returns the old value.
  • int getAndSet(int newValue) : atomically sets the value to newValue and returns the old value.
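A short usage sketch of the methods listed above, starting from an initial value of 5 (the class name is illustrative):

    import java.util.concurrent.atomic.AtomicInteger;

    public class AtomicIntegerDemo {
        public static void main(String[] args) {
            AtomicInteger counter = new AtomicInteger(5);

            System.out.println(counter.addAndGet(3));         // 8: adds 3 atomically and returns the result
            System.out.println(counter.compareAndSet(8, 10)); // true: value was 8, so it is set to 10
            System.out.println(counter.getAndIncrement());    // 10: returns the old value, counter becomes 11
            System.out.println(counter.getAndSet(0));         // 11: sets the value to 0 and returns the old value
        }
    }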

Take getAndIncrement() as an example to illustrate the atomic operation process:

    // AtomicInteger.getAndIncrement()
    public final int getAndIncrement() {
        return unsafe.getAndAddInt(this, valueOffset, 1);
    }

    // Unsafe.getAndAddInt()
    public final int getAndAddInt(Object var1, long var2, int var4) {
        int var5;
        do {
            var5 = this.getIntVolatile(var1, var2);
        } while (!this.compareAndSwapInt(var1, var2, var5, var5 + var4));
        return var5;
    }

Here var1 is the object being operated on, var2 is the memory offset of the value V, var4 is the value to add, var5 is the old expected value A, and var5 + var4 is the new value B.

The old expected value A is read from V and the new value B is computed from A, and then the value in V is atomically changed from A to B by the CAS instruction (compareAndSwapInt) (as long as no thread changes the value of V to anything else in the meantime). If CAS fails, the operation will be retried immediately until the modification succeeds.

The atomic package provides atomic updates for only these three basic types. If you look at the AtomicBoolean source code, you can see that it converts the boolean to an int and then performs CAS with compareAndSwapInt. Atomic updates of char and float variables can be built the same way by converting to int (and double by converting to long), as sketched below.
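For example, a double can be packed into an AtomicLong through its raw bit representation; the AtomicDouble class below is an illustrative sketch, not part of the JDK.

    import java.util.concurrent.atomic.AtomicLong;

    // A minimal "atomic double" built the same way AtomicBoolean is built on int:
    // the double is stored as its raw long bits inside an AtomicLong.
    public class AtomicDouble {
        private final AtomicLong bits;

        public AtomicDouble(double initialValue) {
            bits = new AtomicLong(Double.doubleToRawLongBits(initialValue));
        }

        public double get() {
            return Double.longBitsToDouble(bits.get());
        }

        public boolean compareAndSet(double expect, double update) {
            return bits.compareAndSet(Double.doubleToRawLongBits(expect),
                                      Double.doubleToRawLongBits(update));
        }

        public double addAndGet(double delta) {
            while (true) {
                long currentBits = bits.get();
                double next = Double.longBitsToDouble(currentBits) + delta;
                if (bits.compareAndSet(currentBits, Double.doubleToRawLongBits(next))) {
                    return next;
                }
            }
        }
    }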

Atomic array class

The Atomic package provides three classes for updating an element in an array:

  • AtomicIntegerArray: atomically updates elements of an int array.
  • AtomicLongArray: atomically updates elements of a long array.
  • AtomicReferenceArray: atomically updates elements of a reference-type array.

Take AtomicIntegerArray as an example. The common methods of AtomicIntegerArray are as follows:

  • int getAndAdd(int i, int delta) : atomically adds the given value to the element at index i in the array and returns the old value.

  • boolean compareAndSet(int i, int expect, int update) : atomically sets the element at index i to update if its current value equals the expected value.

    // AtomicIntegerArray constructors
    public AtomicIntegerArray(int length) {
        array = new int[length];
    }

    public AtomicIntegerArray(int[] array) {
        this.array = array.clone();
    }

The array is passed in through the constructor and AtomicIntegerArray makes a copy of it, so changes that AtomicIntegerArray makes to its internal array elements do not affect the array that was passed in, as the sketch below shows.
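A short sketch of this copy behavior (the class name is illustrative): updates made through the AtomicIntegerArray are visible via its getters, while the array passed to the constructor keeps its original values.

    import java.util.concurrent.atomic.AtomicIntegerArray;

    public class AtomicIntegerArrayDemo {
        public static void main(String[] args) {
            int[] source = {1, 2, 3};
            AtomicIntegerArray atomicArray = new AtomicIntegerArray(source);

            atomicArray.getAndAdd(0, 10);           // element 0: 1 -> 11 inside the internal copy
            atomicArray.compareAndSet(1, 2, 20);    // element 1: 2 -> 20 inside the internal copy

            System.out.println(atomicArray.get(0)); // 11
            System.out.println(source[0]);          // 1: the original array is untouched
        }
    }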

    // AtomicIntegerArray.getAndAdd()
    public final int getAndAdd(int i, int delta) {
        return unsafe.getAndAddInt(array, checkedByteOffset(i), delta);
    }

    private long checkedByteOffset(int i) {
        if (i >= 0 && i < array.length) {
            return byteOffset(i);
        } else {
            throw new IndexOutOfBoundsException("index " + i);
        }
    }

    private static long byteOffset(int i) {
        return ((long) i << shift) + base;
    }

getAndAdd ends up in the same Unsafe.getAndAddInt() call that AtomicInteger uses above; the only difference is that the index is bounds-checked first and then converted into a memory byte offset.

Atomic update reference classes

For atomically updating reference types, the atomic package provides the following three classes:

  • AtomicReference: atomically updates a reference.
  • AtomicStampedReference: atomically updates a reference together with a version number (stamp).
  • AtomicMarkableReference: atomically updates a reference together with a boolean mark bit, so a boolean flag and a reference can be updated as one unit. The constructor is AtomicMarkableReference(V initialRef, boolean initialMark).

Taking AtomicReference as an example, its common methods are:

  • V getAndSet(V newValue) : atomically sets the value to the given value and returns the old value.

  • boolean compareAndSet(V expect, V update) : if the current value == the expected reference, atomically sets the value to the given update value.

    public final boolean compareAndSet(V expect, V update) {
        return unsafe.compareAndSwapObject(this, valueOffset, expect, update);
    }

This simply delegates to the Unsafe compareAndSwapObject() method. A primitive atomic class such as AtomicInteger can only update a single variable; if several variables need to be updated atomically as a group, they can be wrapped in one object and updated through one of these atomic reference classes, as in the sketch below.
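A sketch of that idea, with illustrative class names: two related fields are wrapped in one immutable object, and the whole object is swapped with a single CAS so the pair always changes together.

    import java.util.concurrent.atomic.AtomicReference;

    // Updating two related values atomically by wrapping them in one immutable
    // object and swapping the reference with CAS.
    public class CasPair {
        private static class IntPair {
            final int lower;   // invariant: lower <= upper
            final int upper;
            IntPair(int lower, int upper) { this.lower = lower; this.upper = upper; }
        }

        private final AtomicReference<IntPair> values =
                new AtomicReference<>(new IntPair(0, 0));

        public void setLower(int newLower) {
            while (true) {
                IntPair old = values.get();
                if (newLower > old.upper) {
                    throw new IllegalArgumentException("lower cannot exceed upper");
                }
                IntPair updated = new IntPair(newLower, old.upper);
                if (values.compareAndSet(old, updated)) {
                    return;   // both fields changed together, atomically
                }
            }
        }
    }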

Atomic update field class

If you need to update a field in a class atomically, you need to use the Atomic update field class. The Atomic package provides the following three classes for Atomic field updates:

  • AtomicIntegerFieldUpdater: an updater that atomically updates an int field.
  • AtomicLongFieldUpdater: an updater that atomically updates a long field.
  • AtomicReferenceFieldUpdater: an updater that atomically updates a reference-type field.

Using an atomic field updater takes two steps. First, because the field updater classes are abstract, you must create an updater with the static method newUpdater(), specifying the class and the field you want to update. Second, the field being updated must be declared public volatile.

Take AtomicIntegerFieldUpdater as an example:

    // Create the field updater, specifying the class and the name of the field to update
    private static AtomicIntegerFieldUpdater<User> a =
            AtomicIntegerFieldUpdater.newUpdater(User.class, "old");
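A fuller sketch built around the snippet above, assuming a User class with a public volatile int field named old (the class and values are illustrative):

    import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

    public class FieldUpdaterDemo {
        static class User {
            private String name;
            public volatile int old;   // the updated field must be public volatile (and not static)

            User(String name, int old) {
                this.name = name;
                this.old = old;
            }
        }

        private static final AtomicIntegerFieldUpdater<User> updater =
                AtomicIntegerFieldUpdater.newUpdater(User.class, "old");

        public static void main(String[] args) {
            User user = new User("alice", 20);
            System.out.println(updater.getAndIncrement(user)); // 20: old value before the increment
            System.out.println(updater.get(user));             // 21
        }
    }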

An AtomicIntegerFieldUpdater is not tied to a particular instance, so it can update the field on any instance of the target class. Field updaters provide a weaker atomicity guarantee than the regular atomic classes, because there is nothing stopping the underlying field from being modified directly: compareAndSet and the other methods guarantee atomicity only with respect to other threads that also go through the updater. In almost all cases ordinary atomic variables are the right choice, and field updaters are needed only rarely (they are useful, for example, when you must preserve the serialized form of an existing class while adding atomic updates).

New in JDK 8: LongAdder

JDK 8 adds five new classes to the atomic package: Striped64, LongAdder, LongAccumulator, DoubleAdder, and DoubleAccumulator, with Striped64 as the parent class of the other four.

The Alibaba Java Development Manual recommends using LongAdder instead of AtomicLong under JDK8

LongAdder

AtomicLong, described above, provides non-blocking atomic operations through CAS. Under high concurrency a large number of threads compete to update the same atomic variable at the same time, but only one thread's CAS can succeed at any moment, so the threads that lose the race keep spinning through their retry loops, which wastes CPU resources.

To overcome this disadvantage of AtomicLong under high concurrency, JDK 8 added an atomic increment/decrement class, LongAdder. The idea behind LongAdder is to split one variable into several, so that the same number of threads contends over multiple slots instead of a single one.

Internally, LongAdder maintains multiple Cell variables, each holding a long with an initial value of 0. Under the same level of concurrency, fewer threads compete to update any single variable, which reduces contention for the shared resource. In addition, when several threads contend on the same Cell and a thread's CAS fails, it does not keep spinning on that Cell but tries a CAS on another Cell, which raises the chance that the retry succeeds. Finally, the current value of a LongAdder is obtained by summing the values of all Cell variables and adding base.
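Before looking at the implementation, a minimal usage sketch (thread and iteration counts are arbitrary): each thread calls increment(), and sum() combines base with all Cell values at the end.

    import java.util.concurrent.atomic.LongAdder;

    public class LongAdderDemo {
        public static void main(String[] args) throws InterruptedException {
            LongAdder counter = new LongAdder();

            // 4 threads each perform 100_000 increments; contention is spread over Cells.
            Thread[] threads = new Thread[4];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(() -> {
                    for (int j = 0; j < 100_000; j++) {
                        counter.increment();   // internally hits base or one of the Cell slots
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }

            System.out.println(counter.sum()); // 400000: base + all Cell values
        }
    }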

    // Cell elements are annotated with @sun.misc.Contended to avoid false sharing
    @sun.misc.Contended
    static final class Cell {
        // each Cell maintains a single volatile long
        volatile long value;

        Cell(long x) { value = x; }

        // CAS that atomically updates this Cell's value
        final boolean cas(long cmp, long val) {
            return UNSAFE.compareAndSwapLong(this, valueOffset, cmp, val);
        }

        // Unsafe mechanics
        private static final sun.misc.Unsafe UNSAFE;
        private static final long valueOffset;
        static {
            try {
                UNSAFE = sun.misc.Unsafe.getUnsafe();
                Class<?> ak = Cell.class;
                valueOffset = UNSAFE.objectFieldOffset(ak.getDeclaredField("value"));
            } catch (Exception e) {
                throw new Error(e);
            }
        }
    }

Each Cell holds a single variable declared volatile, and its cas() method atomically updates that value when the current thread's CAS succeeds. The @sun.misc.Contended annotation pads the Cell class with bytes so that multiple array elements do not share a cache line, which avoids false sharing and improves performance.

    public long sum() {
        Cell[] as = cells;
        Cell a;
        long sum = base;
        if (as != null) {
            for (int i = 0; i < as.length; ++i) {
                if ((a = as[i]) != null)
                    sum += a.value;
            }
        }
        return sum;
    }

The long sum() method returns the current value by adding up all Cell values and then adding base. Because the Cell array is not locked while summing, other threads may modify Cell values, or the array may even be resized, during the accumulation. The result is therefore not exact: sum() does not return an atomic snapshot of the value at the moment it is called.

    public void add(long x) {
        Cell[] as; long b, v; int m; Cell a;
        if ((as = cells) != null || !casBase(b = base, b + x)) {
            boolean uncontended = true;
            if (as == null || (m = as.length - 1) < 0 ||
                (a = as[getProbe() & m]) == null ||
                !(uncontended = a.cas(v = a.value, v + x)))
                longAccumulate(x, null, uncontended);
        }
    }

    final boolean casBase(long cmp, long val) {
        return UNSAFE.compareAndSwapLong(this, BASE, cmp, val);
    }

If cells is still null, add() first tries casBase(b = base, b + x), a CAS on the base field, which works just like an AtomicLong update.

If the base CAS fails, or the Cell mapped to the current thread does not exist, or the CAS on that Cell fails, the longAccumulate() method is called; it spins in a loop that initializes the Cell array, creates Cells, and expands the array as needed.

Summary

LongAdder maintains a lazily initialized array of atomically updated Cells (the cells array is null by default) together with a base variable, base. Because Cell objects are relatively large, they are not created up front but only when needed, i.e. lazily. While the Cell array is null and concurrency is low, all accumulation happens on base. The size of the Cell array is kept at a power of 2, and every element of the array is of type Cell. Cell is an AtomicLong-style value padded to reduce cache contention, that is, to avoid false sharing. For most isolated atomic variables such padding would be wasteful, because atomic variables are scattered irregularly in memory (their addresses are not contiguous) and several of them are unlikely to land on the same cache line. The elements of an array, however, are contiguous in memory, so multiple array elements often do share a cache line; this is why the Cell class is padded with the @sun.misc.Contended annotation, which keeps several array elements from sharing one cache line and improves performance.

LongAdder extends the Striped64 class, which maintains three fields: the Cell array cells, the base value base (0 by default), and cellsBusy, a spin-lock state that takes the values 0 and 1. The true value of a LongAdder is the sum of base and all the Cell elements in the cells array. cellsBusy is updated with CAS when creating Cell elements, expanding the Cell array, or initializing it, which guarantees that only one thread performs any of these operations at a time.

LongAccumulator

The LongAdder class is a special case of LongAccumulator, which is more powerful than LongAdder.

Compared with LongAdder, whose initial value can only be the default 0, LongAccumulator lets you supply a non-zero initial value for the accumulator. It also lets you specify the accumulation rule: for example, it can multiply instead of add, simply by passing a custom binary operator when constructing the LongAccumulator, whereas LongAdder's accumulation rule is built in.
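A short sketch of both points (values are arbitrary): a multiplying accumulator with identity 1, and a max accumulator, neither of which LongAdder can express.

    import java.util.concurrent.atomic.LongAccumulator;

    public class LongAccumulatorDemo {
        public static void main(String[] args) {
            // Custom rule (multiplication) with a non-zero identity value of 1.
            LongAccumulator product = new LongAccumulator((left, right) -> left * right, 1);
            product.accumulate(3);
            product.accumulate(4);
            System.out.println(product.get());   // 12

            // LongAdder is effectively a LongAccumulator with (a, b) -> a + b and identity 0.
            LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);
            max.accumulate(10);
            max.accumulate(7);
            System.out.println(max.get());        // 10
        }
    }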

Atomic variable performance and problems

ABA problem:

Because CAS checks whether the value has changed and updates it only if it has not, a value that starts as A, changes to B, and then changes back to A will pass the CAS check as "unchanged" even though it has in fact been modified. The solution to the ABA problem is a version number: append a version to the variable and increment it on every update, so A→B→A becomes 1A→2B→3A. The ABA problem can be solved with AtomicStampedReference and AtomicMarkableReference:

AtomicStampedReference attaches a version stamp to the reference. Its compareAndSet method checks whether the current reference equals the expected reference and whether the current stamp equals the expected stamp; only if both match are the reference and the stamp atomically set to the given new values.

AtomicMarkableReference pairs an object reference with a boolean mark (true/false) that can indicate, for example, whether the reference has been modified. A sketch of detecting ABA with AtomicStampedReference follows.
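In the sketch below (values are illustrative), the final CAS supplies the stale stamp and therefore fails even though the reference value is back to A, so the intermediate A→B→A change is detected.

    import java.util.concurrent.atomic.AtomicStampedReference;

    public class AbaDemo {
        public static void main(String[] args) {
            Integer a = 100;
            Integer b = 200;
            AtomicStampedReference<Integer> ref = new AtomicStampedReference<>(a, 0);

            int stamp = ref.getStamp();     // stamp observed before the A -> B -> A sequence

            // Another thread changes A -> B -> A, bumping the stamp each time.
            ref.compareAndSet(a, b, stamp, stamp + 1);
            ref.compareAndSet(b, a, stamp + 1, stamp + 2);

            // The reference is back to A, but the stale stamp makes this CAS fail.
            boolean success = ref.compareAndSet(a, 300, stamp, stamp + 1);
            System.out.println(success);        // false
            System.out.println(ref.getStamp()); // 2
        }
    }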

ABA problems do not affect concurrency in most cases, and if ABA problems need to be solved, switching to traditional mutex synchronization may be more efficient than atomic classes.

Performance

Under extremely high contention, locks outperform atomic variables, but at more realistic levels of contention atomic variables outperform locks. The analogy holds elsewhere: traffic lights give better throughput when traffic is heavy, while roundabouts give better throughput when it is light. This is because locks suspend threads when contended, which lowers CPU utilization and synchronization traffic on the shared memory bus. Atomic variables, being CAS-based, retry immediately on contention, which is usually the right approach but under fierce contention only produces more contention.

The performance difference between locks and atomic variables at different levels of contention explains their respective strengths. At low to moderate contention, atomic variables offer better scalability; at high contention, locks are more effective at suppressing contention. (On a single-CPU system a CAS-based algorithm also outperforms a lock-based one, because CAS almost always succeeds on a single CPU.)

A third implementation in such comparisons stores the state in a ThreadLocal. This changes the behavior of the class, but its overhead is lower because shared state is avoided altogether. Handling contention more efficiently improves scalability, but true scalability is achieved only by eliminating contention completely.

Atomic variables can be seen as a "better volatile": they provide atomic updates for integers and object references.
