
Race conditions

Threads share the heap. When multiple threads of the same program access the same resource, and the result is sensitive to the order in which the resource is accessed, a race condition exists, and the code region involved becomes a critical section.

A typical example is check-then-act: an action is performed based on the result of a prior check, but that result depends on the order in which the threads execute, and that order is usually neither fixed nor predictable, so the outcome of the program varies in unexpected ways.
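The check-then-act pattern can be sketched with a small lazy-initialization class (the class and method names here are illustrative, not from the text): the null check and the assignment are two separate steps, so two threads may both pass the check.

```java
// Sketch of a check-then-act race. 'LazyInit' is a made-up example class.
public class LazyInit {
    private static Object instance; // shared, unsynchronized

    // Unsafe: two threads may both observe 'instance == null' (the check)
    // and both create an object (the act), breaking the single-instance invariant.
    public static Object getInstance() {
        if (instance == null) {       // check
            instance = new Object();  // act: depends on the check
        }
        return instance;
    }

    // One safe variant: confine the whole check-then-act to a single lock holder.
    public static synchronized Object getInstanceSafe() {
        if (instance == null) {
            instance = new Object();
        }
        return instance;
    }
}
```

The unsafe variant works fine in a single thread, which is exactly what makes such races easy to miss.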

Thread safety

What is thread safety?

“An object is thread-safe if it behaves correctly when accessed from multiple threads, regardless of how the runtime environment schedules or interleaves the execution of those threads, and with no additional synchronization or other coordination required on the part of the calling code.”

Thread-safety levels (shared data)

Shared data in the Java language can be classified, in order of decreasing safety strength, into five categories: immutable, absolutely thread-safe, relatively thread-safe, thread-compatible, and thread-hostile.

1. Immutable

An immutable object is inherently thread-safe: neither its method implementations nor its callers need any thread-safety safeguards. Thanks to the visibility guarantees of the final keyword, as long as an immutable object is constructed correctly (without letting the this reference escape), its externally visible state never changes, and no thread can ever observe it in an inconsistent state. Immutability is the simplest and purest form of thread safety.

In the Java language, if the shared data is of a primitive type, declaring it final is enough to guarantee immutability. If the shared data is an object, you must ensure that none of the object's behavior affects its state. Consider java.lang.String, the classic example of an immutable class: calling its substring(), replace(), or concat() methods never modifies the original value; each call returns a newly constructed String object.
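A quick sketch of this behavior, using only standard String methods:

```java
public class StringImmutability {
    public static void main(String[] args) {
        String s = "hello";
        // Each call returns a NEW String; the original is untouched.
        String joined = s.concat(" world"); // "hello world"
        String sub = s.substring(1);        // "ello"
        String rep = s.replace('h', 'j');   // "jello"
        System.out.println(s);              // still prints "hello"
    }
}
```

Because no method can mutate an existing String, a String reference can be freely shared between threads without synchronization.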

There are many ways to ensure that an object's behavior does not affect its state. The simplest is to declare all fields that carry state as final, so that the object becomes immutable once its constructor finishes, as in the constructor of java.lang.Integer, which guarantees immutability by declaring its internal state field value as final:

    /**
     * The value of the {@code Integer}.
     *
     * @serial
     */
    private final int value;

    /**
     * Constructs a newly allocated {@code Integer} object that
     * represents the specified {@code int} value.
     *
     * @param   value   the value to be represented by the
     *                  {@code Integer} object.
     */
    public Integer(int value) {
        this.value = value;
    }

Besides String, the Java API's commonly used immutable types include the enum types and some subclasses of java.lang.Number, such as the numeric wrapper types Long and Double and the big-number types BigInteger and BigDecimal. Note, however, that AtomicInteger and AtomicLong, although also subclasses of Number, are not immutable; a glance at their source code shows why: their internal value field is declared volatile rather than final, precisely so that it can be updated.

2. Absolute thread safety

For a class to be absolutely thread-safe ("no matter what the runtime environment is, the caller never needs any additional synchronization"), the price is usually high and sometimes unrealistic. Most classes that label themselves thread-safe in the Java API are not absolutely thread-safe. We can see what "absolutely" means here by looking at a Java API class that is thread-safe but not absolutely so.

No Java programmer would dispute that java.util.Vector is a thread-safe container: its add(), get(), and size() methods are all synchronized, which is inefficient but safe. However, even with every method declared synchronized, code that calls them may still need additional synchronization:

    Thread removeThread = new Thread(new Runnable() {
        @Override
        public void run() {
            synchronized (vector) {
                for (int i = 0; i < vector.size(); i++) {
                    vector.remove(i);
                }
            }
        }
    });

    Thread printThread = new Thread(new Runnable() {
        @Override
        public void run() {
            synchronized (vector) {
                for (int i = 0; i < vector.size(); i++) {
                    System.out.println(vector.get(i));
                }
            }
        }
    });

If the synchronized (vector) blocks are removed, this code can throw an exception such as Exception in thread "Thread-132" java.lang.ArrayIndexOutOfBoundsException. Even though Vector's get(), remove(), and size() methods are all synchronized, using them without additional synchronization on the calling side is still unsafe in a multithreaded environment: if another thread removes an element at just the wrong moment, an index i that was valid becomes stale, and accessing the underlying array with that same i throws ArrayIndexOutOfBoundsException. To make the code correct, the loops in removeThread and printThread must be wrapped in synchronized (vector) blocks, as shown above.

3. Relative thread safety

Relative thread safety is what we usually mean by thread safety. It guarantees that each individual operation on the object is thread-safe, so no extra safeguards are needed for a single call, but particular sequences of consecutive calls may still require additional synchronization on the calling side to be correct. In the Java language, most thread-safe classes are of this kind, for example Vector, Hashtable, and the collections wrapped by Collections.synchronizedCollection().

4. Thread compatibility

Thread compatibility means that an object is not itself thread-safe, but can be used safely in a concurrent environment if the calling side applies synchronization correctly. This is what we usually mean when we say a class is not thread-safe. Most classes in the Java API are thread-compatible, such as the collection classes ArrayList and HashMap, the counterparts of Vector and Hashtable.

5. Thread hostility

Thread hostility describes code that cannot be used concurrently in a multithreaded environment, regardless of whether the calling side synchronizes. Because the Java language is inherently multithreaded, thread-hostile code that rules out multithreading is rare, usually harmful, and should be avoided. One example is the suspend() and resume() methods of the Thread class. If two threads hold a reference to the same Thread object, and one tries to suspend it while the other tries to resume it, the target thread risks deadlock regardless of synchronization: if the thread suspended by suspend() is the very thread that was about to execute resume(), deadlock is certain. It is for this reason that suspend() and resume() have been deprecated (@Deprecated) in the JDK. Other common thread-hostile operations include System.setIn(), System.setOut(), and System.runFinalizersOnExit().

Thread-safe implementation

1. Mutually exclusive synchronization

Mutual exclusion and synchronization (Mutual Exclusion & Synchronization) are a common means of guaranteeing concurrency correctness. Synchronization means ensuring that when multiple threads access shared data concurrently, the data is used by only one thread at a time (or by a limited number of threads, when semaphores are used). Mutual exclusion is a means of achieving synchronization; critical sections, mutexes, and semaphores are the main ways of implementing mutual exclusion. Thus mutual exclusion is the cause and synchronization the effect; mutual exclusion is the method, synchronization the goal.

In Java, the most basic means of mutual-exclusion synchronization is the synchronized keyword. After compilation it produces two bytecode instructions, monitorenter and monitorexit, placed before and after the synchronized block respectively. Both instructions take a reference-type operand specifying the object to lock or unlock. If the synchronized statement in the Java program explicitly names an object, that object reference is used as the lock; if not, the lock object is either the object instance or the Class object, depending on whether synchronized modifies an instance method or a static method.
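The three forms, and the object each one locks, can be sketched as follows (LockTargets and its fields are made-up names for illustration):

```java
public class LockTargets {
    private int count;

    // synchronized instance method: the lock object is 'this'.
    public synchronized void incInstance() {
        count++;
    }

    // synchronized static method: the lock object is LockTargets.class.
    public static synchronized void touchStatic() {
        // class-wide critical section
    }

    // synchronized block: the lock object is whatever reference is named.
    private final Object guard = new Object();
    public void incBlock() {
        synchronized (guard) {
            count++;
        }
    }

    public int get() {
        synchronized (this) {
            return count;
        }
    }
}
```

Note that incInstance() and incBlock() use different lock objects (this vs. guard), so they do not exclude each other; all methods that guard the same data must agree on one lock.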

As the virtual machine specification requires, executing monitorenter first attempts to acquire the lock of the object. If the object is not locked, or the current thread already owns its lock, the lock counter is incremented by one; correspondingly, executing monitorexit decrements the counter by one, and when the counter reaches zero the lock is released. If acquiring the object lock fails, the current thread blocks and waits until the lock is released by the other thread.

Two points about this specified behavior of monitorenter and monitorexit are worth noting. First, a synchronized block is reentrant for the thread that holds the lock, so a thread cannot lock itself out. Second, a synchronized block blocks all subsequent threads until the thread that entered it has finished executing. Java threads are mapped onto the native threads of the operating system, so blocking or waking a thread requires the operating system's help, which means a transition from user mode to kernel mode, and such transitions consume a lot of processor time. For simple synchronized code (such as a getter or setter marked synchronized), the state transition may take longer than the user code itself. Synchronized is therefore a heavyweight operation in the Java language, and experienced programmers use it only when it is truly necessary. The virtual machine itself also applies optimizations, such as spin-waiting for a short time before asking the operating system to block a thread, to avoid frequent transitions into kernel mode.

Besides synchronized, you can also use ReentrantLock from the java.util.concurrent package to achieve mutual exclusion. Its basic usage is similar to synchronized, and both are reentrant for the same thread, but the code looks different: one is an API-level mutex (lock() and unlock() paired with a try/finally block), the other a mutex at the level of native syntax.

Differences between synchronized and ReentrantLock:

Compared with synchronized, ReentrantLock adds some advanced features, chiefly the following three: interruptible waiting, fair locks, and binding a lock to multiple conditions.

  1. Interruptible waiting: when the thread holding the lock does not release it for a long time, a waiting thread can choose to give up waiting and do something else instead. This interruptibility is very helpful when dealing with synchronized blocks whose execution time is very long.
  2. Fair lock: a fair lock means that when multiple threads wait for the same lock, they must acquire it in the order in which they requested it; a non-fair lock makes no such guarantee, and any waiting thread may get the lock when it is released. The lock used by synchronized is non-fair, and ReentrantLock is non-fair by default, but a fair lock can be requested through its constructor that takes a boolean.
  3. Binding multiple conditions: a ReentrantLock can be bound to several Condition objects at once. With synchronized, the wait(), notify(), and notifyAll() methods work with a single implicit condition; to associate more than one condition with a lock you would have to add an extra lock, whereas with ReentrantLock you simply call newCondition() multiple times.

ReentrantLock is a good choice if you need to use the above features.
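The three features above can be sketched together in a hypothetical one-slot container (BoundedBox is a made-up example class): the constructor requests a fair lock, lockInterruptibly() makes waiting interruptible, and two Condition objects share one lock.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBox {
    // 'true' requests a fair lock; the no-arg constructor gives a non-fair lock.
    private final ReentrantLock lock = new ReentrantLock(true);
    // Two conditions on one lock: impossible with a single synchronized monitor.
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();
    private Object item;

    public void put(Object x) throws InterruptedException {
        lock.lockInterruptibly(); // the waiting thread can be interrupted
        try {
            while (item != null) {
                notFull.await();
            }
            item = x;
            notEmpty.signal();
        } finally {
            lock.unlock();        // always release in finally
        }
    }

    public Object take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (item == null) {
                notEmpty.await();
            }
            Object x = item;
            item = null;
            notFull.signal();
            return x;
        } finally {
            lock.unlock();
        }
    }
}
```

The try/finally pairing around lock()/unlock() is the idiom the API-level mutex requires; forgetting it is exactly the deadlock risk discussed below.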

Other differences between synchronized and ReentrantLock:

Functional differences:

The biggest difference between the two is that synchronized is a Java language keyword, a mutex at the level of native syntax implemented by the JVM, while ReentrantLock is an API-level mutex introduced in JDK 1.5 that requires the lock() and unlock() methods used together with a try/finally block.

Convenience: synchronized is clearly easier and simpler to use, since the compiler guarantees that the lock is acquired and released, whereas ReentrantLock requires the lock to be acquired and released by hand. To avoid deadlocks caused by forgetting to release a lock, the release should be declared in a finally block.

Fine granularity and flexibility of locking: ReentrantLock is clearly superior to synchronized.

Performance differences:

Before synchronized was optimized, its performance was much worse than ReentrantLock's, but since the introduction of biased locking and lightweight (spin) locking, the two perform almost identically. Where both are applicable, synchronized is even officially recommended, and the optimization of synchronized appears to borrow from the CAS technique in ReentrantLock: both try to resolve locking in user mode and avoid blocking threads by entering kernel mode.

The comparison above was made on JDK 5; JDK 6 added a large number of optimizations for synchronized, and in the same tests synchronized and ReentrantLock perform equally well.

Given that ReentrantLock is functionally a superset of synchronized and performs at least as well, should the synchronized keyword be abandoned? Of course not:

  1. synchronized is synchronization at the Java syntax level, and it is clear and simple enough.
  2. With Lock, the programmer must ensure the lock is released in a finally block; otherwise, if an exception is thrown in the protected block, the lock held may never be released. With synchronized, the virtual machine guarantees that the lock is released automatically even when an exception occurs.
  3. Although ReentrantLock outperformed synchronized in the JDK 1.5 era, that advantage is long gone, and in the long run it is easier for the virtual machine to optimize synchronized, because the VM can record synchronized lock information in thread and object metadata, whereas with Lock it is hard for the VM to know which lock objects are held by which threads.

2. Non-blocking synchronization

The main problem with mutual-exclusion synchronization is the performance cost of blocking and waking threads, which is why it is also called blocking synchronization. In terms of how it handles contention, mutual exclusion is a pessimistic concurrency strategy: it assumes that without correct synchronization (such as locking), problems are bound to occur, so it locks the shared data regardless of whether there is actual contention (this is the conceptual model; in practice the virtual machine optimizes away a large portion of unnecessary locking), performs user-to-kernel mode transitions, maintains lock counters, and checks for blocked threads that need waking. With the development of hardware instruction sets, we have another option: an optimistic concurrency strategy based on conflict detection. In plain terms, the operation is performed first; if no other thread contends for the shared data, it succeeds, and if contention causes a conflict, a compensating measure is taken (most commonly retrying until success). Because many implementations of this optimistic strategy never need to suspend threads, such synchronization is called non-blocking synchronization.

Why does an optimistic concurrency strategy require "the development of hardware instruction sets"? Because the operation and the conflict detection must together be atomic. How can that be guaranteed? Using a mutex here would defeat the purpose, so we must rely on the hardware to ensure that a semantically multi-step operation is completed by a single processor instruction. Common instructions of this kind are:

  • Test-and-Set
  • Fetch-and-Increment
  • Swap
  • Compare-and-Swap (CAS)
  • Load-Linked/Store-Conditional (LL/SC)

The first three are processor instructions that already existed in most instruction sets in the 20th century; the last two are newer additions to modern processors, similar to each other in purpose and function. In the IA-64 and x86 instruction sets the cmpxchg instruction provides CAS, and SPARC-TSO uses the casa instruction; the ARM and PowerPC architectures instead need a pair of ldrex/strex instructions to provide LL/SC.

A CAS instruction takes three operands: the memory location (for Java, simply the memory address of a variable, denoted V), the old expected value (denoted A), and the new value (denoted B). When CAS executes, the processor updates V to the new value B if and only if V still holds the expected value A; otherwise it performs no update. In either case it returns the old value of V, and the whole operation is atomic.

CAS became available to Java programs only after JDK 1.5, wrapped in methods such as compareAndSwapInt() and compareAndSwapLong() in the sun.misc.Unsafe class. The virtual machine treats these methods specially: the result of just-in-time compilation is a platform-specific processor CAS instruction with no method call, or in other words they are unconditionally inlined. Methods like these, for which the virtual machine substitutes processor-specific code, are called intrinsics; Math.sin() is a similar intrinsic.

Because Unsafe is not a class meant to be called by user programs (Unsafe.getUnsafe() restricts access to classes loaded by the bootstrap class loader), CAS can only be used indirectly, short of reflection, through other Java APIs, such as the atomic integer classes in the java.util.concurrent.atomic package, whose compareAndSet() and getAndIncrement() methods use the CAS operations of the Unsafe class. For example, incrementAndGet() keeps trying, in an infinite loop, to set a new value one greater than the current one; if the attempt fails, the value was modified between the get and the set, so the operation is repeated until the set succeeds.

As elegant as CAS looks, it does not cover all the scenarios of mutual-exclusion synchronization, and it is not semantically perfect. It has a logical loophole: if variable V held value A when first read and still holds A when the assignment is about to happen, can we conclude that no other thread changed it in the meantime? If its value was changed to B and then back to A during that window, CAS would believe it was never changed. This is known as the "ABA" problem of CAS operations. The java.util.concurrent package addresses it with a stamped atomic reference class, AtomicStampedReference, which guarantees the correctness of CAS by versioning the variable's value. In most cases, however, the ABA problem does not affect program correctness, and when it must be solved, switching to traditional mutual-exclusion synchronization may be more efficient than the atomic classes.
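The retry loop inside incrementAndGet() can be written by hand against the public AtomicInteger API (CasCounter is an illustrative wrapper, not JDK code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Hand-written equivalent of incrementAndGet(): read the current value,
    // then try to CAS in current + 1. If another thread changed the value
    // in between, compareAndSet returns false and we simply retry.
    public int increment() {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public int get() {
        return value.get();
    }
}
```

No thread is ever suspended here: a losing thread just loops and retries, which is exactly the non-blocking, optimistic strategy described above.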

3. No-synchronization schemes (thread confinement)

Synchronization is not a prerequisite for thread safety; the two have no necessary cause-and-effect relationship. Synchronization is only a means of guaranteeing correctness when shared data is contended. If a method involves no shared data in the first place, it needs no synchronization to be correct, so some code is inherently thread-safe.

Reentrant code: this code, also called pure code, can be interrupted at any point of its execution to run another piece of code (including a recursive call to itself) without any error in the original program once control returns. Reentrancy is a more fundamental property than thread safety: all reentrant code is thread-safe, but not all thread-safe code is reentrant. Reentrant code shares some recognizable traits: it does not rely on data stored on the heap or on shared system resources, it uses only state passed in through its parameters, and it calls no non-reentrant methods. A simple rule of thumb: if a method's result is predictable, returning the same output for the same input, it meets the reentrancy requirement and is therefore thread-safe.
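The contrast can be sketched in a few lines (the class and method names are made up for illustration):

```java
public class Reentrancy {
    // Reentrant ("pure") code: depends only on its parameter, touches no
    // shared state, and always returns the same result for the same input.
    public static int square(int x) {
        return x * x;
    }

    // NOT reentrant: it reads and writes shared mutable state, so calls
    // interleaved from several threads can interfere with one another.
    private static int last;
    public static int squareAndRemember(int x) {
        last = x * x; // the shared write is what breaks reentrancy
        return last;
    }
}
```

square() is thread-safe with no synchronization at all, while squareAndRemember() would need thread-compatible treatment (synchronization by its callers) to be used concurrently.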

Stack confinement

Local variables exist within the scope of a method and can be accessed only by the executing thread, which naturally guarantees thread safety. This form of thread confinement is called stack confinement.

public int Test(int count){
    int number = 0;
    for(int i = 0; i < count; i++){
        ++number;
    }
    return number;
}

In the example code above, number is stack-confined: as a local variable of the Test method it is confined to a single thread, so no thread-safety keywords or techniques are needed; Java semantics naturally preserve its correctness.

Thread-local storage: if the data needed by a piece of code must be shared with other code, ask whether the code sharing the data can be guaranteed to run in the same thread. If it can, we can limit the visibility of the shared data to that one thread, so no synchronization is needed to prevent data contention between threads. Applications of this kind are not uncommon: most architectural patterns that use consumption queues (such as the producer-consumer pattern) try to have a single thread consume each product. The most important example is the thread-per-request processing of the classic Web interaction model; its wide adoption lets many Web server applications use thread-local storage to solve their thread-safety problems.

In the Java language, a variable that will be accessed by multiple threads can be declared volatile. For a variable that should be exclusive to one thread, Java has no direct equivalent of C++'s __declspec(thread) in Visual C++ or __thread in GCC, but thread-local storage can be implemented with the java.lang.ThreadLocal class. Each Thread object holds a ThreadLocalMap, which stores a set of key-value pairs keyed by ThreadLocal.threadLocalHashCode, with the thread-local variables as the values. A ThreadLocal object is the access point into the current thread's ThreadLocalMap: each ThreadLocal has a unique threadLocalHashCode, through which the corresponding thread-local value can be found in the map.
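A minimal ThreadLocal sketch (ThreadLocalDemo is an illustrative name): each thread gets its own counter, so no synchronization is needed even though the field is static.

```java
public class ThreadLocalDemo {
    // Each thread sees its own copy; withInitial supplies the per-thread start.
    private static final ThreadLocal<Integer> COUNTER =
            ThreadLocal.withInitial(() -> 0);

    public static int next() {
        int v = COUNTER.get() + 1;
        COUNTER.set(v);
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        next(); // main thread: 1
        next(); // main thread: 2
        Thread t = new Thread(() -> {
            // A new thread starts from its own initial value, not main's.
            System.out.println(next());
        });
        t.start();
        t.join();
        System.out.println(next()); // main thread continues from its own copy: 3
    }
}
```

Because each thread reads and writes only its own entry in its own ThreadLocalMap, there is no shared data to contend for, which is the essence of the no-synchronization approach.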