Background

When we write a double-checked singleton, static code analysis tools such as PMD report a thread-safety violation.

For example, we define a singleton class:

public final class SingleTest {

    private static SingleTest sSingleTest;

    private SingleTest() {}

    public static SingleTest getInstance() {
        if (sSingleTest == null) {
            synchronized (SingleTest.class) {
                if (sSingleTest == null) {
                    sSingleTest = new SingleTest();
                }
            }
        }
        return sSingleTest;
    }
}

Run the PMD task (how to run a PMD task is easy to find online); the output report looks like this:

<?xml version="1.0" encoding="utf-8"?>
<pmd xmlns="http://pmd.sourceforge.net/report/2.0.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://pmd.sourceforge.net/report/2.0.0 http://pmd.sourceforge.net/report_2_0_0.xsd"
     version="6.8.0" timestamp="2021-08-21T17:37:47.921">
    <file name="/Users/liuwei/AndroidStudioProjects/MyTestApplication/app/src/main/java/com/pplovely/mytestapplication/SingleTest.java">
        <violation beginline="13" endline="19" begincolumn="9" endcolumn="9" rule="NonThreadSafeSingleton"
                   ruleset="Multithreading" package="com.pplovely.mytestapplication" class="SingleTest"
                   method="getInstance"
                   externalInfoUrl="https://pmd.github.io/pmd-6.8.0/pmd_rules_java_multithreading.html#nonthreadsafesingleton"
                   priority="3">
            Singleton is not thread safe
        </violation>
    </file>
</pmd>

Singleton is not thread safe.

Why is that? Doesn't the synchronized lock already guarantee thread safety?

This article uses that question as a starting point to briefly explain the concepts behind thread safety.

Thread basics

A thread is the smallest unit of a program's execution flow. A standard thread consists of a thread ID, the current instruction pointer, a register set, and a stack. Typically, a process consists of one or more threads that share the program's memory (code segment, data segment, heap, and so on) and process-level resources (such as open files and signals).

A typical process and thread diagram looks like this:

Multiple threads can execute concurrently without interfering with each other, sharing the process's global variables and heap data. So what advantages do multiple threads have over a single-threaded process? In general, there are several reasons to use multithreading:

  • An operation may take a long time (usually a computation); with only one thread, interaction between the program and the user stalls. With multiple threads, one thread can keep handling interaction while another computes.
  • The program logic itself requires concurrent operations, such as an app store downloading several files at once.
  • Multi-CPU and multi-core devices inherently support executing multiple threads at the same time, so a single-threaded program cannot use the machine's full computing power.
  • Compared with multiple processes, multiple threads share data far more efficiently.

Thread access permissions

Threads also have their own private storage, including:

  • The stack
  • Thread Local Storage (TLS). Thread-local storage is the private space that some operating systems provide for threads individually, but usually with very limited capacity
  • Registers, which are the basic data of the execution flow and are therefore thread private

From a C programmer’s point of view, whether data is private between threads is shown in the following table:

Thread private                          Shared between threads (process owned)
Local variables                         Global variables
Function parameters                     Data on the heap
TLS data                                Static variables inside functions
                                        Program code (any thread may read and execute any code)
                                        Open files (a file opened by thread A can be read and written by thread B)
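
In Java, the thread-local storage mentioned above is exposed through java.lang.ThreadLocal. Below is a minimal sketch; the class and variable names are illustrative, not from the original text:

public class TlsDemo {
    // Each thread sees its own copy of this value; writes made in one thread
    // are invisible to every other thread.
    private static final ThreadLocal<Integer> counter =
            ThreadLocal.withInitial(() -> 0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            counter.set(counter.get() + 1);
            System.out.println(Thread.currentThread().getName()
                    + " sees " + counter.get());   // each thread prints 1
        };
        Thread a = new Thread(task, "thread-A");
        Thread b = new Thread(task, "thread-B");
        a.start(); b.start();
        a.join(); b.join();
    }
}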

Scheduling and priority of threads

When the number of threads is less than or equal to the number of processors (including the cores of multi-core processors), thread concurrency is true concurrency: different threads run on different processors without interfering with each other. When the number of threads is greater than the number of processors, however, true concurrency is no longer possible, because at least one processor has to run more than one thread.

In that case concurrency is simulated. Consider a single processor that has to run multiple threads: the operating system runs the threads in turn, each for a short period of time (typically tens to hundreds of milliseconds), so that every thread "appears" to be executing simultaneously. This continual switching of threads on the processor is called thread scheduling (Thread Schedule).

In thread scheduling, threads usually have at least three states, which are:

  • Running: The thread is executing
  • Ready: The thread can run immediately, but the CPU is already occupied.
  • Waiting: A thread that is Waiting for an event (usually I/O or synchronization) to occur and cannot execute

A running thread is given a period of time to execute, called a time slice (Time Slice). When the time slice is exhausted, the thread enters the ready state. If a thread starts waiting for an event before its time slice runs out, it enters the waiting state. Whenever a thread leaves the running state, the scheduler selects another ready thread to run. A thread in the waiting state returns to the ready state once the event it is waiting for occurs. The transitions between these three states are shown below:

Many schemes and algorithms for thread scheduling have been proposed since the advent of multitasking operating systems. Although mainstream scheduling methods differ, they all carry traces of priority scheduling (Priority Schedule) and round robin (Round Robin). Round robin is the approach mentioned above in which threads take turns executing for short periods of time; it is what gives threads their interleaved execution. Priority scheduling determines the order in which threads take those turns: in a system with priority scheduling, each thread has its own thread priority (Thread Priority). High-priority threads run earlier, while low-priority threads often have to wait until the system has no runnable high-priority threads left.

A thread's priority can be set manually by the user, and the system can also adjust priorities automatically according to how threads behave, to make scheduling more efficient. For example, threads that frequently enter the waiting state (giving up the rest of their time slice, such as threads handling I/O) are generally favored by the scheduler over threads that compute so heavily that they exhaust their entire time slice every time, because frequently waiting threads occupy the CPU only briefly. Threads that wait frequently are called IO-bound threads, and threads that rarely wait are called CPU-bound threads. IO-bound threads are always easier to prioritize than CPU-bound threads.

Priority scheduling also introduces the possibility of starvation (Starvation): a low-priority thread is always preempted by higher-priority threads and never gets a chance to run. When a CPU-bound thread has a high priority, other low-priority threads may starve. A high-priority IO-bound thread is less likely to cause starvation because it spends most of its time waiting. To avoid starvation, schedulers usually raise the priority of threads that have waited too long without executing; that way, if a thread waits long enough, its priority eventually rises far enough for it to run.

In summary, under priority scheduling a thread's priority can change in three ways:

  • The user specifies the priority
  • The priority is raised or lowered depending on how frequently the thread enters the waiting state
  • The priority is raised because the thread has not executed for a long time

Multithreading in Linux

The Windows kernel has explicit concepts of threads and processes and provides explicit APIs, CreateProcess and CreateThread, to create them, along with a series of APIs to manipulate them. For Linux, however, the thread is not such a clear-cut concept.

In fact, threads do not really exist as a separate entity in the Linux kernel. Linux calls every execution entity (whether a thread or a process) a task. Each task is conceptually similar to a single-threaded process: it has its own memory space, execution entity, file resources, and so on. However, under Linux different tasks can choose to share a memory space. In practice, multiple tasks sharing the same memory space constitute a process, and those tasks become the threads of that process. Under Linux, a new task can be created with the following system calls:

System call    Role
fork           Copy the current process
exec           Replace the current executable image with a new one
clone          Create a child task and begin execution at a specified location

The fork call produces a new task that is identical to the current one, and both the new task and the current task return from fork. For example:

#include <sys/types.h>
#include <unistd.h>

pid_t pid;
if ((pid = fork()) != 0) {
    /* original task: fork returned the new task's PID */
} else {
    /* new task: fork returned 0 */
}

After the fork call, the new task starts running and both tasks return from fork. The difference is that in the original task fork returns the new task's PID, while in the new task fork returns 0.

fork creates new tasks very quickly because it does not copy the original task's memory space; instead, the two tasks share a copy-on-write (COW) memory space. Copy-on-write means that both tasks can freely read the memory, but as soon as either one tries to modify it, a copy of the affected memory is made and given to the modifier for its own use, so the other task is unaffected.

fork can only produce a mirror image of the current task, so exec is needed to run a different program: exec replaces the current executable image with a new one. After fork produces a new task, the new task can call exec to execute a new executable file. fork plus exec is the usual way to start a new task running a new program, while clone can be used to create a new thread.

With clone, you can create a new task that starts executing at a specified location and (optionally) shares the current process's memory space, open files, and so on. In effect, this produces a thread.

Thread safety

Multithreaded programs live in a fluid environment where accessible global and heap data can be changed by other threads at any time. Therefore, data consistency in concurrent multithreaded programs becomes very important.

Races and atomic operations

Multiple threads accessing the same shared data at the same time can have serious consequences. Let's start with the following example. Suppose two threads execute the following code:

Thread 1

i = 1;
++i;

Thread 2

--i;

On many architectures, ++i is implemented as follows:

  1. Read i into a register X
  2. X++
  3. Store the contents of X back into i

Since thread 1 and thread 2 execute concurrently, the two threads may be interleaved as follows (note that each thread has its own copy of register X; X1 and X2 denote X in thread 1 and thread 2 respectively):

Step   Instruction   Variable values after execution   Thread
1      i = 1         i=1, X1=unknown                   1
2      X1 = i        i=1, X1=1                         1
3      X2 = i        i=1, X2=1                         2
4      X1++          i=1, X1=2                         1
5      X2--          i=1, X2=0                         2
6      i = X1        i=2, X1=2                         1
7      i = X2        i=0, X2=0                         2

By the program's logic, i should be 1 after both threads finish, but the execution sequence above leaves i at 0. In fact, if the two threads run at the same time, i may end up as 0, 1, or 2. As you can see, two threads reading and writing the same shared data at the same time can lead to unexpected results.

The root cause is that the increment (++) operation compiles down to more than one machine instruction, so the scheduler can interrupt it halfway through and run other code. We call an operation that consists of a single instruction atomic (Atomic), because a single instruction can never be interrupted.

In more complex situations, for example when we need changes to an entire data structure to be atomic, we use locks.
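
For a simple counter like the i above, Java exposes atomic primitives directly through java.util.concurrent.atomic. A minimal sketch (this class is illustrative and not part of the original example):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    private static final AtomicInteger i = new AtomicInteger(1);

    public static void main(String[] args) throws InterruptedException {
        // incrementAndGet/decrementAndGet are typically implemented as a single
        // atomic CPU instruction or a CAS loop, so they cannot be interleaved.
        Thread t1 = new Thread(i::incrementAndGet);
        Thread t2 = new Thread(i::decrementAndGet);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(i.get());   // always 1, never 0 or 2
    }
}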

Synchronization and locks

To avoid the unpredictable consequences of multiple threads reading and writing the same data at the same time, we synchronize (Synchronization) the threads' access to that data. Synchronization means that while one thread is accessing the data, no other thread may access it; in effect, access to the data becomes atomic.

The most common means of synchronization is the lock. A lock is a non-mandatory mechanism: each thread tries to acquire (Acquire) the lock before accessing the data or resource, and releases (Release) it after the access. If the lock is already held when a thread tries to acquire it, the thread waits until the lock is released and becomes available again.

The simplest type of lock, Binary Semaphore, has only two states: occupied and unoccupied. It is suitable for resources that can only be accessed exclusively by a single thread. When a binary semaphore is in an unoccupied state, the first thread attempting to acquire the binary semaphore acquires the lock and sets the binary semaphore to occupied state, after which all other threads attempting to acquire the binary semaphore will wait until the lock is released.

A Semaphore is a good choice for resources that allow concurrent access by multiple threads. A Semaphore with an initial value of N allows concurrent access by N threads.

When a thread accesses a resource, it first obtains a semaphore and performs the following operations:

  • Decrement the semaphore value by 1.
  • If the semaphore value is less than 0, the wait state is entered, otherwise execution continues

After the resource is accessed, the thread releases the semaphore and does the following:

  • Increment the semaphore value by 1.
  • If the semaphore value is less than 1, wake up a waiting thread.
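
These two pairs of operations map directly onto the acquire/release calls of most semaphore APIs. Below is a sketch using java.util.concurrent.Semaphore; the value 3 and the method names are chosen purely for illustration:

import java.util.concurrent.Semaphore;

public class SemaphoreDemo {
    // At most 3 threads may be inside the guarded section at once.
    private static final Semaphore permits = new Semaphore(3);

    static void useResource() throws InterruptedException {
        permits.acquire();          // decrement; blocks (waits) if no permit is left
        try {
            // ... access the shared resource ...
        } finally {
            permits.release();      // increment; wakes a waiting thread if any
        }
    }
}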

A mutex (Mutex) is similar to a binary semaphore in that the resource may be accessed by only one thread at a time. The difference is that a semaphore can be acquired and released by any thread in the system: the same semaphore may be acquired by one thread and then released by another. A mutex requires that whichever thread acquired it is also the one to release it; it is invalid for another thread to release the mutex.
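
In Java the same idea is available as a mutex-style lock such as java.util.concurrent.locks.ReentrantLock. A sketch (the counter is made up for illustration); note that the ownership rule described above applies here too:

import java.util.concurrent.locks.ReentrantLock;

public class MutexDemo {
    private static final ReentrantLock mutex = new ReentrantLock();
    private static int sharedCounter = 0;

    static void increment() {
        mutex.lock();               // only one thread at a time gets past this point
        try {
            sharedCounter++;
        } finally {
            mutex.unlock();         // must be called by the thread that locked;
                                    // unlocking from another thread throws
                                    // IllegalMonitorStateException
        }
    }
}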

A critical section (Critical Section) is an even stricter synchronization mechanism than a mutex. In this terminology, acquiring the lock is called entering the critical section, and releasing it is called leaving the critical section. The difference between a critical section and a mutex or semaphore is scope: a critical section is visible only within the process that created it and cannot be accessed by other processes, whereas mutexes and semaphores can be shared across processes. For example, if one process creates a mutex, it is legal for another process to try to acquire that lock.

A read-write lock (Read-Write Lock) suits data that is read often and written rarely, such as shared configuration or I/O caches: allowing multiple threads to read simultaneously is fine, but writing must be exclusive. A read-write lock can be acquired in two modes: shared (Shared) or exclusive (Exclusive).

The relationship between read/write lock status and acquisition mode is as follows:

Read-write lock state   Acquire in shared mode   Acquire in exclusive mode
Free                    Succeeds                 Succeeds
Shared                  Succeeds                 Waits
Exclusive               Waits                    Waits
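
The same state table expressed in code, using Java's ReentrantReadWriteLock (class and field names are illustrative):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConfigStore {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private String value = "default";

    public String read() {
        rw.readLock().lock();        // shared mode: many readers may hold it at once
        try {
            return value;
        } finally {
            rw.readLock().unlock();
        }
    }

    public void write(String newValue) {
        rw.writeLock().lock();       // exclusive mode: waits until every reader
        try {                        // and writer has released the lock
            value = newValue;
        } finally {
            rw.writeLock().unlock();
        }
    }
}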

A condition variable (Condition Variable), as a synchronization primitive, works something like a fence. First, a thread can wait on a condition variable, and many threads can wait on the same one. Second, a thread can signal the condition variable, at which point one or all of the threads waiting on it wake up and continue executing. In other words, condition variables let many threads wait together for some event; when the event occurs (the condition variable is signaled), the waiting threads can resume execution.
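
A sketch of that wait/wake-up pattern with Java's Condition; the EventGate class and its flag are invented here purely for illustration:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class EventGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition ready = lock.newCondition();
    private boolean eventHappened = false;

    // Many threads can wait here together for the event.
    public void awaitEvent() throws InterruptedException {
        lock.lock();
        try {
            while (!eventHappened) {    // re-check the flag to guard against spurious wake-ups
                ready.await();
            }
        } finally {
            lock.unlock();
        }
    }

    // Signaling the condition variable releases all waiting threads at once.
    public void fireEvent() {
        lock.lock();
        try {
            eventHappened = true;
            ready.signalAll();
        } finally {
            lock.unlock();
        }
    }
}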

Reentrancy and thread safety

A function is re-entered when it has not yet finished executing but is entered again, either because of external scheduling or because of an internal call. A function can be re-entered in only two ways:

  • Multiple threads execute this function simultaneously
  • The function calls itself, possibly indirectly through several intermediate calls

A function is called reentrant if it can be re-entered without any adverse consequences. For example, the following sqr function is reentrant:

int sqr(int x) {
    return x * x;
}

For a function to be reentrant, it must have the following characteristics:

  • Does not use any static (including local static) variables or global non-const variables.
  • Does not return a pointer to any static or global non-const variable.
  • Depends only on the arguments provided by the caller.
  • Does not rely on locks on any single shared resource (mutexes, etc.).
  • Does not call any non-reentrant functions.
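
For contrast, here is a sketch of a function that violates the first rule and is therefore not reentrant (hypothetical code, not from the original text):

public class NotReentrant {
    private static int lastResult;          // shared, non-constant state

    static int sqrCached(int x) {
        lastResult = x * x;                 // another thread may overwrite this field
        return lastResult;                  // before we read it back, so we can
    }                                       // return the other caller's result
}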

Reentrancy is a strong guarantee of concurrency safety: a reentrant function can be used safely in a multithreaded environment.

Over-optimization

Thread safety is a thorny problem: even with locks used correctly, thread safety is not guaranteed, because conventional compiler technology has not kept up with the growing demands of concurrency. A lot of seemingly flawless code breaks in the face of optimization and concurrent execution. Let's start with a simple example:

x = 0;
Thread1          Thread2
lock();          lock();
x++;             x++;
unlock();        unlock();

Because of the lock and unlock protection, x++ cannot be broken up by the other thread, so x should always end up as 2. However, the compiler may place x in a register to speed up access, and we know that each thread's registers are independent. If Thread1 acquires the lock first, the program might execute like this:

  • [Thread1] read x into a register R[1] (R[1] = 0)
  • [Thread1] R[1]++ (Thread1 does not write R[1] back to x yet, because it may need x again later)
  • [Thread2] read x into a register R[2] (R[2] = 0)
  • [Thread2] R[2]++ (R[2] = 1)
  • [Thread2] write R[2] back to x (x = 1)
  • [Thread1] (much later) write R[1] back to x (x = 1)

As you can see, in this case even correct locking does not guarantee thread safety. Here is another example:

x = y = 0;
Thread1          Thread2
x = 1;           y = 1;
r1 = y;          r2 = x;

Intuitively, at least one of r1 and r2 should be 1; logically they cannot both be 0. Yet r1 = r2 = 0 can actually happen. The reason is that CPUs have used dynamic scheduling for decades: during execution the CPU may exchange the order of instructions for efficiency. The compiler may likewise swap the order of two adjacent, seemingly unrelated instructions (such as x = 1 and r1 = y) for efficiency, so the code may effectively run as:

x = y = 0;
Thread1          Thread2
r1 = y;          y = 1;
x = 1;           r2 = x;

and then r1 = r2 = 0 is possible. We can use the volatile keyword to try to prevent this kind of over-optimization. volatile essentially does two things:

  • Prevents the compiler from caching a variable in a register without writing it back, in the name of speed
  • Prevents the compiler from reordering instructions that operate on volatile variables

volatile therefore solves the first problem (the register-caching example) completely. But does it solve the second problem, the reordering, as well? The answer is no: even though volatile stops the compiler from reordering these instructions, it cannot stop the CPU from reordering them dynamically.

Now let’s look at the singleton double check problem mentioned at the beginning.

volatile T* pInstance = 0;

T* getInstance() {
    if (pInstance == NULL) {
        lock();
        if (pInstance == NULL) {
            pInstance = new T;
        }
        unlock();
    }
    return pInstance;
}

At first glance this code looks fine: when the function returns, pInstance always points to a valid object; lock and unlock guard against trouble from multithreaded races; and the double if keeps the cost of the lock call to a minimum.

In fact, however, this code is still broken, and the source of the problem is again out-of-order execution by the CPU. Creating an object with new involves two steps:

  • Allocate memory
  • Calling the constructor

So pInstance = new T actually involves three steps:

  1. Allocate memory.
  2. Call the constructor at the location of memory.
  3. Assign a memory address to pInstance.

Of these three steps, steps 2 and 3 can be reordered. That is, pInstance may already be non-NULL while the object it points to has not yet been constructed. If another thread calls getInstance concurrently at that moment, the outer if condition is false, so the caller receives a pointer to an unconstructed object. Whether the program then crashes depends on how the rest of the code uses that object.

The two examples above show that the CPU's out-of-order execution makes multithread safety much harder to achieve. To guarantee thread safety, we must therefore also prevent the CPU from reordering instructions across the critical point. This is typically done with a special CPU instruction known as a barrier: a barrier instruction prevents the CPU from moving the instructions that precede it past it. The name is analogous to the synchronization barrier that Android posts to the main-thread Handler during drawing, although that barrier operates on the message queue rather than on CPU instructions.
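
For the Java case that opened this article, the Java memory model (since Java 5 / JSR-133) gives reads and writes of a volatile field the barrier semantics described here, so the double-checked singleton can be fixed simply by declaring the field volatile. A sketch of that commonly used idiom, applied to the article's own class:

public final class SingleTest {

    // volatile forbids the 2/3 reordering described above: other threads can
    // only observe a non-null reference after the constructor has completed.
    private static volatile SingleTest sSingleTest;

    private SingleTest() {}

    public static SingleTest getInstance() {
        if (sSingleTest == null) {
            synchronized (SingleTest.class) {
                if (sSingleTest == null) {
                    sSingleTest = new SingleTest();
                }
            }
        }
        return sSingleTest;
    }
}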

Of course, in Java we can also build the singleton with a static inner holder class, which is both lazy and thread-safe, because the JVM guarantees that class initialization runs exactly once.
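
A minimal sketch of that holder idiom (the inner class name is arbitrary):

public final class SingleTest {

    private SingleTest() {}

    // The holder class is not loaded until getInstance() is first called,
    // and the JVM serializes class initialization, so no explicit lock is needed.
    private static final class Holder {
        static final SingleTest INSTANCE = new SingleTest();
    }

    public static SingleTest getInstance() {
        return Holder.INSTANCE;
    }
}

Because class loading is lazy, this achieves lazy initialization without any locking in the common path.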

Three threading models

One-to-one model

For systems that support threads directly, the one-to-one model is the simplest: one user thread corresponds to one kernel thread (but not necessarily the other way around; a kernel thread does not necessarily have a corresponding user-mode thread), as shown below:

User threads thus gain the same benefits as kernel threads, and concurrency between threads is true concurrency: when one thread blocks, the execution of the other threads is unaffected, and the one-to-one model also performs well on multiprocessor machines. Threads created directly through kernel APIs or system calls are generally one-to-one. The model has two drawbacks:

  • Since many operating systems limit the number of kernel threads, the one-to-one model limits the number of threads a user can create
  • In many operating systems, kernel-thread scheduling makes context switches expensive, which lowers the execution efficiency of user threads

Many-to-one model

The many-to-one model maps multiple user threads onto a single kernel thread, with switching between user threads handled by user-mode code. Compared with the one-to-one model, thread switching in this model is therefore much faster, as shown below:

The biggest problem with the many-to-one model is that if one user thread blocks in the kernel, the kernel thread blocks with it and none of the other user threads mapped to it can execute.

The biggest advantages of the many-to-one model are fast context switches and an almost unlimited number of threads.

Many-to-many model

The many-to-many model combines the characteristics of the many-to-one and one-to-one models: it maps multiple user threads onto a smaller number (but more than one) of kernel threads, as shown below:

In the many-to-many model, one user thread blocking does not cause all user threads to block, because there are other threads that can be scheduled to execute. In addition, the many-to-many model imposes no limit on the number of user threads.
