Let’s start with some questions:

  1. The memory model
  2. The principle of volatile
  3. Relationship between working memory and main memory
  4. Atomicity, visibility and order
  5. Why is kernel thread scheduling more expensive to switch?
  6. How are threads implemented in the Java language?
  7. What is the happens-before principle?

1. Overview

This part on “efficient concurrency” introduces the virtual machine’s support for multithreading, along with the problems that arise when threads share and compete for data, and their solutions.

2. Hardware efficiency and consistency

An important source of complexity is that most computing tasks cannot be completed by the processor alone. The processor must at least interact with memory, for example to read data and store results, and this I/O is very difficult to eliminate (you cannot rely on registers alone to do all the computation). Because there is a gap of several orders of magnitude between the speed of the computer’s storage devices and the speed of the processor, modern computer systems add one or more layers of cache, whose read and write speeds are as close as possible to the processor’s, as a buffer between the processor and memory: the data needed for an operation is copied into the cache so the computation can proceed quickly, and the result is synchronized back to main memory when the operation completes, so the processor does not have to wait for slow memory reads and writes.

When the computation tasks of multiple processors all involve the same main memory area, the cached data of each processor may become inconsistent. If this happens, whose cached data should be used when synchronizing back to main memory? To solve this consistency problem, each processor must follow certain protocols when accessing the cache and operate according to them when reading and writing. Such protocols include MSI, MESI (the Illinois Protocol), MOSI, Synapse, Firefly, and the Dragon Protocol. From this chapter on we will see the term “memory model” a lot; it can be understood as an abstraction of the process of reading and writing a particular memory or cache under a particular operating protocol. Physical machines with different architectures can have different memory models, and the Java virtual machine has its own memory model, which is highly analogous to the memory and cache access operations of hardware described here.

Figure 12-1 Interaction among processors, caches, and main memory

In addition to adding caches, the processor may optimize the input code with out-of-order execution in order to make full use of the computation units inside the processor. After the computation, the processor reorganizes the results of the out-of-order execution to ensure that they are consistent with sequential execution, but it does not guarantee that each statement is computed in the same order as it appears in the input code. Therefore, if one computation depends on the intermediate results of another, their ordering cannot be guaranteed by code order alone. Similar to the processor’s out-of-order execution optimization, the just-in-time compiler of the Java virtual machine performs an analogous optimization known as instruction reordering (Instruction Reorder).

3. Java memory model

1. Main memory vs. working memory

The main purpose of the Java memory model is to define the access rules for the various variables in a program, focusing on the low-level details of storing variable values into memory and reading them back out in the virtual machine. Variables here include instance fields, static fields, and the elements that make up array objects, but exclude local variables and method parameters, which are thread-private, never shared between threads, and therefore free of contention.

Note the distinction here: if a local variable is of a reference type, the object it refers to is shared among threads in the Java heap, but the reference itself lives in the local variable table of the Java stack and is thread-private.
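For instance, a minimal sketch (the class name SharedConfig is made up for illustration):

public class LocalReferenceExample {

    static class SharedConfig { int timeout; }   // instances live in the Java heap

    void worker(SharedConfig cfg) {
        // 'local' is a slot in this thread's local variable table and is thread-private,
        // but the SharedConfig instance it points to lives in the heap and may be
        // accessed by other threads, so writes to its fields can still race.
        SharedConfig local = cfg;
        local.timeout = 30;
    }
}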

The working memory of a thread holds the main memory copy of variables used by the thread. All operations (reads, assignments, etc.) on variables must be performed by the thread in the working memory, instead of reading or writing data directly from the main memory. Different threads cannot directly access variables in each other’s working memory. Variable values between threads need to be transferred through the main memory. Figure 12-2 shows the interaction among threads, main memory, and working memory.

“If a thread accesses a 10MB object, will it copy all 10MB into working memory?” In fact, no: what may be copied is the object’s reference and whichever fields of the object the thread actually accesses; no virtual machine copies the entire object at once.

According to the Java Virtual Machine Specification, volatile variables still have a working-memory copy, but because of the special ordering of their operations (described later) they behave as if they were read from and written to main memory directly, so volatile is not an exception to the description here.

Figure 12-2 Interaction among threads, main memory, and working memory (see Figure 12-1 for comparison)

2. Inter-memory interaction operations

The Java memory model defines the following eight operations as the specific protocol for interaction between main memory and working memory, that is, how a variable is copied from main memory into working memory and synchronized from working memory back to main memory. Java virtual machine implementations must ensure that each of these operations is atomic and indivisible (for double and long variables, exceptions are allowed for the load, store, read, and write operations on some platforms).

  • Lock: acts on a variable in main memory; it marks the variable as being in a state exclusively owned by one thread.
  • Unlock: acts on a variable in main memory; it releases a variable that is in the locked state so that it can be locked by another thread.
  • Read: acts on a variable in main memory; it transfers the value of the variable from main memory into the thread’s working memory for the subsequent load operation.
  • Load: acts on a variable in working memory; it puts the value obtained from main memory by the read operation into the working-memory copy of the variable.
  • Use: acts on a variable in working memory; it passes the value of the variable in working memory to the execution engine. This operation is performed whenever the virtual machine encounters a bytecode instruction that needs to use the variable’s value.
  • Assign: acts on a variable in working memory; it assigns a value received from the execution engine to the variable in working memory. This operation is performed whenever the virtual machine encounters a bytecode instruction that assigns a value to the variable.
  • Store: acts on a variable in working memory; it transfers the value of the variable in working memory to main memory for the subsequent write operation.
  • Write: acts on a variable in main memory; it puts the value obtained by the store operation from working memory into the main-memory variable.
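To make these operations concrete, here is a minimal annotated sketch; the field name shared and the mapping in the comments are illustrative only, since the exact operation sequence a real virtual machine performs may differ:

public class InteractionSketch {

    static int shared = 0;          // the variable as it exists in main memory

    void example() {
        // Reading 'shared': conceptually read (transfer the value out of main memory),
        // then load (put the transferred value into this thread's working-memory copy),
        // then use (hand the copied value to the execution engine for the addition).
        int local = shared + 1;

        // Writing 'shared': assign (execution engine -> working-memory copy),
        // then store (transfer the copy toward main memory),
        // then write (place the transferred value into the main-memory variable).
        shared = local;
    }
}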

The Java memory model also specifies that the following rules must be met when performing the eight basic operations described above:

  • A read and its subsequent load, or a store and its subsequent write, are not allowed to appear in isolation; that is, a variable may not be read from main memory but not accepted by working memory, nor written back from working memory but not accepted by main memory.
  • A thread is not allowed to discard its most recent assign operation; that is, after a variable has been changed in working memory, the change must be synchronized back to main memory.
  • A thread is not allowed to synchronize a variable from its working memory back to main memory for no reason (without any assign operation having occurred).
  • A new variable can only be born in main memory; working memory is not allowed to use an uninitialized variable directly. In other words, a load or assign operation must have been performed on a variable before use or store can be performed on it.
  • A variable may be locked by only one thread at a time, but the lock operation can be repeated several times by the same thread; after locking it several times, the variable is unlocked only when the same number of unlock operations have been performed.
  • Performing a lock operation on a variable clears its value in working memory; before the execution engine can use the variable, a load or assign operation must be performed again to initialize its value.
  • It is not allowed to unlock a variable that has not previously been locked by a lock operation, nor to unlock a variable that is locked by another thread.
  • Before an unlock operation can be performed on a variable, the variable must first be synchronized back to main memory (store, write).

3. Special rules for volatile variables

When a variable is declared volatile it has two properties. The first is that it guarantees the variable’s visibility to all threads: when one thread changes the variable’s value, the new value is immediately known to other threads. This is not the case with ordinary variables, whose values are passed between threads through main memory; for example, if thread A modifies an ordinary variable and writes it back to main memory, thread B only sees the new value once it reads from main memory after thread A’s write-back has completed.

Listing 12-1 The operation for volatile

/**
 * Volatile variable increment test
 *
 * @author zzm
 */
public class VolatileTest {

    public static volatile int race = 0;

    public static void increase() {
        race++;
    }

    private static final int THREADS_COUNT = 20;

    public static void main(String[] args) {
        Thread[] threads = new Thread[THREADS_COUNT];
        for (int i = 0; i < THREADS_COUNT; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < 10000; i++) {
                        increase();
                    }
                }
            });
            threads[i].start();
        }

        // Wait for all accumulating threads to finish
        while (Thread.activeCount() > 1)
            Thread.yield();

        System.out.println(race);
    }
}

If this code behaved correctly under concurrency it would print 200000, but in practice it almost always prints a smaller number. At the bytecode level it is easy to see why: volatile guarantees that race is up to date when the getstatic instruction pushes it onto the operand stack, but by the time iconst_1 and iadd execute, other threads may already have increased race, so the value on top of the stack is stale. The putstatic instruction may therefore synchronize a smaller race value back to main memory.

Listing 12-2 The bytecode of VolatileTest

public static void increase();
    Code:
        Stack=2, Locals=0, Args_size=0
        0:   getstatic       #13; //Field race:I
        3:   iconst_1
        4:   iadd
        5:   putstatic       #13; //Field race:I
        8:   return
    LineNumberTable:
        line 14: 0
        line 15: 8

To be honest, analyzing a concurrency problem from bytecode alone is still not rigorous, because even when only a single bytecode instruction is produced, that does not mean executing it is an atomic operation: when a bytecode instruction is interpreted, the interpreter runs many lines of code to realize its semantics, and when it is JIT-compiled it may be converted into several native machine-code instructions. It would be more rigorous to disassemble with the -XX:+PrintAssembly option and analyze the machine code, but for the reader’s convenience, and because the bytecode already makes the point, bytecode is used here.

Since volatile variables only guarantee visibility, atomicity must still be guaranteed by locking (synchronized, the locks in java.util.concurrent, or the atomic classes) in any scenario that does not satisfy both of the following rules:

  • The result of the operation does not depend on the current value of the variable, or can ensure that only a single thread changes the value of the variable.
  • Variables do not need to participate in invariant constraints with other state variables.

Using volatile variables to control concurrency is appropriate in scenarios like the one shown in Listing 12-3. When shutdown() is called, the doWork() method in all threads is stopped immediately.

Listing 12-3 Usage scenarios for volatile

volatile boolean shutdownRequested;

public void shutdown() {
    shutdownRequested = true;
}

public void doWork() {
    while (!shutdownRequested) {
        // The business logic of the code
    }
}

The second property of volatile variables is that they prohibit instruction-reordering optimization. Ordinary variables only guarantee that correct results are obtained at any point where the method’s execution depends on the assigned value; they do not guarantee that assignments are performed in the same order as in the program code.

Let’s continue with an example to see why instruction reordering interferes with concurrent execution. The demo is shown in Listing 12-4.

Listing 12-4 instruction reorder

Map configOptions;
char[] configText;
// This variable must be volatile
volatile boolean initialized = false;

// Suppose the following code is executed in thread A
// Simulate reading configuration information; when reading is complete,
// initialized is set to true to notify other threads that the configuration is available
configOptions = new HashMap();
configText = readConfigFile(fileName);
processConfigOptions(configText, configOptions);
initialized = true;


// Suppose the following code is executed in thread B
// If the value of initialized is true, thread A has finished initializing the configuration
while (!initialized) {
    sleep();
}
// Use the configuration information initialized in thread A
doSomethingWithConfig();

If the initialized variable were defined without the volatile modifier, the last statement in thread A, “initialized = true”, might be executed prematurely due to instruction-reordering optimization (although Java is used as pseudocode here, the reordering in question is a machine-level optimization, and “executed prematurely” means that the machine code for this statement runs ahead of time), so the code in thread B that uses the configuration could fail. The volatile keyword prevents this from happening.

How does the volatile keyword prevent instruction-reordering optimization? Listing 12-5 shows a standard Double Check Lock (DCL) singleton; let’s look at the difference between the assembly code generated with and without volatile (for how to obtain the JIT-compiled assembly code, see the discussion of the HSDIS plug-in in Chapter 4).

Listing 12-5 DCL singleton pattern

public class Singleton {

    private volatile static Singleton instance;

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        Singleton.getInstance();
    }
}

When compiled, the assignment to the instance variable produces the code shown in Listing 12-6.

Listing 12-6 Assembly code for the assignment to the instance variable

0x01a3de0f: mov    $0x3375cdb0,%esi     ;...beb0cd75 33
                                        ;   {oop('Singleton')}
0x01a3de14: mov    %eax,0x150(%esi)     ;...89865001 0000
0x01a3de1a: shr    $0x9,%esi            ;...c1ee09
0x01a3de1d: movb   $0x0,0x1104800(%esi) ;...c6860048 100100
0x01a3de24: lock addl $0x0,(%esp)       ;...f0830424 00; *putstatic instance ; - Singleton::getInstance@24

Comparing the two versions, the key difference is that with the volatile modifier, after the assignment (the “mov %eax,0x150(%esi)” instruction is the assignment operation) an extra “lock addl $0x0,(%esp)” operation is executed. This operation acts as a Memory Barrier (or Memory Fence): reordering cannot move later instructions to a position before the barrier.

The instruction “addl $0x0,(%esp)” (add 0 to the memory location addressed by the ESP register) is clearly a no-op; it is used instead of the nop instruction because the IA32 manual does not allow the lock prefix to be combined with nop. The key is the lock prefix: according to the IA32 manual, it causes the local processor’s cache line to be written to memory, and this write also invalidates the corresponding cache lines of other processors or cores. The effect is equivalent to performing the store and write operations described earlier in the Java memory model on the cached variable, so this empty operation makes the preceding modification of the volatile variable immediately visible to other processors.

Reads of volatile variables cost about the same as reads of ordinary variables, but writes can be somewhat slower because memory-barrier instructions must be inserted into the native code to keep the processor from executing them out of order. Even so, the overall cost of volatile is lower than that of locking in most scenarios; the real basis for choosing between volatile and locking is simply whether the semantics of volatile meet the needs of the usage scenario.

Let’s return to the Java memory model’s definition of the special rules for volatile variables. Assuming T denotes a thread and V and W denote volatile variables, the read, load, use, assign, store, and write operations must satisfy the following rules:

  • Thread T can perform use on V only if its previous action on V was load. Also, thread T can load variable V only if the next action performed by thread T on variable V is use. The use action of thread T on variable V can be considered to be associated with the load and read actions of thread T on variable V and must occur consecutively and together.

    This rule requires that each time V is used in working memory, the most recent value must be flushed from main memory to ensure that changes made to V by other threads are visible.

  • T can execute the store action on V only if its previous action on V is assign. In addition, thread T can assign to variable V only if the next action performed by thread T on variable V is store. The assign action of thread T to variable V can be considered to be associated with the Store and write actions of thread T to variable V, and must occur consecutively and together.

    This rule requires that every change to V in working memory must be immediately synchronized back to main memory to ensure that other threads can see their changes to V.

  • Assume that action A is A use or assign action applied by thread T to variable V, that action F is A load or store action associated with action A, and that action P is A read or write action on variable V corresponding to action F. Similarly, assume that action B is a use or assign action applied by thread T to variable W, that action G is a load or store action associated with action B, and that action Q is a read or write action corresponding to action G on variable W. If A comes before B, then P comes before Q. This rule requires that volatile variables are not optimized by instruction reordering to ensure that code is executed in the same order as the program.

4. Special rules for long and double variables

The Java memory model requires all eight operations (lock, unlock, read, load, assign, use, store, and write) to be atomic. However, for the 64-bit data types long and double, the model defines one relaxed rule: the virtual machine is allowed to divide reads and writes of 64-bit data not modified by volatile into two 32-bit operations. That is, the virtual machine may choose not to guarantee atomicity for the load, store, read, and write operations on 64-bit types. This is known as the “non-atomic treatment of double and long variables.”

In practice, we generally do not need to declare long and double volatile specifically for this reason when writing code, unless the data has an explicitly known thread contention.
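If such contention does exist, a minimal defensive sketch is shown below (assuming a virtual machine that actually splits 64-bit accesses; on mainstream 64-bit HotSpot this declaration is not needed for atomicity):

public class TickerSketch {

    // Without volatile, a VM that splits 64-bit accesses into two 32-bit halves could
    // let a reader observe a "torn" value: the high half of one write combined with
    // the low half of another. volatile forces atomic reads and writes of the field.
    private volatile long lastUpdateMillis;

    public void touch() {
        lastUpdateMillis = System.currentTimeMillis();
    }

    public long lastUpdate() {
        return lastUpdateMillis;
    }
}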

5. Atomicity, visibility and order

After covering the operations and rules of the Java memory model, let’s review the characteristics of the model as a whole. The Java memory model is built around how atomicity, visibility, and orderliness are handled during concurrency, and let’s take a look at which operations implement each of these three characteristics.

1. Atomicity

The atomic variable operations guaranteed directly by the Java memory model are read, load, assign, use, store, and write; we can generally assume that accesses to and reads of the basic data types are atomic (setting aside the non-atomic treatment of long and double, an exception that rarely matters in practice). If a scenario requires a wider range of atomicity guarantees, as is often the case, the Java memory model also provides the lock and unlock operations. Although the virtual machine does not expose lock and unlock to the user directly, it provides the higher-level bytecode instructions monitorenter and monitorexit, which use these two operations implicitly. These bytecode instructions appear in Java code as synchronized blocks, that is, the synchronized keyword, which is how atomicity of the operations inside synchronized blocks is obtained.
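Applied to the race++ example of Listing 12-1, a minimal sketch of how synchronized (compiled to monitorenter/monitorexit) restores atomicity; java.util.concurrent.atomic.AtomicInteger would serve equally well:

public class SynchronizedIncrement {

    private static int race = 0;

    // The whole read-modify-write now executes while holding the class monitor
    // (monitorenter/monitorexit), so two threads can no longer interleave inside race++.
    public static synchronized void increase() {
        race++;
    }

    public static synchronized int value() {
        return race;
    }
}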

2. Visibility

Visibility means that when one thread changes the value of a shared variable, other threads are immediately aware of the change. We discussed this in detail above when covering volatile variables. The Java memory model provides visibility by relying on main memory as the transfer medium: the new value is synchronized back to main memory after a variable is modified, and the value is refreshed from main memory before a variable is read. This applies to ordinary and volatile variables alike; the difference is that the special rules of volatile guarantee that new values are synchronized to main memory immediately and that the value is refreshed from main memory immediately before every use. Thus we can say that volatile guarantees the visibility of variables in multithreaded operations, whereas ordinary variables do not.

Besides volatile, Java has two other keywords that provide visibility: synchronized and final. The visibility of synchronized blocks comes from the rule that a variable must be synchronized back to main memory (store, write) before unlock is performed on it. The visibility of final means that once a final field has been initialized in the constructor, and the constructor has not leaked a reference to “this” (this-reference escape is dangerous because other threads may see the “half-initialized” object through it), the value of the final field is visible to other threads. As shown in Listing 12-7, the variables i and j are both visible and can be accessed correctly by other threads without synchronization.

Listing 12-7 Final and visibility

public static final int i;

public final int j;

static {
    i = 0;
    // omit subsequent actions
}

{
    // You can also choose to initialize it in the constructor
    j = 0;
    // omit subsequent actions
}

3. Orderliness

The orderliness of the Java memory model was discussed in some detail in the section on volatile above. The natural orderliness of Java programs can be summed up in one sentence: observed from within a thread, all operations are ordered; observed from one thread looking at another, all operations are out of order. The first half of the sentence refers to the “within-thread as-if-serial semantics”; the second half refers to the phenomena of instruction reordering and of the synchronization delay between working memory and main memory.

The Java language provides the volatile and synchronized keywords to guarantee ordering between threads. volatile carries the semantics of forbidding instruction reordering, while synchronized obtains its ordering from the rule that a variable may be locked by only one thread at a time, which means that two synchronized blocks holding the same lock can only be entered serially. Having introduced these three important properties of concurrency, you may notice that synchronized can serve as a solution whenever any of them is required. It does look versatile, and indeed the vast majority of concurrency control can be done with synchronized. But that versatility has also indirectly contributed to its overuse by programmers, and the more universal a concurrency-control mechanism is, the greater its performance impact tends to be, a topic we will return to when discussing lock optimization in the virtual machine in the next chapter.

6. The happens-before principle

Now let’s see what the happens-before principle means. Happens-before is a partial-order relation defined between two operations in the Java memory model: if operation A happens-before operation B, then before operation B takes place, the influence of operation A can be observed by operation B, where “influence” includes modifying the values of shared variables in memory, sending messages, calling methods, and so on.

Listing 12-8 Happens-before example 1

// The following operations are performed in thread A
i = 1;

// The following operations are performed in thread B
j = i;

// The following operations are performed in thread C
i = 2;

Assuming that the operation “i = 1” in thread A happens-before the operation “j = i” in thread B, we can determine that the value of j must equal 1 after thread B’s operation executes. This conclusion rests on two points: first, according to the happens-before principle, the result of “i = 1” can be observed by thread B; second, thread C has not yet appeared, so no other thread modifies i after thread A finishes. Now consider thread C: we still have the happens-before relationship between threads A and B, but C appears between the operations of A and B and has no happens-before relationship with B. What will the value of j be then? The answer is uncertain! Both 1 and 2 are possible, because thread C’s write to i may or may not be observed by thread B; thread B is at risk of reading stale data, so the code is not thread-safe.

Below are some “natural” happens-before relations that exist under the Java memory model without the assistance of any synchronizer and can be relied on directly in code.

Program Order Rule

Within a thread, operations written earlier happen-before operations written later, following control-flow order. Note that this is control-flow order rather than program-code order, because branches, loops, and so on must be taken into account.

Monitor Lock Rule

An unlock operation happens-before a later lock operation on the same lock. “The same lock” must be emphasized here, and “later” refers to temporal order.

Volatile Variable Rules

A write to a volatile variable happens-before a later read of that variable, “later” again referring to temporal order.

Thread Start Rule

A call to the start() method of a Thread object happens-before every action of the started thread.

Thread Termination Rule

Every operation in a thread happens-before the detection of that thread’s termination. We can detect that a thread has terminated by means such as the return of Thread::join() or the return value of Thread::isAlive().

The Thread Interruption Rule

A call to the Thread::interrupt() method happens-before the point at which code in the interrupted thread detects that the interrupt has occurred (detectable, for example, via Thread::interrupted()).

Object Finalizer Rule

The completion of an object’s initialization (the end of its constructor) happens-before the start of its finalize() method.

Transitivity

If operation A precedes operation B and operation B precedes operation C, it follows that operation A precedes operation C.
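As an illustration of the thread start rule above, here is a minimal sketch: a write performed before Thread::start() is visible to the started thread without any extra synchronization, by the start rule combined with program order and transitivity.

public class StartRuleSketch {

    static int config;             // deliberately not volatile

    public static void main(String[] args) {
        config = 42;               // program order: before start()
        Thread worker = new Thread(() -> System.out.println(config));
        // start() happens-before every action of 'worker', so by transitivity
        // the write of 42 above is guaranteed to be visible: this prints 42.
        worker.start();
    }
}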

These are the only happens-before rules in the Java language that hold without any synchronizer assistance. Let’s see how they can be used to determine whether operations are ordered, that is, whether reads and writes of shared variables are thread-safe; the following example also shows the difference between temporal order and happens-before. The demonstration code is shown in Listing 12-9.

Listing 12-9 Happens-before example 2

private int value = 0;

public void setValue(int value) {
    this.value = value;
}

public int getValue() {
    return value;
}

Listing 12-9 is a perfectly ordinary pair of getter/setter methods. Suppose threads A and B exist, thread A calls setValue(1) first (in time), and thread B then calls getValue() on the same object; what value does thread B receive? Let’s check the happens-before rules one by one. Because the two methods are called by threads A and B respectively and are not in the same thread, the program order rule does not apply. Because there is no synchronized block, no lock or unlock operations occur, so the monitor lock rule does not apply. Because value is not modified by the volatile keyword, the volatile variable rule does not apply. The thread start, thread termination, thread interruption, and object finalizer rules that follow are entirely unrelated to this case. With no applicable rule, transitivity cannot help either, so even though thread A’s operation comes first in time, we cannot determine the return value of getValue() in thread B; in other words, the operation is not thread-safe.
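Two common repairs are sketched below: either make both methods synchronized (monitor lock rule), or, because setValue() does not depend on the current value of value and value is not tied to other state, declare the field volatile (volatile variable rule). This is a sketch under those assumptions, not the only possible fix:

public class SafeValueHolder {

    // Option 1: rely on the volatile variable rule. Valid here because setValue()
    // does not depend on the current value and 'value' is not tied to other state.
    private volatile int value = 0;

    public void setValue(int value) { this.value = value; }

    public int getValue() { return value; }
}

class SafeValueHolderLocked {

    private int value = 0;

    // Option 2: rely on the monitor lock rule; the unlock at the end of setValue()
    // happens-before a later lock taken by getValue() on the same object monitor.
    public synchronized void setValue(int value) { this.value = value; }

    public synchronized int getValue() { return value; }
}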

From this example we can conclude that an operation being “first in time” does not mean it “happens-before” another. Conversely, if one operation happens-before another, can we infer that it must occur earlier in time? Unfortunately, that inference does not hold either; a typical counterexample is the oft-mentioned instruction reordering, illustrated in Listing 12-10.

Listing 12-10 Happens-before example 3

// The following operations are performed in the same thread
int i = 1;
int j = 2;

The two assignments in Listing 12-10 are in the same thread. According to the program order rule, “int i = 1” happens-before “int j = 2”, yet the processor may well execute “int j = 2” first. This does not affect the correctness of the happens-before principle, because we have no way of perceiving the difference within this thread. Taken together, the two examples prove one conclusion: temporal order and the happens-before principle have essentially no direct causal relationship, so when reasoning about concurrency safety we must not be distracted by temporal order; everything must be measured against the happens-before principle.

4. Java and threads

1. Implementation of threads

As we know, a thread is a more lightweight unit of scheduled execution than a process. Introducing threads separates a process’s resource allocation from its execution scheduling: threads share the process’s resources (memory address space, file I/O, and so on) while being scheduled independently. The thread is currently the most basic unit of processor scheduling in Java, although that may change if the Loom project succeeds in bringing fibers to Java. There are three main ways to implement threads: kernel threads (1:1 implementation), user threads (1:N implementation), and a hybrid of user threads and lightweight processes (N:M implementation).

1. Kernel thread implementation

An implementation that uses kernel threads is known as a 1:1 implementation. Kernel-Level Threads (KLT) are threads supported directly by the operating system kernel: the kernel performs thread switching, schedules the threads through its scheduler, and is responsible for mapping their work onto the individual processors. Each kernel thread can be seen as a clone of the kernel, which is how the operating system can handle more than one thing at a time; a kernel that supports multithreading is called a multi-threads kernel. Programs generally do not use kernel threads directly but go through a higher-level interface to them, the Light-Weight Process (LWP), which is what we usually mean by a thread. Since each lightweight process is backed by a kernel thread, kernel threads must exist first in order to have lightweight processes. This 1:1 relationship between lightweight processes and kernel threads is called the one-to-one threading model, as shown in Figure 12-3.

Figure 12-3 1:1 relationship between lightweight processes and kernel threads

Thanks to kernel thread support, each lightweight process becomes an independent scheduling unit, so even if one lightweight process blocks in a system call, the whole process is not prevented from continuing to work. Lightweight processes have their limitations, however. First, because they are implemented on top of kernel threads, thread operations such as creation, destruction, and synchronization require system calls, and system calls are relatively expensive, requiring switches back and forth between User Mode and Kernel Mode. Second, each lightweight process needs a kernel thread to support it, so lightweight processes consume a certain amount of kernel resources (such as kernel thread stack space), which limits the number of lightweight processes a system can support.

2. Implementation of user threads

Using user threads is known as the 1:N implementation. Broadly speaking, any thread that is not a kernel thread can be regarded as a User Thread (UT), so by this definition a lightweight process also counts as a user thread; but the implementation of lightweight processes is always built on the kernel and requires many system calls, so its efficiency is limited and it lacks the usual benefits of user threads.

Figure 12-4 1: N relationship between processes and user threads

In the narrow sense, a user thread is one implemented entirely by a thread library in user space; the system kernel is unaware of the user threads’ existence or of how they are implemented. Creation, synchronization, destruction, and scheduling of user threads are completed entirely in user mode without the kernel’s help. If implemented well, such threads never need to switch into kernel mode, so operations can be very fast and cheap, and much larger numbers of threads can be supported; some of the multithreading in high-performance databases is implemented with user threads. This 1:N relationship between a process and its user threads is called the one-to-many threading model, as shown in Figure 12-4.

The advantage of user threads is that they do not require kernel support; the disadvantage is that without kernel support, all thread operations must be handled by the user program itself. Thread creation, destruction, switching, and scheduling are all problems the user must solve, and because the operating system allocates processor resources only to the process, problems such as “how to handle blocking” and “how to map threads onto the processors of a multiprocessor system” become extremely difficult, and some even impossible, to solve. However, in recent years many new programming languages whose selling point is high concurrency, such as Golang and Erlang, support user threads, which has caused their use to rebound.

3. Hybrid implementation

In addition to relying on the kernel thread implementation and completely implemented by the user program itself, there is another implementation that uses the kernel thread and user thread together, which is called N: M implementation. Under this hybrid implementation, there are both user threads and lightweight processes. User threads are still built entirely in user space, so user threads are still cheap to create, switch, and destruct, and can support large-scale user thread concurrency. The lightweight process supported by the operating system acts as a bridge between the user thread and the kernel thread, so that the thread scheduling function and processor mapping provided by the kernel can be used, and the system call of the user thread is completed through the lightweight process, which greatly reduces the risk of the entire process being completely blocked. In this hybrid mode, the ratio of user threads to lightweight processes is N: M, as shown in Figure 12-5. This is the many-to-many threading model.

Figure 12-5 M: N relationship between user threads and lightweight processes

Many UNIX operating systems, such as Solaris and HP-UX, provide an implementation of the M: N threading model. Applications on these operating systems are also relatively easy to apply the M: N threading model.

4. Java thread implementation

How Java threads are implemented is not constrained by the Java Virtual Machine Specification; it is a matter specific to each virtual machine. Java threads in the early Classic VM (before JDK 1.2) were implemented on top of a kind of user thread called Green Threads, but from JDK 1.3 onward the threading model of mainstream commercial Java virtual machines on mainstream platforms has generally been based on the operating system’s native threading model, that is, the 1:1 threading model.

2. Java thread scheduling

Thread Scheduling refers to the process by which the system assigns processor time to threads. There are two main scheduling approaches: Cooperative Thread-Scheduling and Preemptive Thread-Scheduling.

Cooperative thread scheduling

In a multithreaded system that uses cooperative scheduling, the execution time of a thread is controlled by the thread itself: after finishing its own work, a thread must actively notify the system to switch to another thread. The biggest advantage of cooperative multithreading is simplicity of implementation, and because a thread only switches after it has finished its own work, the switch points are known to the thread itself, so there are generally no thread-synchronization problems. The “coroutine” in Lua is one such implementation. The downside is equally obvious: thread execution time is out of the system’s control, and if a badly written thread never tells the system to switch, the program will block forever.

Preemptive thread scheduling

In a multithreaded system with preemptive scheduling, each thread is allocated execution time by the system, and thread switching is not determined by the thread itself. For example, in Java, the Thread::yield() method can yield execution time, but there is nothing the Thread can do to proactively obtain execution time. In this way, the execution time of the thread is controlled by the system, and there will be no problem that a thread will cause the whole process or even the whole system to block. The thread scheduling method used in Java is called preemptive scheduling.

Although Java thread scheduling is done automatically by the system, we can still “suggest” that the operating system give some threads more execution time and others less, by setting thread priorities. The Java language defines 10 thread priority levels (Thread.MIN_PRIORITY through Thread.MAX_PRIORITY); when two threads are in the Ready state at the same time, the one with the higher priority is more likely to be chosen to run.

Thread priority is not a very reliable means of adjustment, however, because Java threads on mainstream virtual machines are mapped onto the system’s native threads, so scheduling is ultimately up to the operating system. Although most modern operating systems provide thread priorities, they do not necessarily correspond one-to-one with Java’s: Solaris, for example, has 2147483648 priority levels, while Windows has only seven. When the operating system has more levels than Java, mapping is easy because there is room to spare; but when it has fewer, several Java priorities must map to the same native priority. Table 12-1 shows the correspondence between Java thread priorities and Windows thread priorities; the Windows VM uses the six native priorities other than THREAD_PRIORITY_IDLE, so under Windows setting thread priorities 1 and 2, 3 and 4, 6 and 7, or 8 and 9 has exactly the same effect.
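A minimal sketch of setting a priority, which, as noted above, is only a hint to the scheduler:

public class PrioritySketch {

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            // CPU-bound work would go here
        });
        // Only a hint: on Windows, for example, Java priorities 6 and 7 map to the
        // same native priority, and the OS scheduler remains free to ignore it.
        worker.setPriority(Thread.MAX_PRIORITY);   // 10; MIN_PRIORITY is 1, NORM_PRIORITY is 5
        worker.start();
    }
}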

Table 12-1 Mapping between Java thread priorities and Windows thread priorities

3. State transition

The Java language defines six thread states, and a thread can have only one of them at any given point in time, and can switch between them in a specific way. The six states are:

  • New: A thread that has not been started since it was created is in this state.
  • Runnable: Includes Running and Ready threads in the operating system state, where a thread may be executing or waiting for the operating system to allocate time for it to execute.
  • Waiting: threads in this state are not allocated processor execution time; they wait to be explicitly woken up by another thread. The following methods cause the thread to be stuck in an indefinite wait state:
    • Object::wait() method with no Timeout parameter;
    • Thread::join() with no Timeout parameter;
    • LockSupport::park() method.
  • Timed Waiting: Threads in this state are also not assigned processor execution time, but instead of Waiting to be explicitly awakened by another thread, they are automatically awakened by the system after a certain amount of time. The following methods cause the thread to enter the finite wait state:
    • Thread::sleep() method;
    • Object::wait() with a Timeout parameter;
    • Thread::join() with a Timeout parameter;
    • LockSupport::parkNanos() method;
    • LockSupport::parkUntil() method.
  • Blocked: The thread is blocked. The difference between the blocked state and the waiting states is that a blocked thread is waiting to acquire an exclusive lock, an event that happens when another thread gives up that lock, whereas a waiting thread is waiting for a period of time or for a wakeup action to occur. A thread enters the blocked state while the program is waiting to enter a synchronized region.
  • Terminated: The state of a thread that has terminated; the thread has finished executing.

The six states will be converted to each other when a specific event occurs, as shown in Figure 12-6.

Figure 12-6 Thread status conversion relationship
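These states can also be observed from code through Thread::getState(); a minimal sketch (the timings are illustrative, and the intermediate state is only the typical outcome):

public class StateSketch {

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException ignored) {
            }
        });
        System.out.println(t.getState());   // NEW: created but not yet started
        t.start();
        Thread.sleep(100);
        System.out.println(t.getState());   // typically TIMED_WAITING: parked in sleep(500)
        t.join();
        System.out.println(t.getState());   // TERMINATED: run() has finished
    }
}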

5. Java and coroutines

1. Limitations of kernel threads

The service demands placed on web applications today, both in number of requests and in complexity, are different from those of a decade or more ago, partly because business volumes have grown and partly because services keep being subdivided to cope with business complexity. In a modern B/S system, a single request from an external business system often requires a large number of services distributed across different machines to cooperate. This fine-grained service architecture reduces the complexity of individual services and increases reusability, but it inevitably increases the number of services and shortens the response time available to each one: every service must finish its computation in a very short time so that the combined latency of many services is not too long, and every service provider must handle a larger number of requests at once so that requests do not queue up behind a blocked service.

Java’s current concurrent programming mechanism is somewhat at odds with this architectural trend. The 1:1 kernel-thread model is the mainstream choice for Java virtual machine thread implementations today, but this direct mapping to operating system threads has natural drawbacks: switching and scheduling are costly, and the number of threads the system can accommodate is limited. When a single monolithic application was allowed to spend a long time processing one request, these per-thread costs were harmless; but now that the execution time of each request is itself very short and the number of requests is large, the cost of switching user threads may approach, or even exceed, the cost of the computation itself, which is a serious waste. Traditional Java web servers typically have thread pools of a few dozen to two hundred threads, and when millions of requests are poured into such a pool, the switching cost can be considerable even if the system can still cope. This pressure is forcing Java to look for new solutions, and people have grown nostalgic for the benefits of green threads, which passed into history along with the Classic virtual machine. Will they ever see the light of day again?

2. The revival of coroutines

Why is kernel thread scheduling more expensive to switch? The cost of kernel thread scheduling mainly comes from the state transition between user state and kernel state, while the cost of these two state transitions mainly comes from the cost of responding to interrupt, protecting and resuming execution site. Imagine the following scenario where a thread switch occurs:

Thread A -> System interrupt -> Thread B

When the processor is to execute the code of thread A, the code alone is not enough to run: a program is a combination of code and data, and code execution must be supported by context data. The “context”, from the programmer’s point of view, consists of the local variables and resources involved in the method calls; from the thread’s point of view, it is the information stored on the method call stack; from the point of view of the operating system and hardware, it is the concrete values held in memory, caches, and registers. Physical storage devices and registers are resources shared by all threads within the operating system, so when an interrupt occurs and execution is to switch from thread A to thread B, the operating system must first save thread A’s context data properly and then restore the registers, memory paging, and so on to the state they were in when thread B was suspended, so that thread B can resume as if it had never been suspended. This saving and restoring of the execution site inevitably involves copying data back and forth between a series of registers and caches, so it cannot be a lightweight operation.

Most user threads were originally designed as Cooperative Scheduling, so they got the nickname “Coroutine.”

The main advantage of coroutines is that they are lightweight, much lighter than traditional kernel threads, whether in the stackful or stackless variety. To quantify: on 64-bit Linux, HotSpot’s default ThreadStackSize is 1MB when neither -Xss nor -XX:ThreadStackSize is set explicitly, plus roughly 16KB of additional memory for kernel data structures. By contrast, a coroutine’s stack is usually a few hundred bytes to a few kilobytes, so a 200-thread pool in a Java virtual machine is not considered small, while many applications that support coroutines routinely run hundreds of thousands of concurrent coroutines.

Coroutines, of course, have their limitations: much of the machinery (the call stack, the scheduler, and so on) has to be implemented at the application level.

Specific to the Java language there are further complications. For example, in virtual machines such as HotSpot the Java call stack and the native call stack are laid out together, so if a native method is called from within a coroutine, can the coroutine still be switched without affecting the whole thread? And what happens when a coroutine meets traditional thread-synchronization measures? The coroutine implementation provided by Kotlin, for instance, still suspends the entire thread when it encounters the synchronized keyword.

3. Java solutions

For stackful coroutines there is a special implementation called a Fiber, a name first promoted by Microsoft, which later shipped a system-level fiber package to help applications save and restore execution sites and schedule fibers. In 2018 OpenJDK created the Loom project, the official Java answer to the scenarios listed at the beginning of this section, and based on the public information so far, the new concurrent programming mechanism it introduces to the Java language is also expected to use the name “fiber”, existing alongside the current thread model (this has nothing to do with Microsoft, of course). As the official Oracle explanation of “what is a fiber” shows, it is a typical stackful coroutine, as illustrated in Figure 12-7.

Figure 12-7 Oracle fiber presentation at JVMLS 2018

The intention behind the Loom project is to bring back support for user threads, but unlike the green threads of the past, these new features are not meant to replace the current operating-system-based thread implementation; rather, the two concurrent programming models will coexist in the Java virtual machine and can be used together in a program. The new model deliberately keeps an API similar to the current thread model, and they may even share a common base class, so that existing code needs little change to use fibers and need not even know which concurrent programming model is underneath. At JVMLS 2018 the Loom team presented the results of converting Jetty to fibers: under the same load of 5000 QPS, a traditional thread pool of capacity 400 showed request latencies between 10000 and 20000 ms, while the new model with one fiber per request generally stayed below 200 ms, as shown in Figure 12-8.

Figure 12-8 Stress test of Jetty under the new concurrency model

6. Summary

In this article we looked at the structure and operation of the virtual machine’s Java memory model, explained atomicity, visibility, and orderliness within it, and introduced the rules and use of the happens-before principle. In addition, we learned how threads are implemented in the Java language and how the new concurrency model that may represent the future of multithreading in Java works.