The Java Memory Model (JMM) can solve the visibility and ordering problems, but how do we solve the atomicity problem?
```java
public class Counter {
    volatile int count;

    public void add10K() {
        for (int i = 0; i < 10000; i++) {
            count++;
        }
    }

    public int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Thread t1 = new Thread(() -> counter.add10K());
        Thread t2 = new Thread(() -> counter.add10K());
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter.count);
    }
}
```
We expect the code above to print 20000, but the actual result is usually less than 20000. This is because the shared variable count is modified by two threads at the same time, and the count++ statement actually corresponds to three CPU instructions:
- Read the value of count
- Add 1 to it
- Write the new value back to count
A thread switch can occur at the end of any of these instructions. Suppose count is 100. Thread 1 reads it and adds 1, but before it can write the new value back it loses the CPU, and thread 2 runs. Thread 2 also reads 100, completes its count++, and writes 101 back. When thread 1 runs again, it writes its own 101 back to count. Both threads incremented count, yet its value grew by only 1.
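To make the decomposition concrete, here is count++ written out as three separate Java statements (incrementOnce and tmp are purely illustrative names, not part of the original code):

```java
// inside Counter: what "count++" actually does, written out step by step
void incrementOnce() {
    int tmp = count;   // 1. read the current value of count
    tmp = tmp + 1;     // 2. add 1 to it
    count = tmp;       // 3. write the new value back to count
    // a thread switch between any two of these steps can lose an update
}
```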
In situations like this, where two or more threads read and write shared data and the final result depends on the precise timing of their execution, we have what is known as a race condition. So how do we avoid race conditions? The trick is to find a way to prevent multiple threads from reading and writing the shared data at the same time.
In other words, what we need is mutual exclusion: some way to ensure that while one process is using a shared variable or file, other processes cannot do the same thing.
The problem in the example above arose because thread 2 started using the shared variable before thread 1 had finished with it. The problem is the same for threads and processes, so let's look at how the operating system handles it.
A process spends part of its time doing internal computations or other operations that cannot cause race conditions. At certain points, however, it may need to access shared memory or files, or perform other operations that can lead to races. We call the part of the program that accesses shared memory a critical region or critical section. If we can arrange matters so that no two processes are ever in their critical regions at the same time, we can avoid races.
To ensure that concurrent processes using shared data cooperate correctly and efficiently, a good solution needs to meet the following four conditions:
- No two processes may be in their critical regions at the same time
- No assumptions may be made about the speed or number of CPUs
- No process running outside its critical region may block other processes
- No process should have to wait forever to enter its critical region
The desired behavior is this: process A enters the critical region at time T1. A little later, at time T2, process B attempts to enter but fails, because another process is already inside and only one process is allowed in at a time. B is therefore temporarily suspended until A leaves the critical region at time T3, at which point B can enter immediately. Finally B leaves (at time T4), and we are back to the original state with no process in the critical region.
Mutual exclusion
To guarantee that only one process enters a critical region at a time, we need processes to be mutually exclusive in the critical region: while one process is updating shared memory inside its critical region, no other process may enter and cause unforeseen problems. Let's look at several ways to achieve this mutual exclusion.
Disabling interrupts
On a uniprocessor system, the simplest approach is to disable all interrupts immediately after entering the critical region and re-enable them just before leaving. With interrupts disabled, the clock interrupt is disabled too. Since the CPU only switches processes when a clock or other interrupt occurs, it cannot switch to another process while interrupts are off, so the process can examine and modify shared memory without fear of interference.
An interrupt signal causes the CPU to stop what it is currently doing and start doing something else.
But what happens if we give processes the power to disable interrupts and a process never turns them back on? The whole system could grind to a halt. The situation is somewhat better on a multicore CPU: disabling interrupts affects only the CPU that executed the instruction, and the other CPUs keep running and can still access shared memory.
Lock variable
A lock variable is a software solution. Imagine a single shared lock variable, initially 0. When a process wants to enter a critical region, it performs a test similar to the following:
```c
if (lock == 0) {
    lock = 1;
    // critical region
    critical_region();
} else {
    // wait until lock == 0
}
```
This looks fine, but it has exactly the same flaw as the code at the beginning of this article. Process A reads the lock variable and sees that it is 0; just before it can set it to 1, another process is scheduled, runs, and sets the lock to 1. When the first process runs again, it also sets the lock to 1, and now two processes are in their critical regions at the same time. Checking the value once more before storing it does not help either: the race occurs if the second process modifies the lock just after the first process finishes its second check.
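To see the flaw in Java terms, here is a sketch of the broken idea; BrokenLock is a made-up name, and the volatile keyword only fixes visibility, not the gap between the test and the set:

```java
// the lock-variable idea transcribed to Java; it is intentionally broken
class BrokenLock {
    volatile int lock = 0;    // 0 = free, 1 = held

    void acquire() {
        while (lock != 0) {
            // busy-wait until the lock looks free
        }
        lock = 1;             // another thread can slip in between the test above and this write
    }

    void release() {
        lock = 0;
    }
}
```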
Strict alternation

In this approach, the two threads execute different code:
Thread 1:
```c
while (true) {
    while (turn != 0) {
        // busy-wait
    }
    // critical region
    critical_region();
    turn = 1;
    // non-critical region
    noncritical_region();
}
```
Thread 2:
```c
while (true) {
    while (turn != 1) {
        // busy-wait
    }
    // critical region
    critical_region();
    turn = 0;
    // non-critical region
    noncritical_region();
}
```
The variable turn, initially 0, keeps track of whose turn it is to enter the critical region. At the start, thread 1 inspects turn, finds it to be 0, and enters the critical region. Thread 2 also finds it to be 0 and therefore sits in a tight loop continually testing turn.
Testing a variable continuously until a value appears is called busy waiting. Because this is a waste of CPU time, it should generally be avoided. Use busy waiting only when there is reason to believe that the wait time is very short. Locks used for busy waiting are called spin locks.
This may look perfect, but it is not. Suppose thread 2 is slow and is still executing its non-critical code while turn is 1. Thread 1 finishes its own non-critical work, returns to the top of the loop, and now cannot enter the critical region even though nobody is in it; all it can do is spin in the while loop until thread 2 sets turn back to 0.

This violates condition 3 stated earlier: a thread is being blocked by a thread that is not in its critical region. It also shows that taking strict turns is a bad idea when one thread runs much more slowly than the other.
Peterson's algorithm

In 1981 Peterson invented a simple mutual exclusion algorithm. Its basic form is as follows:
```java
int turn;
boolean[] interested = new boolean[2];

// the process number is 0 or 1
void enterRegion(int process) {
    // the other process
    int other = 1 - process;
    // record that this process is interested in entering the critical region
    interested[process] = true;
    // set the flag
    turn = process;
    // busy-wait as long as the entry condition is not met
    while (turn == process && interested[other] == true) {
    }
}

void leaveRegion(int process) {
    interested[process] = false;
}
```
A process calls enterRegion when it wants to enter its critical region; the call makes the process wait, if necessary, until it is safe to enter. After it has finished with the shared variables, the process calls leaveRegion to leave the critical region. Although this algorithm does not force the processes to take strict turns, it still relies on busy waiting.
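As a sketch of how the two methods are used, here is a runnable Java version that protects the count++ from the beginning of the article. Peterson and PetersonDemo are illustrative names; turn is volatile and the flags live in an AtomicIntegerArray because plain Java fields would not be guaranteed visible across threads:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class Peterson {
    private volatile int turn;
    // element reads/writes of a plain array are not volatile, hence the atomic array
    private final AtomicIntegerArray interested = new AtomicIntegerArray(2); // 0 = false, 1 = true

    void enterRegion(int process) {          // process is 0 or 1
        int other = 1 - process;
        interested.set(process, 1);          // announce interest
        turn = process;                      // offer to let the other go first
        while (turn == process && interested.get(other) == 1) {
            // busy-wait
        }
    }

    void leaveRegion(int process) {
        interested.set(process, 0);
    }
}

public class PetersonDemo {
    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Peterson mutex = new Peterson();
        Thread[] threads = new Thread[2];
        for (int id = 0; id < 2; id++) {
            final int me = id;
            threads[id] = new Thread(() -> {
                for (int i = 0; i < 10000; i++) {
                    mutex.enterRegion(me);
                    count++;                 // protected: only one thread at a time
                    mutex.leaveRegion(me);
                }
            });
            threads[id].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(count);           // now reliably prints 20000
    }
}
```

Unlike the broken Counter, this prints 20000 every time, at the cost of burning CPU in the busy-wait loop.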
Sleep and wakeup
How can we overcome the drawback of busy waiting? The essence of busy waiting is that when a process wants to enter its critical region, it checks whether entry is allowed; if not, it simply spins in place until it is. Is there a primitive that blocks the process when it cannot enter the critical region, instead of wasting CPU time waiting?
Interprocess communication offers two primitives, sleep and wakeup, that do exactly this. Sleep is a system call that blocks the caller, suspending it until another process wakes it up. Wakeup takes one parameter: the process to be awakened. With these in hand, let's consider the producer-consumer example.
A producer writes data into a buffer, and a consumer takes data out of it. There are two implicit constraints: the consumer cannot take data when the buffer is empty, and the producer cannot add data when the buffer is full.
```c
int MAX = 100;
int count = 0;

void producer(void) {
    int item;
    while (true) {
        item = produce_item();
        // if the buffer is full, go to sleep
        if (count == MAX) {
            sleep();
        }
        insert_item(item);
        count = count + 1;
        // if the buffer was empty, wake the consumer
        if (count == 1) {
            wakeup(consumer);
        }
    }
}

void consumer(void) {
    int item;
    while (true) {
        // if the buffer is empty, go to sleep
        if (count == 0) {
            sleep();
        }
        item = remove_item();
        count = count - 1;
        // if the buffer was full, wake the producer
        if (count == MAX - 1) {
            wakeup(producer);
        }
        consume_item(item);
    }
}
```
There is a problem lurking here: access to count, the number of items currently in the buffer, is unconstrained. Suppose the buffer is empty and the consumer has just read count as 0 when a process switch occurs. The producer runs, inserts an item, increments count to 1, and since the buffer was empty, calls wakeup to wake the consumer. But the consumer is not yet asleep, so the wakeup signal is lost. When the consumer next runs, it still believes count is 0 and goes to sleep. Sooner or later the producer fills the buffer and goes to sleep too, and both processes sleep forever.
A quick fix is to add a wakeup waiting bit. When a wakeup is sent to a process that is still awake, this bit is set to 1. Later, when the process is about to go to sleep, it first checks the bit: if it is set, the process clears it and stays awake. However, this treats the symptom rather than the cause; with three or more processes, ever more wakeup waiting bits are needed.
Semaphores

A semaphore, proposed by E. W. Dijkstra in 1965, uses an integer variable to accumulate the number of pending wakeups for later use. His proposal introduced a new variable type, the semaphore, whose value can be 0 (no wakeups pending) or positive (one or more wakeups pending). Dijkstra proposed two operations on it, down and up (generalizations of sleep and wakeup, respectively).
```c
void down(void) {
    if (semaphore > 0) {
        semaphore = semaphore - 1;
    } else {
        sleep();
    }
}
```
Checking the value, changing it, and possibly going to sleep are all done as a single, indivisible atomic action: once a semaphore operation starts, no other process may access the semaphore until the operation completes or blocks.
An atomic operation is a group of related operations that are either all performed without interruption or not performed at all.
```c
void up(void) {
    semaphore = semaphore + 1;
    wakeup();
}
```
The up operation increments the semaphore by 1; if any process is sleeping on the semaphore, one of them is woken up, and like down, the whole operation is atomic. So how are down() and up() made atomic? This is done with the hardware-supplied TSL or XCHG instructions: a CPU executing TSL or XCHG locks the memory bus, preventing the other CPUs from accessing memory until the instruction completes.
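The semantics can be sketched in Java, where the synchronized keyword supplies the atomicity that TSL/XCHG provides at the hardware level; SimpleSemaphore is a name made up for this illustration, not how any kernel implements it:

```java
// a minimal counting semaphore sketch; synchronized makes each operation atomic
class SimpleSemaphore {
    private int value;

    SimpleSemaphore(int initial) {
        value = initial;
    }

    synchronized void down() throws InterruptedException {
        while (value == 0) {
            wait();          // sleep until an up() wakes us
        }
        value = value - 1;
    }

    synchronized void up() {
        value = value + 1;
        notify();            // wake one sleeping thread, if any
    }
}
```

In practice Java already ships a ready-made java.util.concurrent.Semaphore, whose acquire() and release() play the roles of down and up.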
Let’s see how we can use semaphores to solve the producer-consumer problem.
Three semaphores are used in the following implementation
- Full: Records the number of full buffer slots
- Empty: Records the number of empty buffer slots
- Mutex: a semaphore shared by the processes, with an initial value of 1, ensuring that at most one process is in the critical section at a time
```c
#define N 100                 // number of slots in the buffer
typedef int semaphore;        // a semaphore is a special kind of integer

semaphore mutex = 1;          // controls access to the critical section
semaphore empty = N;          // number of empty slots in the buffer
semaphore full = 0;           // number of full slots in the buffer

void producer(void) {
    int item;
    while (TRUE) {
        // generate something to put in the buffer
        item = produce_item();
        // decrement the empty-slot count
        down(&empty);
        // enter the critical section
        down(&mutex);
        // put the new item in the buffer
        insert_item(item);
        // leave the critical section
        up(&mutex);
        // increment the full-slot count
        up(&full);
    }
}

void consumer(void) {
    int item;
    while (TRUE) {
        // decrement the full-slot count
        down(&full);
        // enter the critical section
        down(&mutex);
        // take an item from the buffer
        item = remove_item();
        // leave the critical section
        up(&mutex);
        // increment the empty-slot count
        up(&empty);
        // process the item
        consume_item(item);
    }
}
```
The empty semaphore guarantees that the producer stops running when the buffer is full (empty = 0), and the full semaphore guarantees that the consumer stops running when the buffer is empty.
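For comparison, here is a sketch of the same pattern using Java's built-in java.util.concurrent.Semaphore; the class name SemaphoreBuffer, the fixed item counts, and the ArrayDeque standing in for the buffer are all choices made for this example:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

public class SemaphoreBuffer {
    static final int N = 100;                          // number of slots in the buffer
    static final Queue<Integer> buffer = new ArrayDeque<>();
    static final Semaphore mutex = new Semaphore(1);   // controls access to the buffer
    static final Semaphore empty = new Semaphore(N);   // counts empty slots
    static final Semaphore full  = new Semaphore(0);   // counts full slots

    public static void main(String[] args) {
        Thread producer = new Thread(() -> {
            try {
                for (int item = 0; item < 10; item++) {
                    empty.acquire();                   // down(&empty)
                    mutex.acquire();                   // down(&mutex)
                    buffer.add(item);
                    mutex.release();                   // up(&mutex)
                    full.release();                    // up(&full)
                }
            } catch (InterruptedException ignored) { }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int k = 0; k < 10; k++) {
                    full.acquire();                    // down(&full)
                    mutex.acquire();                   // down(&mutex)
                    int item = buffer.remove();
                    mutex.release();                   // up(&mutex)
                    empty.release();                   // up(&empty)
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException ignored) { }
        });
        producer.start();
        consumer.start();
    }
}
```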
Mutexes

A mutex is a simplified version of a semaphore. Mutexes are only good for managing mutually exclusive access to a shared resource or a small piece of code, but because they are easy and efficient to implement, they are especially useful in user-space thread packages.

A mutex is a variable that can be in one of two states: unlocked or locked. Only one bit is needed to represent it, although in practice an integer is often used, with 0 meaning unlocked and all other values meaning locked.
When a thread (or process) needs to enter a critical region, it calls mutex_lock. If the mutex is currently unlocked (meaning the critical region is available), the call succeeds and the calling thread is free to enter. If, on the other hand, the mutex is already locked, the calling thread blocks until the thread in the critical region finishes and calls mutex_unlock. If multiple threads are blocked on the mutex, one of them is chosen at random and allowed to acquire the lock. Because mutexes are so simple, they are easy to implement in user space if a TSL or XCHG instruction is available.
```
mutex_lock:
    TSL REGISTER,MUTEX    // copy the mutex to the register and set the mutex to 1
    CMP REGISTER,#0       // was the mutex 0?
    JZE ok                // if it was 0, the mutex was unlocked, so return
    CALL thread_yield     // mutex is busy; schedule another thread
    JMP mutex_lock        // try again later
ok: RET                   // return to the caller; the critical region is entered

mutex_unlock:
    MOVE MUTEX,#0         // store a 0 in the mutex
    RET                   // return to the caller
```
mutex_lock is similar to enter_region above, but with one key difference: when enter_region fails to get into the critical region, it busy-waits (until, in practice, the clock runs out and another process is scheduled), whereas mutex_lock calls thread_yield to voluntarily give the CPU to another thread, so there is no busy waiting. Because thread_yield is just a call into the user-space thread scheduler, it is very fast; neither mutex_lock nor mutex_unlock requires any kernel call.
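The same idea can be sketched in Java, with AtomicBoolean.compareAndSet playing the role of TSL and Thread.yield() the role of thread_yield; YieldingMutex is an illustrative name, not a standard class:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class YieldingMutex {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // atomically flip unlocked -> locked; compareAndSet is our "TSL"
        while (!locked.compareAndSet(false, true)) {
            Thread.yield();   // mutex busy: let another thread run, then retry
        }
    }

    void unlock() {
        locked.set(false);    // store "unlocked", like MOVE MUTEX,#0
    }
}
```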
Mutexes in pthreads

To make portable threaded programs possible, the IEEE standard 1003.1c defines a thread standard called pthreads, which most UNIX systems support.
Pthreads provide a number of functions that you can use to synchronize threads. The basic mechanism is to protect each critical region with a mutex that can be locked and unlocked.
The main mutex-related calls are:

| Thread call | Description |
| --- | --- |
| pthread_mutex_init | Create a mutex |
| pthread_mutex_destroy | Destroy an existing mutex |
| pthread_mutex_lock | Acquire a lock or block |
| pthread_mutex_trylock | Try to acquire a lock; fail instead of blocking |
| pthread_mutex_unlock | Release a lock |
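As an aside (not part of the pthreads API), Java's closest analogue to these calls is java.util.concurrent.locks.ReentrantLock; the class LockDemo below is made up to show the mapping:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockDemo {
    private static final ReentrantLock mutex = new ReentrantLock(); // ~ pthread_mutex_init
    private static int shared = 0;

    public static void main(String[] args) {
        mutex.lock();              // ~ pthread_mutex_lock: acquire or block
        try {
            shared++;              // critical region
        } finally {
            mutex.unlock();        // ~ pthread_mutex_unlock
        }

        if (mutex.tryLock()) {     // ~ pthread_mutex_trylock: acquire or fail
            try {
                shared++;
            } finally {
                mutex.unlock();
            }
        }
        System.out.println(shared);
    }
}
```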
In addition to mutexes, pthreads provide another synchronization mechanism: condition variables.

**Mutexes are good for allowing or blocking access to a critical region, while condition variables let a thread block because some condition is not met. In most cases the two are used together.** Let's look more closely at the relationship between threads, mutexes, and condition variables.

Consider the producer-consumer problem again: one thread puts items into a buffer and another takes them out. If the producer finds that there is no empty slot in the buffer, it has to block until one becomes available. The producer uses the mutex to perform the check atomically, free from interference by other threads. But when it discovers the buffer is full, it needs a way to block itself and be woken up later, and that is exactly what a condition variable provides.
The pthread calls associated with condition variables are:

| Thread call | Description |
| --- | --- |
| pthread_cond_init | Create a condition variable |
| pthread_cond_destroy | Destroy a condition variable |
| pthread_cond_wait | Block, waiting for a signal |
| pthread_cond_signal | Signal another thread to wake it up |
| pthread_cond_broadcast | Signal multiple threads to wake them all up |
The two most important operations on condition variables are pthread_cond_wait and pthread_cond_signal. The former blocks the calling thread until another thread signals it; the latter wakes one such waiting thread. When several threads may be blocked on the same condition, pthread_cond_broadcast wakes them all.

Condition variables are usually used together with a mutex: one thread locks the mutex, and when it does not find the state it expects, waits on a condition variable; later, another thread signals it so that it can resume. pthread_cond_wait atomically blocks the calling thread and unlocks the mutex it holds, which is why the mutex is one of its arguments.
```c
#include <stdio.h>
#include <pthread.h>

#define MAX 1000000000

pthread_mutex_t the_mutex;
pthread_cond_t condc, condp;

// the buffer
int buffer = 0;

void *producer(void *ptr) {
    int i;
    for (i = 1; i <= MAX; i++) {
        // get exclusive access to the buffer
        pthread_mutex_lock(&the_mutex);
        while (buffer != 0) {
            pthread_cond_wait(&condp, &the_mutex);
        }
        // put an item into the buffer
        buffer = i;
        // wake up the consumer
        pthread_cond_signal(&condc);
        // release access to the buffer
        pthread_mutex_unlock(&the_mutex);
    }
    pthread_exit(0);
}

void *consumer(void *ptr) {
    int i;
    for (i = 1; i <= MAX; i++) {
        // get exclusive access to the buffer
        pthread_mutex_lock(&the_mutex);
        while (buffer == 0) {
            pthread_cond_wait(&condc, &the_mutex);
        }
        // take the item out of the buffer
        buffer = 0;
        // wake up the producer
        pthread_cond_signal(&condp);
        // release access to the buffer
        pthread_mutex_unlock(&the_mutex);
    }
    pthread_exit(0);
}

int main(int argc, char **argv) {
    pthread_t pro, con;
    // initialization
    pthread_mutex_init(&the_mutex, 0);
    pthread_cond_init(&condc, 0);
    pthread_cond_init(&condp, 0);
    // create the threads
    pthread_create(&con, 0, consumer, 0);
    pthread_create(&pro, 0, producer, 0);
    // wait for the threads to finish
    pthread_join(pro, 0);
    pthread_join(con, 0);
    pthread_cond_destroy(&condc);
    pthread_cond_destroy(&condp);
    pthread_mutex_destroy(&the_mutex);
}
```
This example is simple, but it illustrates the basic mechanism. Code that puts a thread to sleep should always re-check the condition in a loop before continuing, because the thread may have been woken up by a UNIX signal or for some other reason.
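The same one-slot producer-consumer can be sketched in Java, where an object's intrinsic lock plays the role of the_mutex and wait/notifyAll play the role of the condition variables. OneSlotBuffer is an illustrative name; note that a Java object has only one wait set, so a single wait/notifyAll pair serves both directions:

```java
class OneSlotBuffer {
    private int buffer = 0;   // 0 means empty

    synchronized void put(int item) throws InterruptedException {
        while (buffer != 0) {
            wait();           // buffer full: wait until the consumer empties it
        }
        buffer = item;
        notifyAll();          // wake the consumer
    }

    synchronized int take() throws InterruptedException {
        while (buffer == 0) {
            wait();           // buffer empty: wait until the producer fills it
        }
        int item = buffer;
        buffer = 0;
        notifyAll();          // wake the producer
        return item;
    }
}
```

The while loops around wait() are exactly the re-check that the previous paragraph calls for.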
Monitors

With semaphores and mutexes, interprocess communication looks easy, right? Not quite. Look again at the producer-consumer solution with semaphores, and suppose the two down operations in producer() were swapped, so that mutex is decremented before empty. If the buffer were completely full, the producer would block on down(&empty) while holding the mutex (mutex = 0). The next time the consumer tried to access the buffer, it would block on down(&mutex), a situation called deadlock.
```c
void producer(void) {
    int item;
    while (TRUE) {
        // generate something to put in the buffer
        item = produce_item();
        // the order of the next two downs is swapped
        // enter the critical section
        down(&mutex);
        // decrement the empty-slot count
        down(&empty);
        // put the new item in the buffer
        insert_item(item);
        // leave the critical section
        up(&mutex);
        // increment the full-slot count
        up(&full);
    }
}
```
This problem tells us to be careful with semaphores.
To make it easier to write correct programs, Brinch Hansen (1973) and Hoare (1974) proposed a higher-level synchronization primitive called a monitor. As we will see below, their proposals differ slightly.

A monitor is a collection of procedures, variables, and data structures grouped together into a special kind of module or package.

The code below uses Pascal-like syntax; you can think of a procedure as a function.
```
monitor example
    integer i;
    condition c;

    procedure producer();
    begin
        ...
    end;

    procedure consumer();
    begin
        ...
    end;
end monitor;
```
A very important property of monitors is that only one process can be active in a monitor at any instant, which makes monitors an effective way to achieve mutual exclusion.

A monitor is a programming-language construct: monitors are part of the language, and the compiler knows they are special, so calls to monitor procedures can be handled differently from other procedure calls.
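Java's synchronized methods are a direct descendant of this idea: a class whose methods are all synchronized behaves like the Pascal-style monitor above, with the JVM guaranteeing that at most one thread is active inside it at a time. A minimal sketch (MonitorExample and the method bodies are illustrative):

```java
// the Java counterpart of "monitor example": at most one thread at a time
// can be executing inside producer() or consumer()
class MonitorExample {
    private int i;

    synchronized void producer() {
        i++;   // body elided in the pseudocode above
    }

    synchronized void consumer() {
        i--;   // body elided in the pseudocode above
    }
}
```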
When a process calls a monitor procedure, the first few instructions of the procedure check whether another process is currently active in the monitor. If so, the calling process is suspended until the other process leaves the monitor and wakes it; if no process is using the monitor, the caller may enter.

Mutual exclusion on entry to the monitor is the compiler's responsibility (a common implementation uses a mutex or a binary semaphore). Because the compiler, not the programmer, arranges the mutual exclusion, far fewer things can go wrong. The author of a monitor never needs to know how the compiler implements it: it is enough to turn every critical section into a monitor procedure, and no two processes will ever execute their critical sections at the same time.
Processes still need a way to block when they cannot proceed, and the solution is condition variables, together with the two operations wait and signal. When a monitor procedure discovers that it cannot continue (for example, the producer finds the buffer full), it performs a wait on a condition variable. This blocks the calling process and also lets in another process that was previously kept out of the monitor. That other process, the consumer for example, can later wake up its sleeping partner by performing a signal on the condition variable the producer is waiting on.
Signal wakeup policies

To avoid having two active processes in the monitor at the same time, we need a rule about what happens after a signal.

Hoare proposed letting the newly awakened process run and suspending the signaling one.

Brinch Hansen proposed that a process performing a signal must exit the monitor immediately; in other words, a signal statement may appear only as the final statement of a monitor procedure.

A third option is to let the signaler keep running, and only allow the waiting process to start after the signaler has left the monitor. This is the approach Java takes.
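A small runnable sketch of that third policy; SignalPolicyDemo and the crude sleep-based ordering are purely illustrative. After notify(), the signaling thread keeps the monitor, and the awakened thread can re-enter only when the signaler leaves the synchronized block:

```java
public class SignalPolicyDemo {
    public static void main(String[] args) throws InterruptedException {
        final Object monitor = new Object();

        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                try {
                    monitor.wait();
                    System.out.println("waiter: reacquired the monitor");
                } catch (InterruptedException ignored) {
                }
            }
        });
        waiter.start();
        Thread.sleep(100);     // crude way to let the waiter block first

        synchronized (monitor) {
            monitor.notify();  // the waiter becomes runnable...
            System.out.println("signaler: still in the monitor after notify()");
        }                      // ...but can only re-enter here, when the signaler leaves
        waiter.join();
    }
}
```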
Note that neither semaphores nor monitors work across machines in a distributed system; by now I'm sure you can explain why.