We usually talk about processes and threads from a programming-language perspective, but do you really know what they are? In this article, let's look at processes and threads from the operating system's point of view.

Process

The core concept of an operating system is the process: an abstraction of a running program. Everything else in the operating system revolves around processes. Processes are one of the oldest and most important abstractions that operating systems provide. They support (pseudo) concurrent operation even when only one CPU is available, abstracting a single physical CPU into multiple virtual CPUs. It is fair to say that without the process abstraction, modern operating systems would not exist.

All modern computers do many things at the same time, something that would have seemed incomprehensible to someone using an early single-CPU machine. To illustrate, consider a Web server, to which requests for pages arrive. When a request comes in, the server checks whether the page is in its cache, and if so, returns the cached contents directly. If not, a disk request is started. From the CPU's point of view, however, disk requests take a long time to complete. While waiting, many more requests may arrive, and if there are several disks, some or all of the new requests can be issued to other disks long before the first one finishes. Clearly, this is a form of concurrency, and some mechanism is needed to control it.

Now consider a PC with a single user. When the system boots, many processes are started in the background, usually without the user even knowing. Can you say which processes need to be started when your own computer boots? These background processes might be an e-mail daemon waiting for incoming mail, or an antivirus process that periodically updates its virus database. In addition, user processes may be printing files or burning a CD-ROM while the user is browsing the Web, and all of these activities have to be managed. A multiprogramming system that supports multiple processes is therefore very useful.

In a multiprogramming system, the CPU switches rapidly from process to process, running each for tens or hundreds of milliseconds. Strictly speaking, at any one instant the CPU is running only one process, but over the course of, say, one second it may work on several of them, giving the illusion of parallelism. This is sometimes called pseudoparallelism, to distinguish it from the true hardware parallelism of multiprocessor systems (which have two or more CPUs sharing the same physical memory).

A little more detail on pseudoparallelism: it is the impression that a processor is executing multiple processes at the same time, created by switching the CPU between programs at very short intervals. The drawback is that there is no guarantee about when, or whether, CPU time will next be allocated to a particular process.

Because the CPU executes so fast and switches between processes so quickly, it is hard to keep track of multiple processes running in parallel. Over the years, operating system designers have therefore developed a conceptual model of (sequential) parallel processes that makes parallelism easier to understand and analyze. That model is the subject of this article, so let's look at it now.

Process model

In the process model, all runnable software on the computer, sometimes including the operating system itself, is organized into a number of sequential processes, or processes for short. A process is an instance of an executing program, including the current values of the program counter, the registers, and the variables. Conceptually, each process has its own virtual CPU; in reality, of course, the real CPU switches back and forth between processes.

As the figure above shows, this is a multiprogramming system with four programs; as the CPU switches between them, the program counter changes accordingly.

In the figure above, the four programs are abstracted as four processes, each with its own flow of control (that is, its own logical program counter), and each running independently. Of course, there is only one physical program counter, so when a process is given the CPU, its logical program counter is loaded into the physical program counter. When its turn ends, the value of the physical program counter is saved back into the process's logical program counter in memory.

As you can see from the figure below, viewed over a long enough interval all of the processes have made progress, but at any given instant only one process is actually running.

Thus, when we say a CPU can really run only one process at a time, note that even with 2 cores (or CPUs), each core can still run only one process at any instant.

Since the CPU switches rapidly back and forth between processes, the amount of time each process gets is uncertain, and when the same process runs again, the time it gets is usually not the same as before. The difference between a process and a program is subtle, but an example makes it clear: think of a computer scientist who likes to cook and is baking a birthday cake for his daughter. He has the recipe for the cake, and the kitchen has all the ingredients needed: flour, eggs, sugar, vanilla extract, and so on. In this analogy, the recipe is the program, the computer scientist is the CPU, and the cake ingredients are the input data. The process is the whole activity the scientist carries out: reading the recipe, fetching the ingredients, and baking the cake.

Now suppose the scientist's son runs in saying he has been stung on the head by a bee. The scientist notes where he is in the recipe, gets out a first-aid manual, and follows its steps. Here we see a switch between processes: the scientist (CPU) switches from making a cake (one process) to giving medical assistance (another process). After the wound has been treated, the scientist goes back to the point in the recipe where he left off and continues making the cake.

The key idea here is that a process is an activity of a particular kind, with a program, input, output, and a state. A single processor may be shared among several processes, with some scheduling algorithm deciding when to stop work on one process and serve another. Note also that if a program runs twice, it counts as two processes. Now that we understand the process model, how are processes created?

Process creation

The operating system needs some way to create processes. Here are the main events that cause a process to be created:

  • System initialization (init)
  • A running program performs a system call that creates a process (such as fork)
  • A user requests the creation of a new process
  • Initiation of a batch job

System initialization

When an operating system is booted, several processes are typically created. Some of these are foreground processes, which interact with users and do work for them. Others run in the background and are not associated with a particular user; for example, a process designed to accept incoming e-mail may lie dormant most of the time and wake up only when mail arrives. Similarly, a process can be designed to accept incoming requests for Web pages hosted on that computer and wake up when a request arrives. Processes that stay in the background to handle activities such as e-mail, Web pages, news, printing, and so on are called daemons. Large systems have many daemons. In UNIX, the ps program lists the running processes; in Windows, the Task Manager can be used.

System call creation

In addition to the processes created at boot time, new processes can be created afterward. Often a running process issues a system call to create one or more new processes to help it do its job. For example, if a large amount of data has to be fetched over the network and then processed, it may be convenient to create one process to fetch the data and put it in a shared buffer while a second process removes the data items and processes them. On a multiprocessor, letting each process run on a different CPU can also make the job go faster.

User Request Creation

On many interactive systems, users can start a program by typing a command or double-clicking an icon; either action starts a new process and runs the selected program in it. On basic UNIX systems running X, the new process takes over the window in which it was started. In Windows, when a process is started it does not have a window by default, but it can create one or more, and each window can run a process. Using the mouse or commands, the user can switch between windows and interact with the processes.

An interactive system is a computer system characterized by a large amount of interaction between people and the computer, such as games, Web browsers, and IDEs and other integrated development environments.

Batch creation

This last case of process creation applies to mainframe batch systems. Users submit batch jobs in such systems. When the operating system decides that it has the resources to run another task, it creates a new process and runs the next job from its input queue.

Technically, in all of these cases, a new process is created by having an existing process execute a process-creation system call. That existing process may be a running user process, a system process invoked from the keyboard or mouse, or a batch-manager process. Whichever it is, it executes a system call that tells the operating system to create a new process and indicates, directly or indirectly, which program to run in it.

In UNIX, there is only one system call to create a new process: fork. This call creates an exact clone of the calling process. After the fork, parent and child have the same memory image, the same environment strings, and the same open files. Usually, the child process then executes execve or a similar system call to change its memory image and run a new program. For example, when a user types sort in the shell, the shell forks a child process, and that child executes sort. The reason for this two-step procedure is to allow the child, after the fork but before the execve, to manipulate its file descriptors in order to redirect standard input, standard output, and standard error.
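To make the fork-then-execve sequence concrete, here is a minimal C sketch. The file names out.txt and data.txt and the path /usr/bin/sort are placeholders chosen for illustration, not taken from the text; the point is that the redirection happens in the child after fork and before execve.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                     /* create a copy of the calling process */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {                         /* child: redirect stdout, then replace the image */
        int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd >= 0) {
            dup2(fd, STDOUT_FILENO);        /* redirection done after fork, before execve */
            close(fd);
        }
        char *argv_new[] = {"sort", "data.txt", NULL};
        char *envp[] = {NULL};
        execve("/usr/bin/sort", argv_new, envp);
        perror("execve");                   /* only reached if execve fails */
        exit(1);
    }
    waitpid(pid, NULL, 0);                  /* parent waits for the child to finish */
    return 0;
}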

In Windows, in contrast, a single Win32 function call, CreateProcess, handles both process creation and loading the correct program into the new process. This call has 10 parameters, including the program to be executed, the command-line parameters to feed it, various security attributes, bits that control whether open files are inherited, priority information, a specification of the window (if any) to be created for the process, and a pointer to a structure in which information about the newly created process is returned to the caller. In addition to CreateProcess, Win32 has about 100 other functions for managing processes, synchronization, and related matters. Below is a comparison of some UNIX and Win32 system calls.

UNIX     Win32                 Description
fork     CreateProcess         Create a new process
waitpid  WaitForSingleObject   Wait for a process to exit
execve   (none)                CreateProcess = fork + execve
exit     ExitProcess           Terminate execution
open     CreateFile            Create a file or open an existing file
close    CloseHandle           Close a file
read     ReadFile              Read data from a file
write    WriteFile             Write data to a file
lseek    SetFilePointer        Move the file pointer
stat     GetFileAttributesEx   Get various file attributes
mkdir    CreateDirectory       Create a new directory
rmdir    RemoveDirectory       Remove an empty directory
link     (none)                Win32 does not support link
unlink   DeleteFile            Destroy an existing file
mount    (none)                Win32 does not support mount
umount   (none)                Win32 does not support mount, so it does not support umount
chdir    SetCurrentDirectory   Change the current working directory
chmod    (none)                Win32 does not support security
kill     (none)                Win32 does not support signals
time     GetLocalTime          Get the current time

In both UNIX and Windows, after a process is created the parent and child have their own, separate address spaces. If either process changes a word in its address space, the change is not visible to the other. In UNIX, the child's initial address space is a copy of the parent's, but they are two distinct address spaces; non-writable memory regions can be shared, and some UNIX implementations share the program text between parent and child precisely because it cannot be modified. Alternatively, the child may share all of the parent's memory, in which case the memory is shared copy-on-write: whenever either of the two wants to modify part of the memory, that piece is first explicitly copied so the modification happens in a private memory area. In other words, writable memory is not directly shared. It is, however, possible for a newly created process to share some of its creator's other resources, such as open files. In Windows, the parent's and child's address spaces are different from the start.

Termination of a process

After a process has been created, it starts running and does its work. However, nothing lasts forever, not even processes. Sooner or later a process terminates, usually due to one of the following conditions

  • Normal exit (voluntary)
  • Error exit (voluntary)
  • Fatal error (involuntary)
  • Killed by another process (involuntary)

Normal exit

Most processes terminate because they have finished their work. When a compiler has finished compiling the program given to it, it makes a system call to tell the operating system it is done. This call is exit in UNIX and ExitProcess in Windows. Screen-oriented programs also support voluntary termination: word processors, Web browsers, and similar programs always have an icon or menu item that the user can click to tell the process to remove any temporary files it has open and then terminate.

Error exit

The second reason a process terminates is that it discovers a fatal error. For example, suppose a user types the command

cc foo.c	

in order to compile foo.c, but no such file exists. The compiler simply announces this fact and exits. Screen-oriented interactive processes usually do not exit outright when given bad parameters, because that would be unreasonable from the user's point of view: the user needs to know what happened and may want to try again. Instead, the application typically pops up a dialog box telling the user that an error occurred and asking whether to retry or quit.

Fatal error

The third cause of termination is an error caused by the process itself, usually due to a program bug: executing an illegal instruction, referencing nonexistent memory, dividing by zero, and so on. In some systems, such as UNIX, a process can tell the operating system that it wants to handle certain errors itself, in which case the process receives a signal (is interrupted) instead of being terminated when one of those errors occurs.

Killed by another process

The fourth reason a process might terminate is that another process executes a system call telling the operating system to kill it. In UNIX this call is kill. The corresponding Win32 function is TerminateProcess (note that it is a function, not a system call).
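For illustration, here is a minimal sketch of the UNIX kill call; the PID 1234 is a placeholder, not a value from the text.

#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

int main(void) {
    pid_t victim = 1234;                 /* hypothetical target process */

    if (kill(victim, SIGKILL) == -1)     /* ask the OS to kill that process */
        perror("kill");                  /* fails if the PID does not exist or we lack permission */

    return 0;
}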

The hierarchy of processes

In some systems, when a process creates other processes, the parent and child processes are related in some way. The child process itself creates more processes, forming a process hierarchy.

UNIX process system

In UNIX, a process and all its children and their children form a group of processes. When a user sends a signal from the keyboard, the signal is sent to all members of the group of processes currently associated with the keyboard (they are usually all active processes created in the current window). Each process can separately capture the signal, ignore it, or take the default action of being killed by the signal.

Here is another example that illustrates the role of the process hierarchy: consider how UNIX initializes itself at startup. A special process called init is present in the boot image. When init starts running, it reads a file telling how many terminals there are, and it creates a new process for each one. These processes wait for a user to log in. If a login succeeds, the login process executes a shell to accept the user's commands, which may in turn start more processes, and so on. Thus, all processes in the entire system belong to a single tree of processes with init at the root.

Windows Process System

In contrast, Windows has no concept of a process hierarchy; all processes are equal. The only hint of a hierarchy is that when a process is created, the parent receives a special token (called a handle) that it can use to control the child. However, it is free to pass this token to another process, so there is no real hierarchy. In UNIX, processes cannot disinherit their children. (Still, Windows sucks.)

Process states

Although each process is an independent entity, with its own program counter and internal state, processes often need to interact with one another. For example, one process may generate output that another process uses as input, as in the shell command

cat chapter1 chapter2 chapter3 | grep tree

The first process, cat, concatenates and outputs the three files. The second process, grep, selects the lines containing the keyword "tree" from its input. Depending on the relative speeds of the two processes (which depend both on the relative complexity of the programs and on how much CPU time each gets), it may happen that grep is ready to run but no input is waiting for it. It must then block until input arrives.

Once a process starts running, it may move among the states shown in the diagram below.

There are three states involved in the diagram

  1. Running: the process is actually using the CPU at that instant
  2. Ready: the process is runnable, but temporarily stopped because another process is running
  3. Blocked: the process cannot run until some external event occurs

Logically, the first two states are similar: in both cases the process is willing to run, but in the second it temporarily has no CPU time slice. The third state is different from the first two in that the process cannot run, even if the CPU is otherwise idle.

Four transitions are possible among these three states. Transition 1 occurs when the operating system discovers that a process cannot continue. In some systems the process makes a system call, such as pause, to get into the blocked state. In other systems, including UNIX, a process is automatically blocked when it tries to read from a pipe or special file (such as a terminal) and no input is available.

Transitions 2 and 3 are caused by the process scheduler (a part of the operating system), without the process even knowing about them. Transition 2 occurs when the scheduler decides that the running process has had enough CPU time and it is time to let another process have the CPU. Transition 3 occurs when all the other processes have had their share and it is time for the first process to get the CPU time slice again.

Scheduling refers to deciding which processes should be run first and for how long, which is important. Many algorithms have been designed to try to balance the overall efficiency of the system against the competing requirements of the various processes.

Transition 4 occurs when the external event a process is waiting for happens, for example the arrival of some input. If no other process is running at that moment, transition 3 is triggered immediately and the process starts running; otherwise it stays in the ready state, waiting for the CPU to become free so that its turn can come.

This view gives rise to the model shown below.

At the bottom of the operating system is the scheduler, which has many processes on it. All the details about interrupt handling, starting the process, and stopping the process are hidden in the scheduler. In fact, the scheduler is a very small program.

Process implementation

To implement the process model, the operating system maintains a table, called the process table, with one entry per process. This entry contains important information about the process's state, including its program counter, stack pointer, memory allocation, the status of its open files, and its accounting and scheduling information — everything that must be saved when the process is switched from the running state to the ready or blocked state, so that it can later be restarted as if it had never been stopped.

The key fields in a typical system are shown below

The first column deals with process management, the second with storage management, and the third with file management.
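The exact layout of a process-table entry is system specific; the following struct is only an illustrative sketch of the kinds of fields just described, grouped into those three columns. The field names are invented, not taken from any real kernel.

/* Illustrative sketch of a process-table entry (PCB); all names are invented. */
struct proc_entry {
    /* process management */
    int            pid;                /* process identifier */
    int            state;             /* running, ready, or blocked */
    unsigned long  program_counter;   /* saved PC while not running */
    unsigned long  stack_pointer;     /* saved SP while not running */
    unsigned long  registers[16];     /* saved general-purpose registers */
    int            priority;          /* scheduling information */

    /* storage management */
    unsigned long  text_segment;      /* pointer to the program text */
    unsigned long  data_segment;      /* pointer to the data segment */
    unsigned long  stack_segment;     /* pointer to the stack segment */

    /* file management */
    int            open_files[32];    /* descriptors of open files */
    int            uid;               /* owner / accounting information */
};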

See the following article for more information on the text segment, data segment, and stack segment used in storage management

Core Knowledge of Assembly Language for Programmers (Full)

Now that we have a general idea of the process table, we can explain a little more how the illusion of multiple sequential processes is maintained on a single CPU. Associated with each I/O device class is a location (often in a fixed area near the bottom of memory) called the interrupt vector, which contains the entry address of the interrupt service routine. Suppose user process 3 is running when a disk interrupt occurs. The interrupt hardware pushes the program counter, the program status word, and sometimes one or more registers onto the stack, and the computer then jumps to the address given in the interrupt vector. That is all the hardware does; from then on, the software takes over.

When the low-level interrupt handling is done, the operating system calls a C procedure to do the rest of the work for this interrupt. After that work is finished, some process is ready, and the scheduler is called to decide which process to run next. Control then passes to a piece of assembly language code that loads the registers and memory map for the now-current process and starts it running. Interrupt handling and scheduling are summarized below.

  1. Hardware pushes the program counter, etc., onto the stack
  2. Hardware loads a new program counter from the interrupt vector
  3. An assembly language procedure saves the registers
  4. The assembly language procedure sets up a new stack
  5. A C interrupt service routine runs (typically reading and buffering input)
  6. The scheduler decides which process to run next
  7. The C procedure returns to the assembly code
  8. The assembly language procedure starts running the new current process

A process may be interrupted thousands of times during execution, but the key is that after each interruption, the interrupted process returns to exactly the same state as before the interruption occurred.

Thread

In traditional operating systems, each process has one address space and one thread of control; in fact, that is almost the definition of a process. However, it is frequently useful to have multiple threads of control running in the same address space, acting almost like separate processes. Let's take a closer look at what a thread is.

Use of threads

Perhaps you have wondered why a thread concept is needed on top of processes. This is really a discussion of the process model versus the thread model, and the answer has three parts:

  • Multiple threads can share one address space and all of its data, something that processes cannot do
  • Threads are more lightweight than processes; because they are lighter, they are easier to create and destroy than processes. On many systems, creating a thread is 10-100 times faster than creating a process
  • A third reason is performance. If all the threads are CPU-bound, there is no performance gain, but if there is substantial computation as well as substantial I/O, threads allow these activities to overlap and thus speed up the application

Multi-threaded solution

Now consider an example of how threads are used: a World Wide Web server. Requests for pages come in and the requested pages are sent back to the clients. On most Web sites some pages are accessed far more often than others; for example, SONY's home page is accessed far more than any individual camera specification page. The Web server can improve performance by keeping a collection of heavily used pages in memory instead of fetching them from disk every time. Such a collection is called a cache, and caches are used in many other contexts as well, such as CPU caches.

This is how such a Web server can be organized. One thread, the dispatcher, reads incoming work requests from the network. After examining a request, it chooses an idle (blocked) worker thread and hands it the request, typically by writing a pointer to the message into a special word associated with each thread. The dispatcher then wakes up the sleeping worker, changing its state from blocked to ready.

When the worker thread wakes up, it checks whether the request can be satisfied from the Web page cache, which all threads can access. If the page is not in the cache, it starts a read operation to fetch the page from disk and blocks until the disk operation completes. While the thread is blocked on the disk operation, another thread is chosen to run — possibly the dispatcher, in order to acquire more work, or possibly another worker that is now ready to run.

This model allows the server to be written as a collection of sequential threads. The dispatcher's program consists of an infinite loop that gets a work request and hands it off to a worker. Each worker's code consists of an infinite loop that accepts a request from the dispatcher and checks the Web cache to see whether the required page is present. If so, it returns the page directly to the client and then blocks, waiting for a new request. If not, it fetches the page from disk, returns it to the client, and then blocks, waiting for a new request.

Below is an outline of the code for the dispatcher and the worker threads, assuming TRUE is the constant 1, and buf and page are structures for holding a work request and a Web page, respectively.

The rough logic of scheduling threads

while(TRUE){
  get_next_request(&buf);
  handoff_work(&buf);
}

The rough logic of the worker thread

while(TRUE){
  wait_for_work(&buf);
  look_for_page_in_cache(&buf,&page);
  if(page_not_in_cache(&page)){
    read_page_from_disk(&buf,&page);
  }
  return_page(&page);
}

Single-threaded solution

Now consider how a Web server could be written without multithreading. One possibility is a single thread: the main loop of the Web server gets a request, examines it, and carries the work out to completion before getting the next one. While waiting for the disk, the server sits idle and does not process any other requests that arrive. The result is that very few requests per second can be processed, so this example shows how multithreading improves parallelism and performance.

State machine solution

So far, we have had two solutions, the single-threaded solution and the multi-threaded solution. In fact, another solution is the state machine solution, which flows as follows

Suppose that only a non-blocking version of the read system call is available. When a request arrives, the server's one and only thread examines it. If it can be satisfied from the cache, the reply is returned; if not, a non-blocking disk operation is started.

The server records the state of the current request in a table and then goes off to get the next event, which may be either a request for new work or a reply from the disk about a previous operation. If it is new work, that work is started. If it is a disk reply, the relevant state information is fetched from the table and the reply is processed. With non-blocking disk I/O, the reply typically takes the form of a signal or interrupt.

Every time the server switches from working on one request to another, the corresponding computation state must be saved or reloaded. Here, each computation has a saved state, and there is some set of events that can occur to change that state. A design like this is called a finite-state machine, and finite-state machines are used widely throughout computer science.
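To make the idea concrete, here is a rough C sketch of such an event loop. Every helper function and type here (get_next_event, page_in_cache, start_disk_read, send_reply, the event and request structures) is invented for illustration; a real server would use its own non-blocking I/O primitives.

#include <stdbool.h>

/* Hypothetical event and request types for the finite-state-machine server. */
enum ev_type  { NEW_REQUEST, DISK_DONE };
enum rq_state { WAITING_FOR_DISK, READY_TO_REPLY };

struct request { int id; enum rq_state state; };
struct event   { enum ev_type type; struct request *req; };

/* Hypothetical non-blocking primitives (declarations only). */
void get_next_event(struct event *ev);
bool page_in_cache(struct request *rq);
void start_disk_read(struct request *rq);      /* returns immediately */
void send_reply(struct request *rq);

void server_loop(void) {
    struct event ev;
    for (;;) {
        get_next_event(&ev);                   /* a new request or a disk completion */
        if (ev.type == NEW_REQUEST) {
            if (page_in_cache(ev.req)) {
                send_reply(ev.req);            /* served directly from the cache */
            } else {
                ev.req->state = WAITING_FOR_DISK;  /* save the computation state */
                start_disk_read(ev.req);       /* non-blocking disk I/O */
            }
        } else {                               /* DISK_DONE: resume the saved request */
            ev.req->state = READY_TO_REPLY;
            send_reply(ev.req);
        }
    }
}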

Each of these three solutions has its own characteristics. Multithreading keeps the idea of sequential processes that make blocking system calls, yet still achieves parallelism. The single-threaded server keeps the simplicity of blocking calls but gives up performance. The finite-state machine approach achieves high performance through parallelism, using non-blocking calls and interrupts, but makes programming harder.

Model                 Characteristics
Single-threaded       No parallelism, blocking system calls
Multithreading        Parallelism, blocking system calls
Finite-state machine  Parallelism, non-blocking system calls, interrupts

The classic threading model

One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, as well as other resources: open files, child processes, pending alarms, signal handlers, accounting information, and so on. Putting them together in the form of a process makes them easier to manage.

The other concept a process has is a thread of execution, usually shortened to just thread. The thread has a program counter that keeps track of which instruction to execute next, registers that hold its current working variables, and a stack that records the execution history. Although a thread must execute within some process, the thread and its process are different concepts and can be treated separately: processes are used to group resources together, while threads are the entities scheduled for execution on the CPU.

What threads add to the process model is to allow multiple executions to take place in the same process environment, largely independent of one another and without interfering with each other. Running multiple threads in parallel in one process is analogous to running multiple processes in parallel on one computer: the threads share an address space and other resources, just as processes share physical memory, disks, printers, and other resources. Because threads have some of the properties of processes, they are sometimes called lightweight processes. The term multithreading is also used to describe having multiple threads in the same process.

In the figure below we can see three traditional processes, each with its own address space and a single thread of control; each thread runs in a different address space.

In the figure below, we can see that there is a process with three threads. Each thread runs in the same address space.

Threads are not as independent as processes. All threads in a process share exactly the same address space, which means they also share the same global variables. Since every thread can access every memory address within the process's address space, one thread can read, write, or even wipe out another thread's stack. Besides sharing the same memory space, the per-process and per-thread items differ as shown below.

The items on the left are shared by all the threads in a process, while the items on the right are private to each thread. In other words, the left-hand list shows process properties and the right-hand list shows thread properties.

Like a process, a thread can be in any one of several states: running, blocked, ready, or terminated (not shown in the process diagram). A running thread currently has the CPU time slice and is active. A blocked thread is waiting for some event to unblock it; for example, a thread that performs a system call to read from the keyboard is blocked until input arrives. A thread may block waiting for some external event to happen or for some other thread to release it. The state transitions between threads are the same as those between processes.

Each thread has its own stack, as shown in the figure below

Thread system calls

A process usually starts with a single thread. This thread can then create new threads by calling a library function such as thread_create, which typically takes as a parameter the name of a procedure for the new thread to run. The call usually returns a thread identifier, which serves as the new thread's name.

When a thread has finished its work, it can exit by calling a function such as thread_exit. It then vanishes, its state becomes terminated, and it can no longer be scheduled. In some systems, one thread can wait for another (specific) thread to exit by calling a function such as thread_join, which blocks the calling thread until that particular thread has exited. In these respects, thread creation and termination are very much like process creation and termination.

Another common thread call is thread_yield, which allows a thread to voluntarily give up the CPU so another thread can run. This call is important because, unlike with processes, there is no clock interrupt to force a thread to give up the CPU.

POSIX threads

To make it possible to write portable threaded programs, IEEE defined a thread standard in IEEE standard 1003.1c. The thread package it defines is called Pthreads, and most UNIX systems support it. The standard defines more than 60 function calls; listing them all here is not practical, so below are some of the commonly used ones.

POSIX threads (commonly referred to as PThreads) are a language-independent execution model, as well as a parallel execution model. It allows programs to control multiple different workflows that overlap in time. Each workflow is called a thread and can be created and controlled by calling the POSIX Threads API. You can think of it as a standard for threads.

POSIX Threads implementations are available on many POSIX-conformant operating systems, such as FreeBSD, NetBSD, OpenBSD, Linux, macOS, Android, and Solaris. On Windows, third-party packages implement Pthreads on top of the existing Windows APIs.

IEEE is the world’s largest technical professional organization dedicated to the development of technology for the benefit of mankind.

Thread call           Description
pthread_create        Create a new thread
pthread_exit          Terminate the calling thread
pthread_join          Wait for a specific thread to exit
pthread_yield         Release the CPU to let another thread run
pthread_attr_init     Create and initialize a thread's attribute structure
pthread_attr_destroy  Delete a thread's attribute structure

Every Pthreads thread has an identifier, a set of registers (including the program counter), and a set of attributes stored in a structure. The attributes include the stack size, scheduling parameters, and other items the thread needs.

A new thread is created with pthread_create; the identifier of the newly created thread is returned through the first parameter. This call looks very much like the UNIX fork system call (apart from the parameters), with the thread identifier playing the role of the PID to distinguish the thread from other threads.

When a thread has finished its assigned work, it is terminated by pthread_exit. This call stops the thread and frees the stack.

Typically a thread needs to wait for another thread to finish its work and exit before continuing. You can wait for the termination of another specific thread through a pthread_join thread call. The thread identifier for the thread to wait on is given as an argument.

Sometimes it happens that a thread is not logically blocked, but feels like it has run long enough and wants to give another thread a chance to run. This can be done with pthread_yield.

The next two thread calls deal with attributes. pthread_attr_init creates the attribute structure associated with a thread and initializes it to the default values; those values, such as the priority, can be changed by modifying fields in the attribute structure.

Finally, pthread_attr_destroy removes a thread's attribute structure, freeing its memory. It does not affect the threads that were created with it; they continue to exist.
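As a small illustration of the attribute calls, here is a sketch that creates one thread with a custom stack size; the worker function and the 1 MB figure are arbitrary choices made for this example.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    printf("worker running\n");
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_attr_t attr;

    pthread_attr_init(&attr);                        /* fill in default attribute values */
    pthread_attr_setstacksize(&attr, 1024 * 1024);   /* request a 1 MB stack (arbitrary) */

    if (pthread_create(&tid, &attr, worker, NULL) != 0)
        fprintf(stderr, "pthread_create failed\n");

    pthread_attr_destroy(&attr);                     /* the attribute object is no longer needed */
    pthread_join(tid, NULL);                         /* wait for the worker to finish */
    return 0;
}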

To better understand how pthreads work, consider the following example

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUMBER_OF_THREADS 10

void *print_hello_world(void *tid){
  /* Prints the thread identifier, then exits */
  printf("Hello World. Greetings from thread %ld\n", (long)tid);
  pthread_exit(NULL);
}

int main(int argc, char *argv[]){
  /* The main program creates 10 threads and then exits */
  pthread_t threads[NUMBER_OF_THREADS];
  int status;

  for(long i = 0; i < NUMBER_OF_THREADS; i++){
    printf("Main here. Creating thread %ld\n", i);
    status = pthread_create(&threads[i], NULL, print_hello_world, (void *)i);

    if(status != 0){
      printf("Oops. pthread_create returned error code %d\n", status);
      exit(-1);
    }
  }
  exit(0);
}

After announcing its intentions, the main thread loops NUMBER_OF_THREADS times, creating a new thread on each iteration. If a thread cannot be created, it prints a message and exits. After creating all the threads, the main program exits.
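One thing this example does not do is wait for its threads: because main calls exit, some threads may never get a chance to print. A common variant (a sketch using the same print_hello_world and NUMBER_OF_THREADS as above, not part of the original example) joins every thread before exiting:

int main(int argc, char *argv[]){
  pthread_t threads[NUMBER_OF_THREADS];
  int status;

  for(long i = 0; i < NUMBER_OF_THREADS; i++){
    status = pthread_create(&threads[i], NULL, print_hello_world, (void *)i);
    if(status != 0){
      printf("Oops. pthread_create returned error code %d\n", status);
      exit(-1);
    }
  }

  for(long i = 0; i < NUMBER_OF_THREADS; i++){
    pthread_join(threads[i], NULL);   /* block until thread i has terminated */
  }
  exit(0);
}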

Thread implementation

There are three main ways to implement threads:

  • Implementing threads in user space;
  • Implementing threads in the kernel;
  • A hybrid implementation, in both user space and the kernel.

Let's discuss each of them in turn.

Implementing threads in user space

The first method is to put the thread package entirely in user space. The kernel knows nothing about the threads; as far as it is concerned, they do not exist. All implementations of this kind have the same general structure.

The threads run on top of a run-time system, which is a collection of procedures that manage threads, including the four mentioned earlier: pthread_create, pthread_exit, pthread_join, and pthread_yield.

A Runtime System, also known as a Runtime environment, provides the environment in which programs run. This environment may solve many problems, including the layout of the application’s memory, how the program accesses variables, the mechanism for passing parameters between procedures, the interface to the operating system, and so on. The compiler makes assumptions based on a particular runtime system to generate the correct code. Typically, the run-time system will be responsible for setting up and managing the stack, and will include such dynamic features as garbage collection, threading, or other features built into the language.

When threads are managed in user space, each process needs its own private thread table to keep track of the threads in that process. This table is analogous to the kernel's process table, except that it records only per-thread properties such as each thread's program counter, stack pointer, registers, and state. The thread table is managed by the run-time system. When a thread moves to the ready or blocked state, the information needed to restart it is stored in the thread table, exactly as the kernel stores process information in the process table.

The benefits of implementing threads in user space

Implementing threads in user space has several advantages over implementing them in the kernel. Consider what happens when a thread finishes or calls pthread_yield and a thread switch is needed: the thread's information is saved in the thread table kept by the run-time system, and the thread scheduler picks another thread to run. The procedure that saves the thread's state and the scheduler are just local procedures, so invoking them is far more efficient than making a kernel call. No trap into the kernel is needed, there is no context switch, and the memory cache does not need to be flushed; this makes thread scheduling very fast and therefore efficient.

Another advantage of user-space threads is that each process can have its own customized scheduling algorithm. For some applications, for example those with a garbage-collector thread (you know who that is), not having to worry about a thread being stopped at an awkward moment is a plus. User threads also scale better, because kernel threads require table space and stack space in the kernel, which can be a problem if the number of threads is very large.

Disadvantages of implementing threads in user space

Despite their better performance, user-space threads have some major problems. The first is how to implement blocking system calls. Suppose a thread reads from the keyboard before any key has been pressed. Letting the thread actually make the system call is unacceptable, because that would stop all the threads in the process. Yet one of the main goals of having threads is to allow each one to use blocking calls without a blocked thread affecting the others.

A somewhat analogous problem is page faults. Computers do not keep all of a program in memory at once; if the program calls or jumps to an instruction that is not in memory, a page fault occurs and the operating system goes to the disk to fetch the missing instructions. The process is blocked while the necessary instructions are located and read in. If one thread causes a page fault, the kernel, which does not even know the threads exist, naturally blocks the entire process until the disk I/O is complete, even though other threads might be runnable.

Another problem is that once a thread starts running, no other thread in that process will ever run unless the first thread voluntarily gives up the CPU. Within a single process there are no clock interrupts, making round-robin scheduling of the threads impossible. Unless a thread enters the run-time system of its own free will, the scheduler never gets a chance to run.

Implementing threads in the kernel

Now let us consider having the kernel know about and manage the threads. No run-time system is needed, and there is no thread table in each process; instead, the kernel has a thread table that keeps track of all the threads in the system. When a thread wants to create a new thread or destroy an existing one, it makes a system call, which then does the creation or destruction by updating the kernel thread table.

The thread table in the kernel holds registers, state, and other information for each thread. This information is the same as thread information in user space, but is placed in the kernel instead of user space. In addition, the kernel maintains a process table to keep track of system state.

All calls that might block a thread are implemented as system calls, and when a thread blocks, the kernel can choose to run either another thread from the same process (if one is ready) or a thread from a different process. With a user-space implementation, by contrast, the run-time system keeps running threads from its own process until the kernel takes the CPU away from it (or there are no runnable threads left).

Because it is expensive to create or destroy threads in the kernel, some systems recycle threads. When a thread is destroyed, it is marked as unrunnable, but its internal structure is not affected. Later, when a new thread must be created, the old thread is re-enabled and marked as available.

If a thread in a process causes a page fault, the kernel can easily check whether the process has any other runnable threads and, if so, run one of them while waiting for the required page to be brought in from disk. The main disadvantage is that the cost of a system call is substantial, so if thread operations (creation, termination, and so on) are frequent, much more overhead is incurred.

Hybrid implementation

To combine the advantages of user-space threads with those of kernel threads, designers have taken a hybrid approach: use kernel-level threads and then multiplex user-level threads onto some or all of them.

In this model, the programmer can freely decide how many kernel threads to use and how many user-level threads to multiplex onto each one, giving great flexibility. With this approach, the kernel is aware of only the kernel-level threads and schedules those; some of them may have multiple user-level threads multiplexed on top of them.

Interprocess communication

Processes frequently need to communicate with other processes. For example, in a shell pipeline, the output of the first process must be passed to the second process, and so on down the line. It is therefore desirable that processes communicate in a well-structured way, not by interrupting one another. In the following sections we look at Inter Process Communication (IPC).

There are three problems with interprocess communication

  • The first problem, mentioned above, is how one process can pass information to another.
  • The second is making sure two or more processes do not get in each other's way — for example, two processes in an airline reservation system each trying to grab the last seat on a plane for a different customer.
  • The third concerns proper sequencing: if process A produces data and process B prints it, B must wait until A has produced the data before starting to print.

It is important to note that the last two of these three questions also apply to threads

The first problem is easy for threads, because they share an address space and the same run-time environment; threads in the same process can simply exchange data through shared memory. If you have ever written multithreaded code in a high-level language, you know that passing information between threads is the easy part.

The other two problems apply equally to threads, and the same solutions work for them. We will discuss all three problems below; for now, just keep them in mind.

A race condition

In some operating systems, processes that work together may share some common storage that each can read and write. The shared storage may be in main memory or in a shared file. To see how interprocess communication works in practice, consider a simple example: a print spooler. When a process wants to print a file, it enters the file name in a special spooler directory. Another process, the printer daemon, periodically checks whether there are any files to be printed; if so, it prints them and removes their names from the directory.

Imagine that the spooler directory has a very large number of slots, numbered 0, 1, 2, ..., each able to hold a file name. Also imagine that there are two shared variables: out, which points to the next file to be printed, and in, which points to the next free slot in the directory. These two variables might well be kept in a two-word file available to all processes. At a certain instant, slots 0 to 3 are empty and slots 4 to 6 are full. More or less simultaneously, processes A and B decide they want to queue a file for printing, as follows.

Murphy’s Law says that anything that can go wrong will go wrong, and this is what can happen when it goes into effect.

Process A reads in and gets the value 7, which it stores in a local variable next_free_slot. Just then a clock interrupt occurs, and the CPU decides that process A has run long enough, so it switches to process B. Process B also reads in, also gets a 7, and stores it in its own local variable next_free_slot. At this instant both processes think that the next free slot is 7.

Process B now continues to run. It writes the name of its print file into slot 7, updates in to 8, and then goes off to do other things.

Eventually, process A runs again, starting from where it left off. It looks at next_free_slot, finds a 7 there, writes its print file name into slot 7, and updates in to 8. Because slot 7 already contained the name written by process B, process A's file name overwrites it, and since the printer daemon cannot tell which process put the name there, process B's file will never be printed. Situations like this, in which two or more processes read and write shared data and the final result depends on the precise order in which they run, are called race conditions. Debugging programs containing race conditions is very difficult: most runs work fine, but once in a rare while something strange and unexplained happens.
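The heart of the problem is that reading in and writing it back is not atomic. Here is a small C sketch of the spooler enqueue step under the assumptions above (the array size, slot width, and function name are invented; in a real spooler the shared variables would live in shared memory or a shared file):

#include <string.h>

#define SLOTS 100

/* Shared state (a sketch): */
char spooler_dir[SLOTS][64];   /* one file name per slot */
int  in  = 7;                  /* next free slot */
int  out = 4;                  /* next file to print */

/* Each printing process effectively runs this. The gap between reading `in`
   and writing it back is exactly where the race described above occurs. */
void enqueue(const char *file_name) {
    int next_free_slot = in;                        /* read the shared index */
    /* a process switch here lets another process read the same value of in */
    strcpy(spooler_dir[next_free_slot], file_name); /* claim the slot */
    in = next_free_slot + 1;                        /* write back the updated index */
}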

A critical region

How can race conditions be avoided? The key is to prevent more than one process from reading and writing the shared data (shared memory or shared files) at the same time. Put another way, we need mutual exclusion: a way of making sure that if one process is using a shared variable or file, the other processes are excluded from doing the same thing (accessing the same resources). The trouble above arose because process B started using one of the shared variables before process A had finished with it. Choosing appropriate primitives for achieving mutual exclusion is a major design issue in any operating system, and it is what we examine next.

The problem of avoiding race conditions can also be formulated abstractly. Part of the time a process is busy doing internal computations and other things that do not lead to race conditions. However, sometimes a process has to access shared memory or files, or do other critical things that can lead to races. The part of the program where shared memory is accessed is called the critical region or critical section. If we could arrange matters so that no two processes were ever in their critical regions at the same time, we could avoid race conditions — this is the requirement from the operating system designer's point of view.

Although this requirement avoids race conditions, it is not by itself sufficient for parallel processes to cooperate correctly and efficiently using shared data. A good solution needs the following four conditions to hold:

  1. No two processes may be simultaneously inside their critical regions
  2. No assumptions may be made about speeds or the number of CPUs
  3. No process running outside its critical region may block other processes
  4. No process should have to wait forever to enter its critical region

At time T1, process A enters its critical region. A little later, at time T2, process B attempts to enter its critical region but fails, because process A is already inside; B is therefore blocked until time T3, when A leaves its critical region, allowing B to enter. Finally, at T4, B leaves its critical region, and we are back to the original situation with no process in its critical region.

Mutual exclusion with busy waiting

We will now examine various proposals for achieving mutual exclusion, so that while one process is busy updating shared memory in its critical region, no other process enters its own critical region and causes trouble.

Disabling interrupts

On a uniprocessor system, the simplest solution is to have each process disable all interrupts just after entering its critical region and re-enable them just before leaving it. With interrupts disabled, no clock interrupts can occur. Since the CPU switches from one process to another only as a result of clock or other interrupts, the CPU will not be switched away while interrupts are off. Thus, once a process has disabled interrupts, it can examine and update the shared memory without fear that any other process will access the shared data.

Is this approach feasible? The problem is deciding who gets to disable interrupts — surely not a user process. If a process disables interrupts on entering its critical region and never turns them back on, what happens? The whole system could come to a halt. Moreover, on a multiprocessor, disabling interrupts affects only the CPU that executed the disable instruction; the other CPUs continue running and can still access the shared memory.

On the other hand, it is frequently convenient for the kernel itself to disable interrupts for a few instructions while it is updating variables or lists. If an interrupt occurred while, say, the list of ready processes was in an inconsistent state, race conditions could result. The conclusion is that disabling interrupts is a useful technique within the operating system itself, but it is not appropriate as a general mutual exclusion mechanism for user processes.

Lock variable

As a second attempt, let us look for a software solution. Consider a single shared (lock) variable, initially 0. When a process wants to enter its critical region, it first tests the lock. If the lock is 0, the process sets it to 1 and enters the critical region. If the lock is already 1, the process waits until it becomes 0. Thus, a 0 means that no process is in its critical region, and a 1 means that some process is in its critical region. After modifying the earlier figure, the situation looks as follows.
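In code, this (flawed) idea can be sketched roughly as follows; lock is the shared variable, and the comment marks where it goes wrong:

/* Sketch of the (broken) lock-variable idea; lock is shared, 0 = free, 1 = taken. */
int lock = 0;

void enter_region(void) {
    while (lock != 0)     /* wait until the lock looks free */
        ;                 /* busy wait */
    lock = 1;             /* <-- another process can be scheduled in before this store */
}

void leave_region(void) {
    lock = 0;             /* release the lock */
}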

Is this the right design? Is there a flaw? Unfortunately there is: suppose one process reads the lock, sees that it is 0, and just before it can set it to 1, another process is scheduled, reads the lock as 0, and sets it to 1. When the first process runs again, it will also set the lock to 1, and two processes will be in their critical regions at the same time.

Perhaps some readers think the problem can be solved by reading the lock value first and then checking it again just before storing into it. In reality this does not help, because another process can still change the lock value right after the second check. In other words, the check-then-set sequence is not an atomic operation, so the race condition remains.

Strict alternation

In this section the examples are written in C, because real operating systems are virtually always written in C (or occasionally C++), rather than in Java, Modula 3, or Pascal — even the code behind Java's native keyword is written in C or C++. C is powerful, efficient, and predictable, characteristics critical for writing operating systems, whereas Java is unpredictable because the garbage collector may be invoked to reclaim memory at an inopportune moment. This cannot happen in C, which does not call garbage collection to reclaim memory. A comparison of C, C++, Java, and four other languages can be found at the link.

Code for process 0

while(TRUE){
  while(turn != 0){ }     /* busy wait until it is our turn */
  critical_region();      /* in the critical region */
  turn = 1;               /* hand the turn to process 1 */
  noncritical_region();   /* outside the critical region */
}

The code for process 1

while(TRUE){
  while(turn != 1){ }     /* busy wait until it is our turn */
  critical_region();      /* in the critical region */
  turn = 0;               /* hand the turn to process 0 */
  noncritical_region();   /* outside the critical region */
}

In the code above, the integer variable turn, initially 0, keeps track of whose turn it is to enter the critical region and examine or update the shared memory. Initially, process 0 inspects turn, finds it to be 0, and enters its critical region. Process 1 also finds it to be 0 and therefore sits in a tight loop, continually testing turn to see when it becomes 1. Continuously testing a variable until some value appears is called busy waiting. It should usually be avoided, since it wastes CPU time; busy waiting is used only when there is a reasonable expectation that the wait will be short. A lock that uses busy waiting is called a spin lock.

When process 0 leaves the critical region, it sets turn to 1, allowing process 1 to enter its critical region. Suppose that process 1 also finishes its critical region quickly; then both processes are outside their critical regions, with turn set to 0. Now process 0 executes its whole loop quickly, exiting its critical region and setting turn to 1. At this point turn is 1 and both processes are executing in their non-critical regions.

Suddenly, process 0 finishes its non-critical work and goes back to the top of its loop. However, it is not permitted to enter its critical region now, because turn is 1 and process 1 is still busy with its non-critical region. Process 0 can only spin in its while loop until process 1 sets turn back to 0. This shows that taking turns is not a good idea when one process is much slower than the other.

This situation violates condition 3 stated earlier: a process may not be blocked by a process that is not in its critical region, yet here process 0 is being blocked by a process outside its critical region. Because it violates condition 3, this is not a good solution.

Peterson solution

The Dutch mathematician T. Dekker was the first to devise a software mutual-exclusion algorithm that does not require strict alternation, by combining the lock variable with warning variables. For Dekker's algorithm, see the link.

Later, G.L.Peterson discovered a much simpler mutual exclusion algorithm, which is as follows

#define FALSE 0
#define TRUE  1
/* Number of processes */
#define N     2

/* Whose turn is it now */
int turn;

/* All values are initialized to 0 (FALSE) */
int interested[N];

/* The process is 0 or 1 */
void enter_region(int process){

  /* Number of the other process */
  int other;

  /* The other process */
  other = 1 - process;

  /* Indicate a willingness to enter the critical region */
  interested[process] = TRUE;
  turn = process;

  /* Busy wait (empty loop) */
  while(turn == process
        && interested[other] == TRUE){ }

}

void leave_region(int process){

  /* Indicate departure from the critical region */
  interested[process] = FALSE;
}

Before using the shared variables (that is, before entering its critical region), each process calls enter_region with its own process number, 0 or 1, as the argument. This call causes the process to wait, if necessary, until it is safe to enter. After it has finished with the shared variables, the process calls leave_region to indicate that it is done, allowing the other process to enter.

Now let's see how this solution works. Initially neither process is in its critical region. Process 0 calls enter_region, indicates that it wants to enter by setting its array element, and sets turn to 0. Since process 1 does not currently want to enter, enter_region returns immediately. If process 1 now calls enter_region, it will hang there until interested[0] becomes FALSE, which happens only when process 0 calls leave_region to exit its critical region.

Now consider the case where both processes call enter_region almost simultaneously. Each stores its own process number in turn, but only the store done last counts; the earlier one is overwritten and lost. Suppose process 1 stores last, so turn is 1. When both processes reach the while loop, process 0 does not loop at all and enters its critical region, while process 1 loops and does not enter its critical region until process 0 leaves it.

TSL instruction

Now consider a solution that requires hardware help. Some computers, especially those designed for multiprocessors, have the following instruction

TSL RX,LOCK	

Called test and set lock, it reads the memory word LOCK into register RX and then stores a nonzero value at that memory address. The read and the write are guaranteed to be indivisible: no other processor can access the memory word until the instruction has finished. The CPU executing the TSL instruction locks the memory bus, preventing other CPUs from accessing memory until the instruction is done.

It is important to note that locking the memory bus is not the same as disabling interrupts. Disabling interrupts on processor 1 does nothing to stop processor 2 from reading or writing memory between processor 1's read and write; in other words, masking interrupts on processor 1 has no effect on processor 2. The only way to keep processor 2 away from the memory word until processor 1 is finished is to lock the bus, which requires special hardware (essentially, a bus line that guarantees the bus is used only by the processor that locked it).

To use the TSL instruction, a shared variable lock is used to coordinate access to shared memory. When lock is 0, any process may set it to 1 using the TSL instruction and then read and write the shared memory. When it is done, the process sets lock back to 0 using an ordinary move instruction.

How does this instruction prevent two processes from entering a critical section at the same time? Here’s the solution

enter_region:
    TSL REGISTER,LOCK   | copy LOCK to REGISTER and set LOCK to 1
    CMP REGISTER,#0     | was LOCK zero?
    JNE enter_region    | if it was nonzero, LOCK was already set, so loop
    RET                 | return to caller; critical region entered

leave_region:
    MOVE LOCK,#0        | store a 0 in LOCK
    RET                 | return to caller

The idea of this solution is very similar to Peterson's. Look at the four-instruction assembly routine above: the first instruction copies the old value of lock into the register and sets lock to 1, then the old value is compared with 0. If it is nonzero, the lock was already set, so the program jumps back to the beginning and tests again. Sooner or later (when the process currently in the critical region leaves it) the value becomes 0, the call returns, and the lock is held. Clearing the lock is simple: the program just stores a 0 in lock; no special synchronization instruction is needed.

It should now be clear how this works. A process calls enter_region before entering its critical region: if lock is 1 it busy-waits, and if lock is 0 it enters without looping. When the process returns from its critical region it calls leave_region, which sets lock back to 0. As with all solutions based on critical regions, the processes must call enter_region and leave_region at the correct times for the method to work.

Another alternative to TSL is XCHG, which atomically swaps the contents of two locations, for example, a register and a memory word, as shown below

enter_region:
    MOVE REGISTER,#1    | put a 1 in the register
    XCHG REGISTER,LOCK  | swap the contents of REGISTER and the LOCK variable
    CMP REGISTER,#0     | was LOCK zero?
    JNE enter_region    | if it was nonzero, LOCK was already set, so loop
    RET                 | return to caller; critical region entered

leave_region:
    MOVE LOCK,#0        | store a 0 in LOCK
    RET                 | return to caller

XCHG is essentially the same solution as TSL. All Intel x86 CPUs use the XCHG instruction for low-level synchronization.
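For readers who would rather not drop into assembly, the same test-and-set idea can be written portably with a C11 atomic_flag. This is a minimal user-level sketch mirroring the enter_region/leave_region routines above, not the code of any particular kernel:

#include <stdatomic.h>

/* The flag starts clear (0), meaning the critical region is free. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

void enter_region(void){
  /* atomic_flag_test_and_set atomically reads the old value and sets the flag,
     which is exactly what TSL/XCHG do in hardware. Spin while it was set. */
  while(atomic_flag_test_and_set(&lock)){ }
}

void leave_region(void){
  /* Clear the flag (store 0): the region is free again. */
  atomic_flag_clear(&lock);
}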

Sleep and wake up

Peterson's solution, TSL, and XCHG are all correct, but they all share the defect of busy waiting. In essence, each of them checks whether the critical region may be entered, and if not, the process simply spins until it may.

Busy waiting not only wastes CPU time, it can also have unexpected effects. Consider a computer with two processes of different priorities: H, with high priority, and L, with low priority. The scheduling rule is that H runs whenever it is ready. At a certain moment, L is in its critical region when H becomes ready (for example, an I/O operation completes). H now begins busy waiting, but since L is never scheduled while H is running, L never gets the chance to leave its critical region, and H loops forever. This situation is sometimes called the priority inversion problem.

Now let's look at interprocess communication primitives that block instead of wasting CPU time when they are not allowed to enter their critical regions. Two of the simplest are sleep and wakeup. Sleep is a system call that blocks the caller; that is, the caller is suspended until another process wakes it up. The wakeup call takes one parameter, the process to be awakened. Alternatively, both sleep and wakeup can take a single parameter: a memory address used to match up sleeps with wakeups.

Producer-consumer problems

As an example of how these primitives are used, consider the producer-consumer problem, also known as the bounded-buffer problem. Two processes share a common fixed-size buffer: the producer puts information into the buffer, and the consumer takes it out. The problem can be generalized to m producers and n consumers, but we will only consider the case of one producer and one consumer, to keep the solution simple.

If the buffer queue is full, there is a problem when producers still want to write data to the buffer. Its solution is to put the producer to sleep, that is, to block the producer. Wait until the consumer has fetched one or more data items from the buffer to wake it up. Similarly, when a consumer tries to fetch data from the buffer, but finds that the buffer is empty, the consumer also sleeps and blocks. Until the producer puts a new data into it.

This sounds simple, but it requires a variable to keep track of the number of items in the buffer; call it count. If the buffer can hold at most N items, the producer first checks whether count is N; if not, it inserts an item and increments count. The consumer's logic is similar: it first tests whether count is 0; if so, it goes to sleep and blocks, otherwise it removes an item and decrements count. Each process also tests whether the other one should be woken up, and wakes it if so. The code for the producer and consumer is shown below.

/* Number of buffer slots */
#define N 100
/* Number of items currently in the buffer */
int count = 0;

/* Producer */
void producer(void){

  int item;

  /* infinite loop */
  while(TRUE){
    /* Generate the next item */
    item = produce_item();
    /* If the buffer is full, block */
    if(count == N){
      sleep();
    }

    /* Put the new item in the buffer */
    insert_item(item);
    /* Increment the buffer count */
    count = count + 1;
    /* Was the buffer empty? If so, the consumer may be asleep */
    if(count == 1){
      wakeup(consumer);
    }
  }
}

/* Consumer */
void consumer(void){

  int item;

  /* infinite loop */
  while(TRUE){
    /* If the buffer is empty, block */
    if(count == 0){
      sleep();
    }
    /* Take an item out of the buffer */
    item = remove_item();
    /* Decrement the buffer count */
    count = count - 1;
    /* Was the buffer full? If so, the producer may be asleep */
    if(count == N - 1){
      wakeup(producer);
    }
    /* Consume the item */
    consume_item(item);
  }
}

To express system calls such as sleep and wakeup in C, we show them as calls to library routines. They are not part of the standard C library, but would be available on any system that actually has these system calls. The procedures insert_item and remove_item, whose implementations are not shown, handle the bookkeeping of putting items into the buffer and taking them out.

Now let's return to the race condition. It arises because access to count is unconstrained. The following situation can occur: the buffer is empty and the consumer has just read count and seen that it is 0. At that instant the scheduler suspends the consumer and starts the producer. The producer produces an item, inserts it into the buffer, and increments count, noticing that it is now 1. Reasoning that count was just 0 and the consumer must therefore be asleep, the producer calls wakeup to wake it. Unfortunately, the consumer is not yet logically asleep, so the wakeup signal is lost. When the consumer next runs, it tests the count value it read earlier, finds it to be 0, and goes to sleep. Sooner or later the producer fills the entire buffer and also goes to sleep, and both processes sleep forever.

The essence of the problem is that a wakeup sent to a process that is not (yet) sleeping is lost. If it were not lost, everything would work. A quick fix is to add a wakeup waiting bit. When a wakeup is sent to a process that is still awake, the bit is set. Later, when the process tries to go to sleep, if the wakeup waiting bit is set, the bit is cleared and the process stays awake.
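A rough sketch of the wakeup waiting bit idea in C follows. The helpers block_current_process, process_is_sleeping, and unblock_process are hypothetical stand-ins for whatever the kernel actually does; only the bookkeeping around the bit is the point here:

/* Hypothetical kernel helpers, declared only so the sketch compiles. */
extern void block_current_process(void);
extern int  process_is_sleeping(int pid);
extern void unblock_process(int pid);

/* A real system keeps one wakeup-waiting bit per process; one is enough here. */
static int wakeup_pending = 0;

/* sleep() as it would behave with a wakeup waiting bit. */
void my_sleep(void){
  if(wakeup_pending){
    /* A wakeup arrived while we were still awake: consume it, stay awake. */
    wakeup_pending = 0;
  }else{
    block_current_process();
  }
}

/* wakeup() directed at the process above. */
void my_wakeup(int pid){
  if(process_is_sleeping(pid)){
    unblock_process(pid);
  }else{
    /* Target is still awake: remember the wakeup instead of losing it. */
    wakeup_pending = 1;
  }
}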

However, when there are many processes you might propose adding more wakeup waiting bits, say 2, 4, or 8 of them, but that does not fundamentally solve the problem.

A semaphore

A semaphore, proposed by E. W. Dijkstra in 1965, uses an integer variable to accumulate the number of pending wakeups for later use. In his proposal, a new type of variable called a semaphore was introduced. A semaphore's value can be 0, meaning no wakeups are saved, or some positive number, meaning that many wakeups are pending.

Dijkstra proposed two operations on semaphores, now usually called down and up (generalizations of sleep and wakeup, respectively). The down operation checks whether the value is greater than 0; if so, it decrements the value by 1 and continues. If the value is 0, the process is put to sleep without completing the down for the moment. Checking the value, changing it, and possibly going to sleep are all done as a single, indivisible atomic action. This guarantees that once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. This atomicity is absolutely essential for solving synchronization problems and avoiding race conditions.

An atomic operation is a group of related operations that are either all performed without interruption or not performed at all. Atomicity is also extremely important in many other areas of computer science.

The up operation increments the semaphore's value. If one or more processes are sleeping on that semaphore, unable to complete an earlier down operation, the system chooses one of them and allows it to complete its down. Thus, after an up on a semaphore with processes sleeping on it, the semaphore's value will still be 0, but there will be one fewer process sleeping on it. Incrementing the value and waking one process are likewise indivisible. No process ever blocks doing an up, just as no process ever blocked doing a wakeup in the earlier model.
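As a concrete illustration of down and up, here is a small sketch using the POSIX semaphore API, where sem_wait plays the role of down and sem_post the role of up (the surrounding program is invented for the example; the sem_* calls themselves are the real API):

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t sem;                        /* counts stored wakeups */

void *worker(void *arg){
  sem_wait(&sem);                 /* down: blocks while the value is 0 */
  printf("worker got its wakeup\n");
  return NULL;
}

int main(void){
  pthread_t t;
  sem_init(&sem, 0, 0);           /* initial value 0: no wakeups saved yet */
  pthread_create(&t, NULL, worker, NULL);
  sem_post(&sem);                 /* up: value becomes 1, the worker unblocks */
  pthread_join(t, NULL);
  sem_destroy(&sem);
  return 0;
}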

Solve producer-consumer problems with semaphores

Solve the missing wakeup problem with the semaphore as follows

/* Define the number of buffer slots */
#define N 100
/* Semaphores are a special kind of int */
typedef int semaphore;
/* Controls access to the critical region */
semaphore mutex = 1;
/* Counts the empty buffer slots */
semaphore empty = N;
/* Counts the full buffer slots */
semaphore full = 0;

void producer(void){

  int item;

  /* TRUE is the constant 1 */
  while(TRUE){
    /* Generate something to put in the buffer */
    item = produce_item();
    /* Decrement the count of empty slots */
    down(&empty);
    /* Enter the critical region */
    down(&mutex);
    /* Put the new item in the buffer */
    insert_item(item);
    /* Leave the critical region */
    up(&mutex);
    /* Increment the count of full slots */
    up(&full);
  }
}

void consumer(void){

  int item;

  /* infinite loop */
  while(TRUE){
    /* Decrement the count of full slots */
    down(&full);
    /* Enter the critical region */
    down(&mutex);
    /* Take an item from the buffer */
    item = remove_item();
    /* Leave the critical region */
    up(&mutex);
    /* Increment the count of empty slots */
    up(&empty);
    /* Do something with the item */
    consume_item(item);
  }
}

For semaphores to work correctly, they must be implemented in an indivisible way. The normal approach is to implement up and down as system calls, with the operating system briefly disabling all interrupts while it tests the semaphore, updates it, and puts the process to sleep if necessary. Since these actions take only a few instructions, disabling interrupts does no harm. If multiple CPUs are used, each semaphore should additionally be protected by a lock variable, with the TSL or XCHG instruction used to make sure only one CPU at a time manipulates the semaphore.

Be sure you see that using TSL or XCHG to keep several CPUs from touching the semaphore at the same time is quite different from the producer or consumer busy waiting for the other to empty or fill the buffer. The semaphore operation takes only a very short time (on the order of microseconds), whereas the producer or consumer might wait arbitrarily long.

This solution uses three semaphores: full, which counts the number of full buffer slots; empty, which counts the number of empty slots; and mutex, which ensures that the producer and consumer do not access the buffer at the same time. Full is initialized to 0, empty to the number of slots in the buffer, and mutex to 1. A semaphore that is initialized to 1 and used by two or more processes to guarantee that only one of them is in its critical region at a time is called a binary semaphore. If each process does a down just before entering its critical region and an up just after leaving it, mutual exclusion is guaranteed.

Now that we have good interprocess synchronization primitives at our disposal, let's revisit the sequence of steps that take place when an interrupt occurs:

  1. The hardware stacks the program counter and other state

  2. The hardware loads a new program counter from the interrupt vector

  3. An assembly-language procedure saves the registers

  4. An assembly-language procedure sets up a new stack

  5. The C interrupt service routine runs (typically it reads and buffers input)

  6. The scheduler decides which process to run next

  7. The C procedure returns to the assembly code

  8. The assembly-language procedure starts the new current process

In a system using semaphores, the natural way to hide interrupts is to associate a semaphore, initially 0, with each I/O device. Immediately after starting an I/O device, the managing process does a down on the associated semaphore, blocking itself. When the interrupt arrives, the interrupt handler does an up on that semaphore, which makes the blocked process runnable again. In the interrupt-handling steps above, step 5 is where the interrupt handler performs the up on the semaphore, so that in step 6 the scheduler is able to run the device driver. Of course, if several processes are ready, the scheduler may choose to run an even more important process next; we will look at scheduling algorithms later.

The above code actually uses semaphores in two different ways, and the distinction between the two is important. A mutex semaphore is used for mutual exclusion. It is used to ensure that only one process can read or write to the buffer and related variables at any one time. Mutual exclusion is a necessary operation to avoid process chaos.

Another semaphore is about synchronization. Full and Empty semaphores are used to ensure that an event occurs or does not occur. In this case, they ensure that the producer stops running when the buffer is full; The consumer stops running when the buffer is empty. These two semaphores are used differently from Mutex.

The mutex

When the counting ability of a semaphore is not needed, a simplified version called a mutex can be used. A mutex is good only for managing mutual exclusion on a shared resource or piece of code, but because it is simple and efficient to implement, it is especially useful in user-space thread packages.

A mutex is a shared variable that can be in one of two states: unlocked or locked. Consequently, only one bit is needed to represent it, although in practice an integer is usually used, with 0 meaning unlocked and all other values meaning locked.

Two procedures are used with mutexes. When a thread (or process) needs access to a critical region, it calls mutex_lock. If the mutex is currently unlocked (meaning the critical region is available), the call succeeds and the calling thread is free to enter the critical region.

On the other hand, if the mutex is already locked, the calling thread is blocked until the thread in the critical region finishes and calls mutex_unlock. If multiple threads are blocked on the mutex, one of them is chosen at random and allowed to acquire the lock.

Because mutexes are so simple, they can easily be implemented in user space given the TSL or XCHG instruction. The code for mutex_lock and mutex_unlock in a user-level thread package is shown below; the XCHG version is essentially the same.

mutex_lock:
    TSL REGISTER,MUTEX  | copy MUTEX to REGISTER and set MUTEX to 1
    CMP REGISTER,#0     | was MUTEX zero?
    JZE ok              | if it was zero, the mutex was unlocked, so return
    CALL thread_yield   | mutex is busy; schedule another thread
    JMP mutex_lock      | try again
ok: RET                 | return to caller; critical region entered

mutex_unlock:
    MOVE MUTEX,#0       | store a 0 in MUTEX
    RET                 | return to caller

The mutex_lock code is similar to the enter_region code above

Do you see the biggest difference in the code above?

  • From our analysis of TSL above, we know that enter_region busy-waits: when the lock is taken, it loops indefinitely trying to acquire it. In mutex_lock, by contrast, if the mutex is busy the thread gives up the CPU by scheduling another thread. So the real difference lies in what happens after the test of the mutex/lock fails.

  • With user-level threads the situation is different, because there is no clock interrupt to stop a thread that runs too long. A thread that tried to acquire a lock by busy waiting would therefore loop forever and never get the lock, because it never lets the lock-holding thread run and release it. Instead, when mutex_lock fails to acquire the lock, it calls thread_yield to give the CPU to another thread, so there is no busy waiting. When this thread runs again, it tests the lock once more.

That is the difference between enter_region and mutex_lock. Since thread_yield is just a user-space scheduling operation, it is very fast. As a consequence, neither mutex_lock nor mutex_unlock requires any kernel calls; using them, user-level threads can synchronize entirely in user space with only a tiny amount of overhead.

The mutex described above is a minimal set of calls. Software always wants more features and more synchronization primitives. For example, a thread package sometimes offers a call mutex_trylock that either acquires the lock or returns an error code, but never blocks. This gives the calling thread the flexibility to decide what to do next: use an alternative approach or simply wait.

Futexes

As parallelism increases, efficient synchronization and locking become important for performance. Spin locks are fast if the wait is short, but they waste CPU cycles if it is not. If there is a lot of contention, it is more efficient to block the process and have the kernel unblock it only when the lock is released. Unfortunately, this has the inverse problem: it works well under heavy contention, but if there is little contention the cost of constantly switching into the kernel is high, and the amount of lock contention is hard to predict.

An interesting solution is to combine the best of both and come up with a new idea called FUtex, or Fast User Space Mutex. Sounds interesting, doesn’t it?

A futex is a Linux feature that implements basic locking (much like a mutex) while avoiding expensive switches into the kernel, which greatly improves performance. A futex has two parts: a kernel service and a user library. The kernel service provides a wait queue that lets multiple processes wait on a lock; they will not run unless the kernel explicitly unblocks them.

Putting a process on the wait queue requires an (expensive) system call and should be avoided. In the absence of contention, the futex works entirely in user space. The processes share a 32-bit integer as a common lock variable. Suppose the lock is initialized to 1, which we take to mean that the lock is free. A thread grabs the lock by performing an atomic decrement-and-test (the atomic operations in Linux consist of inline assembly wrapped in C functions and defined in header files). Next, the thread inspects the result to see whether the lock was free. If it was not in the locked state, our thread has successfully grabbed it. However, if the lock is held by another thread, our thread has to wait. In that case the futex library does not spin; instead it uses a system call to put the thread on the wait queue in the kernel. The cost of switching into the kernel is now justified, because the thread was going to block anyway. When the thread is done with the lock, it releases it with an atomic increment-and-test and checks the result to see whether any processes are still blocked on the kernel wait queue. If so, it tells the kernel to unblock one or more of them. If there is no contention, the kernel is never involved at all.
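To make the fast-path/slow-path split concrete, here is a deliberately simplified sketch of a futex-style lock on Linux, using C11 atomics and the raw futex system call. The names futex_lock and futex_unlock are ours, not a kernel API, and a production lock (such as the one in glibc) additionally tracks whether any waiters exist so that the FUTEX_WAKE call can be skipped in the uncontended case:

#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked. A real implementation also encodes
   "locked with waiters" so unlock can skip the wake syscall. */
static _Atomic int lock_word = 0;

static void futex_lock(_Atomic int *w){
  int expected = 0;
  /* Fast path: try to grab the lock entirely in user space. */
  while(!atomic_compare_exchange_strong(w, &expected, 1)){
    /* Slow path: sleep in the kernel while the word is still 1. */
    syscall(SYS_futex, w, FUTEX_WAIT, 1, NULL, NULL, 0);
    expected = 0;
  }
}

static void futex_unlock(_Atomic int *w){
  atomic_store(w, 0);
  /* Wake at most one waiter, if there is one. */
  syscall(SYS_futex, w, FUTEX_WAKE, 1, NULL, NULL, 0);
}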

Mutex in Pthreads

Pthreads provides a number of functions for synchronizing threads. The basic mechanism is the mutex variable, which can be locked and unlocked to guard each critical region. A thread wishing to enter a critical region first tries to lock the associated mutex. If the mutex is unlocked, the thread enters immediately and the lock is set atomically, preventing other threads from entering. If the mutex is already locked, the calling thread blocks until it is unlocked. If several threads are waiting on the same mutex, only one of them is allowed to continue and relock it when it becomes free. Note that mutex locks are not compulsory: it is up to the programmer to use them correctly.

Here are the mutex related function calls

As expected, mutexes can be created and destroyed with pthread_mutex_init and pthread_mutex_destroy. A mutex is locked with pthread_mutex_lock, which blocks the caller if the mutex is already locked. There is also a call pthread_mutex_trylock that tries to lock a mutex and, if it is already locked, returns an error code instead of blocking; this call allows a thread to do busy waiting efficiently if it ever needs to. Finally, pthread_mutex_unlock unlocks a mutex and releases one waiting thread, if any are waiting.
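A minimal sketch of these calls in use might look like this; deposit, try_deposit, and balance are made-up names for the example, while the pthread calls themselves are the real API:

#include <pthread.h>
#include <errno.h>

/* balance is an example shared variable protected by the mutex. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int balance = 0;

void deposit(int amount){
  pthread_mutex_lock(&lock);       /* blocks if another thread holds the lock */
  balance += amount;               /* critical region */
  pthread_mutex_unlock(&lock);
}

int try_deposit(int amount){
  if(pthread_mutex_trylock(&lock) == EBUSY){
    return 0;                      /* lock busy: the caller decides what to do */
  }
  balance += amount;
  pthread_mutex_unlock(&lock);
  return 1;
}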

In addition to mutex, Pthreads provides a second synchronization mechanism: condition variables. Mutex does a good job of allowing or blocking access to critical areas. A condition variable allows a thread to block because some condition is not met. Most of the time these two methods are used together. Let’s take a closer look at the relationships between threads, mutex, and condition variables.

Let's revisit the producer-consumer problem: one thread puts things into a buffer and another takes them out. If the producer finds that there are no free slots in the buffer, it has to block until one becomes available. The producer uses a mutex to perform the check atomically, without interference from other threads, but when it discovers that the buffer is full it needs a way to block itself and be woken up later. That is what condition variables provide.

Here are some of the most important pThread calls related to condition variables

The table above shows the calls for creating and destroying condition variables. The two main operations on condition variables are pthread_cond_wait and pthread_cond_signal. The former blocks the calling thread until another thread signals it (using the latter call). A blocked thread is usually waiting for the signaling thread to free a resource or do some other work; only then can the blocked thread continue. Pthread_cond_wait releases the mutex and blocks the thread as a single atomic action. Pthread_cond_broadcast is used when several threads may all be blocked waiting for the same signal.

It is important to note that condition variables (unlike semaphores) have no memory: if a signal is sent to a condition variable on which no thread is waiting, the signal is lost.

Here is an example using mutex and condition variables

#include <stdio.h>
#include <pthread.h>

/* How many numbers to produce */
#define MAX 1000000000
pthread_mutex_t the_mutex;
/* Condition variables for the consumer and the producer */
pthread_cond_t condc, condp;
int buffer = 0;

/* Produce data */
void *producer(void *ptr){

  int i;

  for(i = 0; i <= MAX; i++){
    /* Get exclusive access to the buffer by locking the mutex */
    pthread_mutex_lock(&the_mutex);
    while(buffer != 0){
      pthread_cond_wait(&condp, &the_mutex);
    }
    /* Put an item in the buffer */
    buffer = i;
    /* Wake up the consumer */
    pthread_cond_signal(&condc);
    /* Release the buffer */
    pthread_mutex_unlock(&the_mutex);
  }
  pthread_exit(0);

}

/* Consume data */
void *consumer(void *ptr){

  int i;

  for(i = 0; i <= MAX; i++){
    /* Get exclusive access to the buffer by locking the mutex */
    pthread_mutex_lock(&the_mutex);
    while(buffer == 0){
      pthread_cond_wait(&condc, &the_mutex);
    }
    /* Take the item out of the buffer */
    buffer = 0;
    /* Wake up the producer */
    pthread_cond_signal(&condp);
    /* Release the buffer */
    pthread_mutex_unlock(&the_mutex);
  }
  pthread_exit(0);

}

int main(int argc, char **argv){

  pthread_t pro, con;

  /* Initialize the mutex and the condition variables before use */
  pthread_mutex_init(&the_mutex, 0);
  pthread_cond_init(&condc, 0);
  pthread_cond_init(&condp, 0);
  pthread_create(&con, 0, consumer, 0);
  pthread_create(&pro, 0, producer, 0);
  pthread_join(pro, 0);
  pthread_join(con, 0);
  pthread_cond_destroy(&condc);
  pthread_cond_destroy(&condp);
  pthread_mutex_destroy(&the_mutex);
  return 0;
}

Monitors

In order to make it easier to write correct programs, Brinch Hansen and Hoare proposed a higher-level synchronization primitive called a monitor. Their proposals differ slightly, as described below. A monitor is a collection of procedures, variables, and data structures grouped together in a special kind of module or package. Processes may call the procedures in a monitor whenever they want, but they may not directly access the monitor's internal data structures from procedures declared outside it. Below is an abstract, Pascal-like sketch of a monitor. It cannot be expressed in C, because a monitor is a language concept and C does not have monitors.

monitor example
	integer i;
	condition c;

	procedure producer();
	.
	.
	end;

	procedure consumer();
	.
	.
	end;
end monitor;

Monitors have an important property that makes them useful for achieving mutual exclusion: only one process can be active in a monitor at any instant. Monitors are a programming-language construct, so the compiler knows they are special and can handle calls to monitor procedures differently from other procedure calls. Typically, when a process calls a monitor procedure, the first few instructions of the procedure check whether any other process is currently active inside the monitor. If so, the calling process is suspended until the other process leaves the monitor; only if no other process is using the monitor may the calling process enter.

It is up to the compiler to implement mutual exclusion on monitor entry, and a common way is to use a mutex or binary semaphore. Because the compiler, not the programmer, arranges for mutual exclusion, it is much less likely that something will go wrong. The person writing the monitor does not need to know how the compiler does it; it is enough to know that by turning all critical regions into monitor procedures, no two processes will ever execute their critical regions at the same time.

Although monitors provide an easy way to achieve mutual exclusion, that by itself is not enough: we also need a way for processes to block when they cannot proceed. In the producer-consumer problem it is easy enough to put the tests for a full or empty buffer inside monitor procedures, but how does the producer block when it finds the buffer full?

The solution lies in introducing condition variables, along with the two operations wait and signal. When a monitor procedure discovers that it cannot continue (for example, the producer finds the buffer full), it does a wait on a condition variable such as full. This blocks the calling process and also allows another process that had previously been prevented from entering the monitor to enter now. (We discussed the implementation details of condition variables in the pthreads section above.) That other process, for example the consumer, can later wake up its sleeping partner by doing a signal on the condition variable.

Brinch Hansen and Hoare differ in what happens after a signal. Hoare proposed letting the newly awakened process run and suspending the signaling one. Brinch Hansen proposed requiring that a process doing a signal exit the monitor immediately. We will follow Brinch Hansen's proposal here because it is conceptually simpler and easier to implement.

If several processes are waiting on a condition variable, the system scheduler can select only one of them to resume running after the condition is signaled.

Incidentally, there is a third option, proposed by neither of them: let the process doing the signal continue to run, and allow the waiting process to resume only after the signaler has exited the monitor.

A condition variable is not a counter. Condition variables also do not accumulate signals for later use as semaphores do. So, if a signal is sent to a condition variable, but there is no waiting process on that condition variable, the signal will be lost. That is, wait must be executed before signal.

Here is a solution to the producer-consumer problem using monitors, written in a Pascal-like language.

monitor ProducerConsumer
		condition full,empty;
		integer count;
		
		procedure insert(item:integer);
		begin
				if count = N then wait(full);
				insert_item(item);
				count := count + 1;
				if count = 1 then signal(empty);
		end;
		
		function remove:integer;
		begin
				if count = 0 then wait(empty);
				remove = remove_item;
				count := count - 1;
				if count = N - 1 then signal(full);
		end;
		
		count := 0;
end monitor;

procedure producer;
begin
			while true do
      begin 
      			item = produce_item;
      			ProducerConsumer.insert(item);
      end
end;

procedure consumer;
begin 
			while true do
			begin
						item = ProducerConsumer.remove;
						consume_item(item);
			end
end;

The reader may feel that wait and signal look like sleep and wakeup, which we saw earlier have fatal race conditions. They are similar, but with one crucial difference: sleep and wakeup failed because while one process was trying to go to sleep, the other was trying to wake it up. With monitors that cannot happen. The automatic mutual exclusion on monitor procedures guarantees that if, say, the producer inside a monitor procedure discovers that the buffer is full, it can complete the wait operation without having to worry about the scheduler switching to the consumer just before the wait completes. The consumer is not even let into the monitor until the wait is finished and the producer has been marked as no longer runnable.

Although our Pascal-like language is imaginary, some real programming languages support monitors, Java among them. Java is an object-oriented language that supports user-level threads and allows methods to be grouped into classes. By adding the keyword synchronized to a method declaration, Java guarantees that once one thread has started executing that method, no other thread will be allowed to execute any other synchronized method of the same object. Without synchronized, no such guarantee against interleaving exists.

Below is a Java solution to the producer-consumer problem using monitors.

public class ProducerConsumer {
  // Define the buffer size
  static final int N = 100;
  // Instantiate a new producer thread
  static Producer p = new Producer();
  // Instantiate a new consumer thread
  static Consumer c = new Consumer();
  // Instantiate the monitor
  static Our_monitor mon = new Our_monitor();

  public static void main(String[] args) {
    // Start the producer and consumer threads
    p.start();
    c.start();
  }

  static class Producer extends Thread {
    // run contains the thread code
    public void run() {
      int item;
      // Producer loop
      while (true) {
        item = produce_item();
        mon.insert(item);
      }
    }
    // Actually produce the data
    private int produce_item() { ... }
  }

  static class Consumer extends Thread {
    // run contains the thread code
    public void run() {
      int item;
      // Consumer loop
      while (true) {
        item = mon.remove();
        consume_item(item);
      }
    }
    // Actually consume the data
    private void consume_item(int item) { ... }
  }

  // This is the monitor
  static class Our_monitor {
    private int buffer[] = new int[N];
    // Counter and indices
    private int count = 0, lo = 0, hi = 0;

    private synchronized void insert(int val) {
      // If the buffer is full, go to sleep
      if (count == N) {
        go_to_sleep();
      }
      // Insert an item into the buffer
      buffer[hi] = val;
      // Slot in which to place the next item
      hi = (hi + 1) % N;
      // One more item in the buffer now
      count = count + 1;
      // If the consumer was sleeping, wake it up
      if (count == 1) {
        notify();
      }
    }

    private synchronized int remove() {
      int val;
      // If the buffer is empty, go to sleep
      if (count == 0) {
        go_to_sleep();
      }
      // Fetch an item from the buffer
      val = buffer[lo];
      // Slot from which to fetch the next item
      lo = (lo + 1) % N;
      // One fewer item in the buffer now
      count = count - 1;
      // If the producer was sleeping, wake it up
      if (count == N - 1) {
        notify();
      }
      return val;
    }

    private void go_to_sleep() {
      try {
        wait();
      } catch (InterruptedException exc) { }
    }
  }
}

The code above defines four classes. The outer class, ProducerConsumer, creates and starts the two threads, p and c. The second and third classes, Producer and Consumer, contain the producer and consumer code, respectively. Finally, Our_monitor is the monitor: it contains two synchronized methods used for inserting items into and removing items from the shared buffer.

The producer and consumer threads are functionally the same as in all the previous examples: the producer has an infinite loop that generates data and puts it into the common buffer, and the consumer has an equivalent infinite loop that takes data out of the buffer and does something with it.

The interesting part of this program is the class Our_monitor, which contains the buffer, the administration variables, and the two synchronized methods. While the producer is active inside insert, it knows for sure that the consumer cannot be active inside remove, so the shared variables and the buffer can be updated without fear of race conditions. The variable count keeps track of how many items are in the buffer. The variable lo is the index of the buffer slot from which the next item is to be fetched, and hi is the index of the slot into which the next item is to be placed. It is permitted for lo to equal hi, which means that either 0 or N items are in the buffer; the value of count tells which case holds.

Synchronized methods in Java differ in one essential way from classical monitors: Java has no built-in condition variables. Instead, it provides wait and notify, which are the equivalents of sleep and wakeup.

With the automatic mutual exclusion on critical regions, monitors make parallel programming much less error-prone than semaphores do. Still, they have drawbacks. As mentioned earlier, a monitor is a programming-language concept: the compiler must recognize it and arrange for the mutual exclusion somehow. C, Pascal, and most other languages do not have monitors, so you cannot rely on their compilers to enforce any mutual exclusion rules.

Another problem with monitors, and also with semaphores, is that they were designed to solve the mutual exclusion problem on one or more CPUs that all have access to a common memory. By putting the semaphores in shared memory and protecting them with TSL or XCHG instructions, races can be avoided. In a distributed system consisting of multiple CPUs, each with its own private memory and connected by a network, these primitives become useless: semaphores are too low level, and monitors are unusable except in a few programming languages. Something else is needed.

Message passing

That something else is message passing. This method of interprocess communication uses two primitives, send and receive, which, like semaphores and unlike monitors, are system calls rather than language constructs. An example follows.

send(destination, &message);

receive(source, &message);

The send call sends a message to a given destination, and receive receives a message from a given source. If no message is available, the receiver can block until one arrives, or it can return immediately with an error code.

Design essentials of messaging system

Message-passing systems face many problems and design issues that do not arise with semaphores or monitors, especially when the communicating processes run on different machines connected by a network. For example, messages can be lost by the network. To guard against this, the sender and receiver can agree that as soon as a message is received, the receiver sends back a special acknowledgement message. If the sender has not received the acknowledgement within a certain interval, it retransmits the message.

Now consider the case where the message itself was received correctly, but the acknowledgement message returned to the sender was lost. The sender resends the message so that the recipient receives the same message twice.

It is essential that the receiver be able to distinguish a new message from the retransmission of an old one. The usual solution is to put a consecutive sequence number in each original message. If the receiver gets a message bearing the same sequence number as a previous message, it knows the message is a duplicate and can ignore it.
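A sketch of the receiver's duplicate check might look like this; struct message, deliver, and send_ack are illustrative names, not a real messaging API:

/* Illustrative message format and helpers (not a real messaging API). */
struct message { unsigned int seq; /* ... payload ... */ };
extern void deliver(struct message *m);     /* hand the payload to the application */
extern void send_ack(unsigned int seq);     /* acknowledge a sequence number */

static unsigned int next_expected = 0;

void on_receive(struct message *m){
  if(m->seq < next_expected){
    /* A retransmission of something already delivered: the earlier
       acknowledgement was probably lost, so ack again and drop it. */
    send_ack(m->seq);
    return;
  }
  deliver(m);                 /* first time this message is seen */
  send_ack(m->seq);
  next_expected = m->seq + 1;
}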

The messaging system must also deal with how to name processes so that they are clearly identified in send or receive calls. Authentication is also a problem, such as how the client knows it is communicating with a real file server, and information from sender to receiver can be tampered with by middlemen.

Solve producer-consumer problems with messaging

Now let's consider how the producer-consumer problem can be solved with message passing instead of shared memory. Here is one solution.

/* The number of slots in the buffer */
#define N 100

void producer(void){

  int item;
  /* Message buffer */
  message m;

  while(TRUE){
    /* Generate something to put in the buffer */
    item = produce_item();
    /* Wait for an empty message to arrive */
    receive(consumer, &m);
    /* Construct a message to send */
    build_message(&m, item);
    /* Send the item to the consumer */
    send(consumer, &m);
  }
}

void consumer(void){

  int item, i;
  message m;

  /* Loop N times */
  for(i = 0; i < N; i++){
    /* Send N empty messages */
    send(producer, &m);
  }
  while(TRUE){
    /* Get a message containing an item */
    receive(producer, &m);
    /* Extract the item from the message */
    item = extract_item(&m);
    /* Send the empty message back to the producer */
    send(producer, &m);
    /* Do something with the item */
    consume_item(item);
  }
}

It is assumed that all messages are of the same size and are automatically buffered by the operating system when outgoing messages have not been received. There are N messages used in this solution, which is similar to N slots in a shared memory buffer. The consumer first sends N empty messages to the producer. When a producer passes a data item to a consumer, it takes an empty message and returns a message populated with content. In this way, the total number of messages in the system remains the same, so messages can be stored in a predetermined amount of memory.

If the producer is faster than the consumer, all messages will eventually fill up, waiting for the consumer, and the producer will be blocked, waiting for an empty message to return. If the consumer is fast, the opposite is true: all messages are empty, waiting to be filled by the producer, and the consumer is blocked waiting for a filled message.

There are many variations on how messages are delivered, but here’s how messages are addressed.

  • One way is to assign a unique address to each process and have messages be addressed to processes.
  • Another approach is to introduce a new data structure called a mailbox. A mailbox is a place to buffer a certain number of messages, typically specified when the mailbox is created. With mailboxes, the address parameters in send and receive calls are mailboxes, not processes. When a process tries to send to a mailbox that is full, it is suspended until a message is removed from that mailbox, making room for a new one.

Barriers

Our last synchronization mechanism is intended for groups of processes rather than for two-process producer-consumer situations. Some applications are divided into phases, with the rule that no process may enter the next phase until all processes are ready to do so. This behavior can be achieved by placing a barrier at the end of each phase. When a process reaches the barrier, it is blocked until all processes have reached it. A barrier thus lets a group of processes synchronize, as shown in the figure below.

In the figure we see four processes approaching a barrier; each is still computing and has not yet reached the end of the current phase. After a while, processes A, B, and D reach the barrier and are suspended there; they cannot proceed to the next phase because process C has not yet finished. When C finally reaches the barrier, the whole group is released and moves on to the next phase.
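POSIX threads provide barriers directly. Here is a minimal sketch with four worker threads, assuming a platform that implements pthread_barrier_t; the phase messages are invented for the example:

#include <pthread.h>
#include <stdio.h>

#define WORKERS 4

static pthread_barrier_t barrier;

static void *worker(void *arg){
  long id = (long)arg;
  printf("worker %ld: phase 1 done\n", id);
  pthread_barrier_wait(&barrier);          /* block until all 4 have arrived */
  printf("worker %ld: phase 2 starts\n", id);
  return NULL;
}

int main(void){
  pthread_t t[WORKERS];
  pthread_barrier_init(&barrier, NULL, WORKERS);
  for(long i = 0; i < WORKERS; i++){
    pthread_create(&t[i], NULL, worker, (void *)i);
  }
  for(int i = 0; i < WORKERS; i++){
    pthread_join(t[i], NULL);
  }
  pthread_barrier_destroy(&barrier);
  return 0;
}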

Avoid locking: read-copy-update

The fastest lock is no lock at all. The question is whether we can allow concurrent reads and writes of shared data structures without any locking. In general, the answer is no. Suppose process A is sorting an array of numbers while process B is computing its average: because A keeps moving elements around, B may see some values several times and other values not at all.

However, in some cases, we can allow write operations to update data structures, even if other processes are using them. The trick is to make sure that each read reads either the old version or the new version, such as the tree below

In this tree, readers traverse from the root to the leaves. To add a new node X, we get it "exactly right" before making it visible in the tree: we initialize all the fields in X, including its child pointers, and only then make X a child of A with a single atomic pointer write. No reader can ever see an inconsistent version.

Next, suppose we remove B and D. First, the left-child pointer of A is changed to point to C. Any reader that is still in A will subsequently follow the pointer to C and never see B or D; that is, it reads only the new version. Readers that happen to be in B or D at that moment continue following the old pointers and read the old version. Either way, everything stays consistent and nothing was locked. The key idea that makes removing B and D work without locking is read-copy-update (RCU), which separates the removal phase of an update from the reclamation phase: the old nodes are not freed until it is certain that no reader can still be using them.
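The "initialize first, then publish with one atomic pointer store" step can be sketched in C11 as follows. The node layout and function names are illustrative, and a real RCU implementation additionally defers freeing removed nodes until all readers of the old version have finished:

#include <stdatomic.h>
#include <stdlib.h>

struct node {
  int value;
  struct node * _Atomic left;
  struct node * _Atomic right;
};

/* Attach a new left child X under A so that a concurrent reader sees either
   "no child" or a fully initialized X, never a half-built node. */
void attach_left(struct node *a, int value){
  struct node *x = malloc(sizeof *x);
  x->value = value;                          /* initialize X completely first */
  atomic_store(&x->left, NULL);
  atomic_store(&x->right, NULL);
  /* ...then publish it with a single release store of the pointer. */
  atomic_store_explicit(&a->left, x, memory_order_release);
}

/* A reader pairs the release store with an acquire load. */
struct node *read_left(struct node *a){
  return atomic_load_explicit(&a->left, memory_order_acquire);
}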

Scheduling

When a computer is a multi-programming system, there are frequently many processes or threads competing for CPU slices at the same time. This happens when two or more processes/threads are in a ready state. If only one CPU is available, then you must choose which process/thread can run next. There is a role in the operating system called a scheduler that does just that, using an algorithm called the scheduling algorithm.

Although there are some differences, many of the issues that apply to process scheduling also apply to thread scheduling. When the kernel manages threads, scheduling is usually done per thread, with little or no regard for which process the thread belongs to. We will first focus on scheduling issues that apply to both processes and threads, and later look explicitly at thread scheduling and the problems it raises.

Introduction to scheduling

Back in the early days of batch processing systems with cards on tape as input, the scheduling algorithm was simple: run each job on tape in turn. For multi-programming systems, it is more complicated because there are often multiple users waiting for service. Some mainframes still use a combination of batch and time-sharing services, requiring the scheduler to decide whether to run a batch job or a user on the terminal next. Since CPU is a scarce resource on these machines, a good scheduler can achieve great results in terms of improved performance and user satisfaction.

Process behavior

Almost all processes alternate bursts of computation with (disk or network) I/O requests.

As shown in the figure above, the CPU runs for some time without stopping and then issues a system call waiting for I/O to read or write files. Once the system call is complete, the CPU computes again until it needs to read or write more data. An I/O activity is when a process is blocked waiting for an external device to complete its work.

Process A above is CPU-intensive; process B is I/O-intensive. A is called compute-bound or CPU-bound because it spends most of its time computing, while B is I/O-bound because it issues I/O requests frequently. Compute-bound processes have long CPU bursts and infrequent I/O waits, whereas I/O-bound processes have short CPU bursts and frequent I/O waits. Note that the distinction hinges on the length of the CPU bursts, not on the length of the I/O waits: I/O-bound processes are I/O bound because they do little computing between I/O requests, not because their I/O requests take especially long. It takes the same time to issue the hardware request to read a disk block no matter how long the data takes to arrive.

It is worth noting that as CPUs get faster, processes tend to become more I/O-bound, because CPU speed has improved much faster than disk speed. As a result, scheduling of I/O-bound processes is likely to become more important in the future. The basic idea is that if an I/O-bound process wants to run, it should get the chance quickly, so that it can issue its disk request and keep the disk busy.

When to schedule

The first scheduling problem is when to make scheduling decisions. There are various situations that require scheduling. First, after creating a new process, you need to decide whether to run the parent or child process. Since both processes are in a ready state, this is a normal scheduling decision and can be selected arbitrarily, that is, the scheduler can choose any child or parent process to start running.

Second, a scheduling decision must be made when a process exits. That process can no longer run (since it no longer exists), so some other process must be chosen from the set of ready processes. If no process is ready, a system-supplied idle process normally runs.

What is an idle process

The System Idle Process is a system process in Microsoft Windows. It consists of one thread per processor, whose sole task is to account for processor time when the system is not running any other thread. It is not a real process in the ordinary sense; it is a placeholder provided by the kernel, which exists because a multitasking system always needs something to run. You can simply think of it as representing how idle the CPU is: the higher its share, the more idle the processor. You can see it in the Windows Task Manager when checking CPU utilization.

The third case is when a process blocks on I/O, on a semaphore, or for some other reason, and another process must be selected to run. Sometimes the reason for blocking could influence the choice: for example, if A is an important process waiting for B to leave its critical region, letting B run next would allow it to leave the critical region so that A can continue. In practice, however, the scheduler generally does not take such dependencies into account.

Fourth, scheduling decisions can be made when I/O interrupts occur. If the interruption came from an I/O device, and the I/O device has finished its work, those processes waiting for the I/O can now continue running. It is up to the scheduler to decide whether it is ready to run a new process or rerun an interrupted one.

If a hardware clock provides periodic interrupts at 50 or 60 Hz or some other frequency, scheduling decisions can be made at each clock interrupt or at every k-th clock interrupt. Scheduling algorithms can be divided into two categories according to how they deal with clock interrupts. A nonpreemptive scheduling algorithm picks a process and lets it run until it blocks (on I/O or waiting for another process) or voluntarily releases the CPU. Even if it runs for hours, it is never forcibly suspended, and no scheduling decisions are made during clock interrupts. After clock-interrupt processing has finished, the interrupted process resumes, unless a higher-priority process was waiting.

Another case is preemptive scheduling algorithms, which select a process and run it for a maximum fixed time. If it is still running after the end of the interval, the process is suspended and the scheduler selects another process to run (provided there is a ready process). Preemptive scheduling requires a clock interrupt at the end of the interval to return control of the CPU to the scheduler. If no clock is available, then non-preemptive is the only option.

Classification of scheduling algorithms

There is no doubt that different scheduling algorithms are needed in different environments. This happens because different applications and different operating systems have different goals. In other words, scheduler optimization is different in different systems. It is necessary to distinguish three environments

  • Batch (Batch)
  • Interactive (Interactive)
  • Real time (Real time)

Batch processing systems are widely used in business, such as processing payroll, inventory, accounting receipts, accounting expenses, interest calculations, claims processing, and other cyclical operations. In batch systems, non-preemptive algorithms or preemptive algorithms with long periodicity are generally used. This approach reduces thread switching and therefore improves performance.

In an environment with interactive users, preemption is essential to keep one process from hogging the CPU and denying service to the others. Even if no process intentionally ran forever, one faulty process could shut out all the others indefinitely, so preemption is needed to prevent this behavior. Servers also fall into this category, since they normally serve multiple (remote) users, all of whom are in a big hurry.

In real-time systems, preemption is sometimes unnecessary because processes, knowing they may not run for long, usually do their work quickly and block. The difference between real-time systems and interactive systems is that real-time systems only run programs that advance existing applications, while interactive systems are universal and can run arbitrary non-cooperative or even malicious programs.

Goals of scheduling algorithms

In order to design scheduling algorithms, it is necessary to consider what a good scheduling algorithm is. There are some goals that depend on the environment (batch, interactive, or real-time). Most are applicable to all situations, but there are some factors to consider that we will discuss together below.

All systems

In all cases, fairness is important. It is unfair to give one process a larger slice of CPU time than an equivalent process. Of course, different types of processes can take different approaches.

Related to fairness is enforcing the system's policies. What does that mean? Suppose the company payroll is scheduled to go out on the 15th, but because of an epidemic everyone is strapped for cash and the boss decides pay must go out on the evening of the 14th. The scheduler must then make sure the payroll process actually runs in time to carry out that policy.

Another common goal is to keep all parts of the system busy whenever possible. If the CPU and all the I/O devices can be kept running all the time, more work gets done per second than if some components sit idle. In a batch system, for example, the scheduler controls which jobs are brought into memory to run. Having a mix of CPU-intensive and I/O-intensive processes in memory is better than first loading and running all the CPU-intensive jobs and then, once they are done, loading and running all the I/O-intensive ones. With the latter approach, while the CPU-intensive processes compete for the CPU the disk sits idle, and while the I/O-intensive processes compete for the disk the CPU sits idle. By mixing the two kinds, the whole system runs more smoothly and efficiently.

Batch systems

Three metrics are commonly used to measure batch system performance: throughput, turnaround time, and CPU utilization. Throughput is the number of jobs the system completes per hour. All things considered, finishing 50 jobs per hour is better than finishing 40. Turnaround time is the average time from the moment a batch job is submitted until the moment it completes; it measures how long the average user has to wait for the output. The smaller the turnaround time, the better.

CPU utilization is also often used as a metric on batch systems. Even so, it is not a very good one: the really valuable metrics are how many jobs the system completes per hour (throughput) and how long it takes to get a job back (turnaround time). Using CPU utilization as a yardstick is a bit like rating cars by how many revolutions per hour their engines make. At best, knowing when CPU utilization is approaching 100% tells you when it is time to get more computing power.

Interactive systems

For interactive systems there are different metrics. The most important is response time, the time between issuing a command and getting the result, which should be minimized. On a PC where background processes are running (for example, reading and saving E-mail from the network), a user request to start a program or open a file should take precedence over the background work. Having all interactive requests served first is perceived as good service.

A related issue is proportionality: users have a fixed (though often incorrect) idea of how long things should take. When a request that is perceived as complex takes a long time, users accept it; when a request that is perceived as simple takes a long time, they get annoyed. Take color printing versus photocopying as a simple example. Color printing may take a minute, but users consider it complicated and are willing to wait; photocopying feels like it should take 5 seconds, so a photocopy that takes a minute seems unreasonably slow.

Real-time systems

Real-time systems have different concerns from interactive systems, and thus different scheduling goals. They are characterized by deadlines that must, or at least should, be met. For example, if a computer controls a device that produces data at a fixed rate, failing to run the data-collection process on time may mean losing data. The foremost requirement in a real-time system is therefore to meet all (or most) deadlines.

In some real-time systems, especially those involving multimedia, predictability is also important. Missing an occasional deadline is not fatal, but if the audio process runs erratically, the sound quality deteriorates rapidly. Video causes problems too, but the ear is much more sensitive to jitter than the eye. To avoid such problems, process scheduling must be highly predictable and regular.

Scheduling in batch processing

Now let’s switch our focus from general scheduling to specific scheduling algorithms. Next we will look at scheduling in batch processing.

First come, first served

Probably the simplest of all non-preemptive scheduling algorithms is first-come, first-served. With this algorithm, the CPU is assigned to processes in the order they request it. Basically, there is a single queue of ready processes. When the first job enters the system from the outside, it is started immediately and allowed to run as long as it wants; it is never interrupted for having run too long. As other jobs come in, they are put at the end of the ready queue. When the running process blocks, the first process in the queue runs next. When a blocked process becomes ready again, it is put at the end of the queue, behind all the waiting processes, just like a newly arrived job.

The great strength of this algorithm is that it is easy to understand and equally easy to program. A single linked list keeps track of all ready processes. Picking a process to run just means removing one from the head of the queue; adding a new job or an unblocked process just means appending it to the tail. It could hardly be simpler.
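To make the mechanics concrete, here is a minimal sketch of the first-come, first-served rule in Python; the process names are invented placeholders, and a real scheduler would of course queue process control blocks rather than strings.

```python
from collections import deque

# Minimal first-come, first-served sketch. The ready "queue" is a plain FIFO;
# the process names here are just illustrative placeholders.
ready = deque(["A", "B", "C"])

def fcfs_pick(ready_queue):
    """The process at the head of the queue runs until it blocks or finishes."""
    return ready_queue.popleft()

def on_ready_again(ready_queue, process):
    """A process that unblocks rejoins at the tail, behind everything else."""
    ready_queue.append(process)

running = fcfs_pick(ready)        # "A" runs first, uninterrupted
on_ready_again(ready, running)    # when it later unblocks, it goes to the back
```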

The weakness is just as clear. If 100 I/O-bound processes are already queued and the 101st to arrive is an important CPU-intensive job, it must wait for everything ahead of it before it gets the CPU at all. That is often unacceptable in practice, so priorities or preemption are needed to let important processes run first.

Shortest job first

The second scheduling algorithm used in batch systems is shortest job first; here we assume the run times are known in advance. An insurance company, for example, can predict quite accurately how long it will take to process a batch of 1,000 claims, because it does similar work every day. When several equally important jobs are sitting in the input queue waiting to be started, the scheduler should pick the shortest job first.

As shown in Figure A, there are four jobs A, B, C, and D, with run times of 8, 4, 4, and 4 minutes respectively. If they run in that order, the turnaround time of A is 8 minutes, B is 12 minutes, C is 16 minutes, and D is 20 minutes, for an average of 14 minutes.

Now consider running the four jobs with shortest job first, as shown in Figure B. The turnaround times are now 4, 8, 12, and 20 minutes, for an average of 11 minutes. Shortest job first is provably optimal: consider four jobs with run times a, b, c, and d respectively. The first job finishes at time a, the second at time a + b, and so on, so the average turnaround time is (4a + 3b + 2c + d) / 4. Clearly a contributes most to the average, so it should be the shortest job, followed by b, then c, and finally d, which affects only its own turnaround time.
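The averages quoted above are easy to verify. Here is a small sketch, using the 8-, 4-, 4-, 4-minute run times from the figure, that computes the average turnaround time for a given run order:

```python
def average_turnaround(run_times):
    """Average turnaround time when jobs run back to back in the given order."""
    finish, total = 0, 0
    for t in run_times:
        finish += t        # this job finishes when all earlier jobs plus itself are done
        total += finish
    return total / len(run_times)

print(average_turnaround([8, 4, 4, 4]))          # original order A, B, C, D -> 14.0
print(average_turnaround(sorted([8, 4, 4, 4])))  # shortest job first        -> 11.0
```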

It is important to note that shortest job first is only optimal when all the jobs are available at the same time.

Shortest remaining time next

The preemptive version of shortest job first is the Shortest Remaining Time Next algorithm. With this algorithm, the scheduler always chooses the process whose remaining run time is shortest. When a new job arrives, its total time is compared with the remaining time of the current process; if the new job needs less time to finish, the current process is suspended and the new job is run. This scheme lets new short jobs get good service.
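As a rough illustration (not a full scheduler), the selection rule can be kept in a min-heap keyed on remaining time; the job names and times below are made up:

```python
import heapq

ready = []  # entries are (remaining_ms, name); the heap keeps the shortest on top

def admit(remaining_ms, name, current=None):
    """A new job arrives; it preempts the current one if it needs less time to finish."""
    heapq.heappush(ready, (remaining_ms, name))
    if current is not None and remaining_ms < current[0]:
        heapq.heappush(ready, current)   # put the preempted process back in the pool
    elif current is not None:
        return current                   # current job keeps the CPU
    return heapq.heappop(ready)          # otherwise run whoever has least time left

current = admit(80, "long-report")           # nothing was running, so it gets the CPU
current = admit(10, "quick-query", current)  # 10 < 80: "quick-query" preempts
print(current)                               # (10, 'quick-query')
```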

Scheduling in interactive systems

Interactive systems are so common on PCs, servers, and other machines that interactive scheduling is worth exploring in its own right.

Round-robin scheduling

One of the oldest, simplest, fairest, and most widely used algorithms is round robin. Each process is assigned an interval of time, called its quantum or time slice, during which it is allowed to run. If the process is still running when its quantum expires, the CPU is preempted and given to another process. If the process blocks or finishes before the quantum expires, the CPU is switched immediately. Round robin is easy to implement: all the scheduler has to do is maintain a list of runnable processes, as in (a) in the image below; when a process uses up its quantum, it is moved to the end of the list, as in (b).
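A minimal round-robin sketch; the quantum and the burst times are arbitrary, and the `bursts` dictionary stands in for whatever per-process bookkeeping a real kernel keeps:

```python
from collections import deque

def round_robin(bursts, quantum):
    """bursts maps process name -> remaining CPU need (ms).
    Returns the order in which processes receive the CPU."""
    queue = deque(bursts)              # runnable list, as in (a) in the figure
    order = []
    while queue:
        name = queue.popleft()
        order.append(name)
        bursts[name] -= quantum
        if bursts[name] > 0:           # quantum used up: back to the tail, as in (b)
            queue.append(name)
    return order

print(round_robin({"A": 30, "B": 10, "C": 20}, quantum=10))
# ['A', 'B', 'C', 'A', 'C', 'A']
```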

The only really interesting issue in round robin is the length of the quantum. Switching from one process to another takes a certain amount of administrative time: saving and restoring registers and memory maps, updating tables and lists, flushing and reloading the memory cache, and so on. This is called a process switch or context switch. Suppose the switch takes 1 ms and the quantum is set to 4 ms: after doing 4 ms of useful work, the CPU spends 1 ms on switching, so 20% of the CPU's time is wasted on administrative overhead. That is expensive.

To improve CPU efficiency, we could set the quantum to 100 ms; now only about 1% of the time is wasted. But consider what happens if 50 requests arrive within a very short interval, each with a different CPU requirement. All 50 processes go onto the runnable list. If the CPU is idle, the first starts immediately, the second cannot start until 100 ms later, and so on. The unlucky last process may have to wait 5 seconds before it gets a chance to run. Most users will find a 5-second wait for a short command sluggish, and it is especially poor design if a request near the end of the queue would itself have finished in a fraction of a quantum.

Another point is that if the quantum is set longer than the typical CPU burst, preemption rarely happens: most processes block before the quantum runs out, and a switch occurs anyway. Eliminating forced preemption improves performance, because process switches then happen only when they are logically necessary, that is, when a process blocks and cannot continue.

The conclusion can be stated as follows: setting the quantum too short causes too many process switches and lowers CPU efficiency, while setting it too long may leave short interactive requests unanswered for too long. A quantum of around 20 to 50 ms is often a reasonable compromise.

Priority scheduling

Round-robin scheduling implicitly assumes that all processes are equally important, but that is not always the case. At a university, for example, the pecking order might be the dean first, then professors, secretaries, support staff, and finally students. Taking such external factors into account leads to priority scheduling.

The basic idea is clear: each process is assigned a priority, and the process with the highest priority runs first.

This does not mean, however, that high-priority processes can run forever: the scheduler decreases the priority of the currently running process at each clock interrupt, and if this drops its priority below that of the next highest process, a process switch occurs. Alternatively, each process may be assigned a maximum amount of time it is allowed to run; when that time is used up, the next highest-priority process gets its chance.

Processes can be assigned priorities statically or dynamically. On a military computer, you can set the process initiated by a general to priority 100, a colonel to priority 90, a major to priority 80, a captain to priority 70, a lieutenant to priority 60, and so on. There is a UNIX command called nice that allows users to voluntarily lower the priority of their processes in favor of others, but it is generally not used.

Priorities can also be assigned dynamically by the system to achieve certain goals. For example, some processes are heavily I/O bound and spend most of their time waiting for I/O to complete. When such a process wants the CPU, it should get it immediately so it can start its next I/O request, which can then proceed in parallel with another process doing computation. Making an I/O-bound process wait a long time for the CPU just means it sits around occupying memory unnecessarily. A simple algorithm for giving good service to I/O-bound processes is to set the priority to 1/f, where f is the fraction of the last quantum that the process used. A process that used only 1 ms of its 50-ms quantum gets priority 50, one that ran 25 ms before blocking gets priority 2, and one that used the whole quantum gets priority 1.
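The 1/f rule is a one-liner; the 50-ms quantum below follows the example in the text:

```python
QUANTUM_MS = 50  # quantum length used in the example above

def dynamic_priority(used_ms):
    """Priority = 1/f, where f is the fraction of the last quantum the process used."""
    f = used_ms / QUANTUM_MS
    return 1 / f

print(dynamic_priority(1))    # I/O-bound, used 1 ms        -> priority 50.0
print(dynamic_priority(25))   # blocked after 25 ms         -> priority 2.0
print(dynamic_priority(50))   # CPU-bound, used whole slice -> priority 1.0
```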

It is often convenient to group processes into priority classes, use priority scheduling between the classes, and use round-robin scheduling within each class. A system with four priority classes is shown below.

Its scheduling rule works as follows: as long as there are runnable processes in priority class 4, each of them runs for one quantum, round-robin fashion, and the lower-priority classes are ignored. If class 4 is empty, the class 3 processes are run round robin; if classes 4 and 3 are both empty, class 2 is run, and so on. If priorities are never adjusted, the lower-priority classes may all starve.
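A sketch of that selection rule; the four classes and the process names in them are only illustrative:

```python
from collections import deque

# Highest class first; within the chosen class, processes take turns (round robin).
classes = {4: deque(["editor"]), 3: deque(), 2: deque(["backup", "indexer"]), 1: deque()}

def pick_next(priority_classes):
    """Scan from the highest class down and rotate within the first non-empty one."""
    for level in sorted(priority_classes, reverse=True):
        queue = priority_classes[level]
        if queue:
            process = queue.popleft()
            queue.append(process)   # it goes to the back of its own class
            return process
    return None                     # nothing is runnable

print(pick_next(classes))  # 'editor': class 4 is non-empty, so classes 2 and 1 are ignored
```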

Multi-level queue

The earliest system to use priority scheduling in this way was CTSS (the Compatible Time Sharing System). One problem CTSS faced was that process switching was slow, because the IBM 7094 it ran on could hold only one process in memory at a time.

The IBM 7094 was the machine at the Columbia University Computing Center from 1964 to 1968.

Before every switch, CTSS had to swap the current process out to disk and read the new one in from disk. Its designers quickly realized that it was more efficient to give CPU-bound processes a long quantum once in a while than to give them short quanta frequently (this reduces the number of swaps). On the other hand, as discussed above, long quanta hurt response time. Their solution was to set up priority classes: processes in the highest class run for one quantum, processes in the next class run for two quanta, the next for four quanta, and so on. Whenever a process uses up all the quanta allocated to it, it is moved down one class.
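The doubling rule is easy to express; the base quantum below is an arbitrary value chosen just to show the progression, not the actual CTSS figure:

```python
BASE_QUANTUM_MS = 100  # illustrative base quantum, not the real CTSS value

def quantum_for_class(level):
    """Class 0 gets 1 quantum, class 1 gets 2 quanta, class 2 gets 4, and so on."""
    return BASE_QUANTUM_MS * (2 ** level)

level = 0
for _ in range(4):
    print(f"class {level}: runs {quantum_for_class(level)} ms before being swapped again")
    level += 1  # a process that exhausts its whole allocation is demoted one class
```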

Shortest process first

For batch systems, shortest job first yields the minimum average turnaround time, so it would be nice to use it for interactive processes as well. To some extent, this is possible. Interactive processes typically follow the pattern: wait for a command, execute the command, wait for the next command, execute it, and so on. If we treat the execution of each command as a separate job, we can minimize response time by always running the shortest one first. The only problem is figuring out which of the currently runnable processes is the shortest one.

One approach is to estimate based on past behavior and run the process with the shortest estimated running time. Suppose the estimated time per command for some process is T0, and its next run is measured to take T1. The estimate can then be updated by taking a weighted sum of the two values, aT0 + (1 - a)T1. The choice of a determines whether old run times are forgotten quickly or remembered for a long time. With a = 1/2, the successive estimates are

T0,   T0/2 + T1/2,   T0/4 + T1/4 + T2/2,   T0/8 + T1/8 + T2/4 + T3/2

As you can see, after three rounds, T0’s share of the new estimate has dropped to 1/8.

This technique of taking the weighted average of the current measurement and the previous estimate to obtain the next estimate is sometimes called aging. It applies in many situations where a prediction must be made from a series of earlier measurements.
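The aging update is a single line of code; the numbers below are invented, but they reproduce the 1/8 weight mentioned above:

```python
def aged_estimate(previous, measured, a=0.5):
    """Next estimate = a * previous estimate + (1 - a) * latest measurement."""
    return a * previous + (1 - a) * measured

estimate = 80.0                      # initial estimate T0 = 80 ms (made up)
for measured in (40.0, 40.0, 40.0):  # the next three runs each take 40 ms
    estimate = aged_estimate(estimate, measured)
print(estimate)  # 45.0 = 80/8 + 40/8 + 40/4 + 40/2: T0 now carries weight 1/8
```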

Guaranteed scheduling

A completely different approach to scheduling is to make concrete performance promises to the users and then live up to them. One promise that is realistic and easy to keep is this: if n users are logged in while you are working, each of you will receive about 1/n of the CPU power. Similarly, on a single-user system with n processes running, each process, all being equal, should get 1/n of the CPU time.
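One way to realize such a guarantee (a sketch, assuming equal entitlements and ignoring processes arriving and leaving) is to track the ratio of CPU time each process has actually consumed to the CPU time it is entitled to, and always run the process whose ratio is lowest:

```python
def pick_most_deprived(consumed_ms, elapsed_ms):
    """consumed_ms maps process -> CPU ms actually used. With n processes, each is
    entitled to elapsed_ms / n; run the one furthest behind its entitlement."""
    entitlement = elapsed_ms / len(consumed_ms)
    return min(consumed_ms, key=lambda p: consumed_ms[p] / entitlement)

# Three processes after 300 ms of wall-clock time, so each is entitled to 100 ms.
print(pick_most_deprived({"A": 150, "B": 90, "C": 60}, elapsed_ms=300))  # 'C'
```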

Lottery scheduling

Making promises to users and then keeping them is admirable, but hard to do. There is, however, an algorithm that both gives predictable results and is remarkably simple to implement: lottery scheduling.

The basic idea is to give processes lottery tickets for the various system resources, such as CPU time. Whenever a scheduling decision has to be made, a ticket is drawn at random, and the process holding that ticket gets the resource. Applied to CPU scheduling, the system might hold a lottery 50 times a second, with each winner receiving, say, 20 ms of CPU time as its prize.

To paraphrase George Orwell: all processes are equal, but some processes are more equal than others. More important processes can be given extra tickets to increase their odds of winning. If 100 tickets are outstanding and one process holds 20 of them, it has a 20% chance of winning each lottery; in the long run it will get about 20% of the CPU. In contrast to a priority scheduler, where it is hard to say what holding priority 40 actually means, the rule here is clear: a process holding a fraction f of the tickets gets roughly a fraction f of the resource.
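The drawing step itself is tiny; the ticket counts below are arbitrary:

```python
import random

def hold_lottery(tickets):
    """tickets maps process -> number of tickets held; draw one ticket at random."""
    names = list(tickets)
    return random.choices(names, weights=[tickets[n] for n in names], k=1)[0]

tickets = {"video": 20, "editor": 30, "batch": 50}     # 100 tickets outstanding
wins = sum(hold_lottery(tickets) == "video" for _ in range(10_000))
print(wins / 10_000)  # roughly 0.2: 20 of 100 tickets -> about 20% of the CPU
```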

Cooperating processes may exchange tickets if they wish. For example, when a client process sends a message to a server process and then blocks, it may hand all of its tickets to the server to increase the chance that the server runs next. When the server is done, it returns the tickets so the client can run again. In fact, in the absence of clients, a server needs no tickets at all.

Think of the tickets as a buff in a game: holding 15% of them gives you roughly a 15% chance of the speed boots dropping on any given try.

Fair share scheduling

So far we have assumed that each process is scheduled on its own, without regard to who owns it. As a result, if user 1 starts nine processes and user 2 starts one, then with round robin or with equal priorities, user 1 gets 90% of the CPU time and user 2 only 10%.

To prevent this, some systems take a process's owner into account before scheduling. In this model, each user is allocated some fraction of the CPU, and the scheduler picks processes so as to enforce it. Thus if two users have each been promised 50% of the CPU, they will each get that, no matter how many processes each of them has running.

Scheduling in real-time systems

A real-time system is one in which time plays an essential role. Typically, one or more external physical devices send the computer service requests, and the computer must react appropriately within a fixed amount of time. For example, the computer in a CD player receives a stream of bits from the drive and must convert it into music within a very tight time interval; if the computation takes too long, the music will sound wrong. Other examples are patient monitoring devices in hospital intensive-care units, the autopilot in an aircraft, and smoke alarms in trains. In these cases a correct answer that arrives too late is often as bad as no answer at all.

Real-time systems fall into two categories: hard real time, in which there are absolute deadlines that must be met, and soft real time, in which missing an occasional deadline is undesirable but tolerable. In both cases, real-time behavior is achieved by dividing the program into a set of processes, each of whose behavior is predictable and known in advance. These processes are generally short-lived and run to completion quickly. When an external event is detected, the scheduler's job is to schedule the processes in such a way that all deadlines are met.

The events a real-time system may have to respond to can be further classified as periodic (occurring at regular intervals) or aperiodic (occurring at unpredictable times). A system may have to respond to several periodic event streams, and depending on how much time each event takes to process, it may not even be possible to handle them all. For example, if there are m periodic events, event i occurs with period Pi and requires Ci seconds of CPU time to handle, then the load can be handled only if

C1/P1 + C2/P2 + … + Cm/Pm ≤ 1

A real-time system that meets this criterion is said to be schedulable, meaning it can actually be implemented. A set of processes that fails the test cannot be scheduled, because the total amount of CPU time they demand collectively exceeds what the CPU can deliver.

As an example, consider a soft real-time system with three periodic events, with periods of 100 ms, 200 ms, and 500 ms. If these events require 50 ms, 30 ms, and 100 ms of CPU time respectively, the system is schedulable, because 0.5 + 0.15 + 0.2 < 1. If a fourth event with a period of 1 second is added, the system remains schedulable as long as that event needs no more than 150 ms of CPU time per occurrence. (This calculation assumes the context-switch overhead is negligible.)
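The schedulability check is just the sum of the Ci/Pi terms. Here it is with the numbers from the example (the 200-ms case is an extra illustrative value showing where the test fails):

```python
def schedulable(events):
    """events is a list of (cpu_ms_per_event, period_ms) pairs.
    The set is schedulable if the utilizations Ci/Pi sum to at most 1."""
    return sum(c / p for c, p in events) <= 1

streams = [(50, 100), (30, 200), (100, 500)]
print(schedulable(streams))                   # True: 0.5 + 0.15 + 0.2 = 0.85
print(schedulable(streams + [(150, 1000)]))   # True: 0.85 + 0.15 just fits
print(schedulable(streams + [(200, 1000)]))   # False: 1.05 is more than the CPU has
```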

The scheduling algorithms of a real-time system can be static or dynamic. The former make their scheduling decisions before the system starts running; the latter make them at run time. Static scheduling works only when perfect information is available in advance about the work to be done and the deadlines to be met; dynamic scheduling has no such restriction.

Scheduling policies and mechanisms

So far we have tacitly assumed that all the processes in the system belong to different users and therefore compete with one another for the CPU. That is often true, but sometimes one process has many children running under its control. For example, a database management system process may have many children, each handling a different request or performing a specific function (query analysis, disk access, and so on). It is entirely possible that the main process knows exactly which of its children are the most important (or most urgent) and which the least. Unfortunately, none of the schedulers discussed above accept any input from user processes about scheduling decisions, so the scheduler rarely makes the best choice.

The solution to this problem is a long-established principle: separate the scheduling mechanism from the scheduling policy. This means that the scheduling algorithm is parameterized in some way, and the parameters can be filled in by user processes. Consider the database example again. Suppose the kernel uses a priority scheduling algorithm but provides a system call by which a process can set the priorities of its children. In this way the parent can control in detail how its children are scheduled, even though it does not do the scheduling itself. Here the mechanism lives in the kernel, but the policy is set by a user process. The separation of policy and mechanism is the key idea.

Thread scheduling

When several processes each have multiple threads, there are two levels of parallelism present: processes and threads. Scheduling in such systems differs substantially depending on whether user-level threads or kernel-level threads (or both) are supported.

Let's start with user-level threads. Since the kernel is unaware that threads exist, it operates as it always has: it picks a process, say A, and gives A control of the CPU for its quantum. The thread scheduler inside A then decides which thread to run, say A1. Since there are no clock interrupts to multiprogram threads, this thread can run as long as it likes. If it uses up the process's entire quantum, the kernel simply selects another process to run.

When process A eventually runs again, thread A1 resumes and continues to consume all of A's CPU time until it finishes its work. Its behavior, however, does not affect other processes: they still get whatever share the process scheduler assigns them, no matter what goes on inside A.

Now consider the case where A's threads each have relatively little work to do per CPU burst, for example 5 ms of work within a 50-ms quantum. Each thread runs for a short while and then hands the CPU back to the thread scheduler, so before the kernel switches to process B we may see the sequence A1, A2, A3, A1, A2, A3, A1, A2, A3, A1, as shown below.

The scheduling algorithm used by the run-time system can be any of those described above. In practice, round-robin and priority scheduling are the most common. The only limitation is the absence of a clock interrupt to stop a thread that has run too long; since the threads within a process cooperate, this is usually not a problem.

Now consider the case of kernel-level threads, where the kernel picks a particular thread to run. It need not take into account which process the thread belongs to, although it can if it wants to. The chosen thread is given a quantum, and if it exceeds the quantum it is forcibly suspended. If the threads again do 5 ms of work per 50-ms quantum, then over a 30-ms period the order could be A1, B1, A2, B2, A3, B3, an interleaving that is not possible with user-level threads. The situation is shown in the figure below.

The main difference between user-level and kernel-level threads is performance. A user-level thread switch takes only a handful of machine instructions (think of switching threads inside a Java program), while a kernel thread switch requires a full context switch, with the memory map changed and the cache invalidated, which is orders of magnitude slower. On the other hand, with kernel-level threads, a thread blocking on I/O does not suspend the entire process as it does with user-level threads.

Since switching from a thread in process A to a thread in process B costs much more than switching between two threads of process A (it involves changing the memory map and spoiling the cache), the kernel knows this and can take it into account when making its decision.

Article Reference:

Modern Operating Systems

Modern Operating Systems, Fourth Edition

www.encyclopedia.com/computing/n…

j00ru.vexillium.org/syscalls/nt…

www.bottomupcs.com/process_hie…

en.wikipedia.org/wiki/Runtim…

en.wikipedia.org/wiki/Execut…

zhidao.baidu.com/question/11…

baike.baidu.com/item/ Waiting queue /9…

www.columbia.edu/cu/computin…

baike.baidu.com/item/ interrupt vector /4…