Processes, Threads, and Coroutines. This article explains the following from the perspective of operating system principles and code practice:

What are processes, threads, and coroutines? What is the relationship between them? Why is multithreading in Python pseudo-multithreading? How should we choose among them for different application scenarios?

What is a process

Process – an abstract concept provided by the operating system. It is the basic unit of resource allocation and scheduling and the foundation of the operating system's structure. A program is a description of instructions, data, and their organization; a process is the running entity of a program. A program by itself has no life cycle; it is just instructions on disk. Once the program runs, it becomes a process.

When a program needs to run, the operating system loads its code and all static data into memory, into the process's address space (each process has a unique address space, as shown below), creates and initializes the stack (local variables, function parameters, and return addresses), allocates heap memory, and performs I/O-related setup. Once this preparation is complete, the program starts: the OS transfers control of the CPU to the newly created process, and the process begins running.

The operating system controls and manages processes through a Process Control Block (PCB). A PCB is a contiguous storage area in system memory that holds all the information the operating system needs to describe a process and control its execution (including the process ID, process state, process priority, file system pointers, and the contents of various registers). A process's PCB is the only entity through which the system is aware of the process.
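As a side note, some of the PCB-style attributes the OS tracks can be inspected from Python with the third-party psutil package (an illustrative sketch; psutil is not part of the standard library and must be installed separately):

```python
import psutil  # third-party: pip install psutil

p = psutil.Process()    # a handle to the current process
print(p.pid)            # process id
print(p.status())       # process state, e.g. 'running'
print(p.nice())         # scheduling priority (niceness on Unix)
print(p.num_threads())  # number of threads in this process
```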

A process has at least five basic states: initial state, ready state, wait (blocked) state, execution state, and termination state.

- Initial: the process has just been created but cannot execute yet because other processes occupy the CPU.
- Ready: only processes in the ready state can be scheduled into the execution state.
- Wait (blocked): the process is waiting for some event to complete.
- Execution: at any moment only one process (on a single CPU) can be in the execution state.
- Terminated: the process has finished.

Switching between processes

In both multi-core and single-core systems, a CPU appears to execute multiple processes concurrently by switching the processor between processes. The mechanism the operating system uses to transfer CPU control between processes is called a context switch: it saves the context of the current process, restores the context of the next process, and hands CPU control to it. The new process resumes where it last stopped. Processes thus take turns using the CPU, which is shared among them, with a scheduling algorithm deciding when to suspend one process and serve another.
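On Unix systems, Python can even observe how often a process has been context-switched through the standard resource module (a small illustrative sketch; the module is Unix-only):

```python
import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
print("voluntary context switches:", usage.ru_nvcsw)     # e.g. giving up the CPU to wait on I/O
print("involuntary context switches:", usage.ru_nivcsw)  # preempted by the scheduler
```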

Dual-process single-core CPU

Processes take turns using CPU resources by context switching based on specific scheduling mechanisms and situations such as I/O interrupts

Dual-process dual-core CPU

Each process monopolizes one CPU core resource, and the CPU is blocked while processing I/O requests

Data sharing between processes

Processes share the CPU and main memory with other processes in the system. To manage main memory better, the operating system provides an abstraction of it called virtual memory (VM), which gives each process the illusion that it is using main memory exclusively.

Virtual memory provides three main capabilities:

- It treats main memory as a cache for data stored on disk, keeping only the active areas in main memory and moving data back and forth between disk and main memory as needed, so main memory is used more efficiently.
- It provides a consistent address space for each process. Since each process has its own exclusive virtual address space, the CPU translates virtual addresses into real physical addresses through address translation.
- Each process can only access its own address space, so data cannot be shared between processes without the aid of another mechanism: inter-process communication (IPC).

Take Multiprocessing in Python as an example:

```python
import multiprocessing
import threading
import time

n = 0

def count(num):
    global n
    for i in range(100000):
        n += i
    print("Process {0}:n={1},id(n)={2}".format(num, n, id(n)))

if __name__ == '__main__':
    start_time = time.time()
    process = list()
    for i in range(5):
        p = multiprocessing.Process(target=count, args=(i,))
        # p = threading.Thread(target=count, args=(i,))  # swap in to test multithreading
        process.append(p)
    for p in process:
        p.start()
    for p in process:
        p.join()
    print("Main:n={0},id(n)={1}".format(n, id(n)))
    end_time = time.time()
    print("Total time:{0}".format(end_time - start_time))
```

The results

```
Process 1:n=4999950000,id(n)=139854202072440
Process 0:n=4999950000,id(n)=139854329146064
Process 2:n=4999950000,id(n)=139854202072400
Process 4:n=4999950000,id(n)=139854201618960
Process 3:n=4999950000,id(n)=139854202069320
Main:n=0,id(n)=9462720
Total time:0.03138256072998047
```

The variable n has a distinct address in each child process p{0,1,2,3,4} and in the main process: every process works on its own copy, so the main process still sees n = 0. If data really must be shared, it has to go through inter-process communication, as sketched below.
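For illustration, here is a minimal sketch of one such IPC mechanism, multiprocessing.Value, which places the counter in shared memory (the names and counts here are my own, not from the original example):

```python
import multiprocessing

def increment(shared_n):
    # shared_n lives in shared memory, so updates are visible to every process;
    # the built-in lock guards against lost updates during concurrent increments
    for _ in range(1000):
        with shared_n.get_lock():
            shared_n.value += 1

if __name__ == '__main__':
    shared_n = multiprocessing.Value('i', 0)  # 'i' means a C int, initialized to 0
    workers = [multiprocessing.Process(target=increment, args=(shared_n,))
               for _ in range(5)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(shared_n.value)  # 5000: all five processes updated the same memory
```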

What is a thread

Thread – also an abstraction provided by the operating system. A thread is a single sequential flow of control within a program's execution; it is the smallest unit of the program's execution flow and the basic unit of processor scheduling and dispatch. A process can have one or more threads, and the threads within a process share all of the process's system resources, such as the virtual address space, file descriptors, and signal handlers. However, each thread in a process has its own call stack and its own thread-local storage (as shown in the figure below).
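As a small illustration of thread-local storage, Python's standard library exposes it through threading.local: one shared object, but each thread reads and writes its own private copy of the attributes (a minimal sketch with names of my own choosing):

```python
import threading

local_data = threading.local()  # one object, but per-thread attribute storage

def worker(num):
    local_data.num = num  # each thread writes to its own slot
    print("{0}: local_data.num = {1}".format(
        threading.current_thread().name, local_data.num))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```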

The system uses the PCB to control and manage processes. Similarly, the system allocates a Thread Control Block (TCB) to each thread, and all the information used to control and manage a thread is recorded in its TCB. A TCB usually includes:

- Thread identifier
- A set of registers
- Thread running state
- Priority
- Thread-private storage
- Signal mask

Threads, like processes, also have at least five states: initial, ready, wait (blocked), execute, and terminate

Switching between threads, like processes, also requires a context switch, which is not covered here.

There are many similarities between processes and threads, but what are the differences?

Process vs. Thread

- A process is an independent unit of resource allocation and scheduling, and each process has a complete virtual address space of its own. When a process switch occurs, the new process gets a different virtual address space, whereas multiple threads of the same process share one address space (threads of different processes cannot share each other's).
- A thread is the basic unit of CPU scheduling. A process contains at least one thread. Threads are smaller than processes and own essentially no system resources, so creating and destroying a thread takes much less time than doing so for a process.
- Because threads share the address space, synchronization and mutual exclusion must be considered (see the sketch after this list).
- The accidental termination of one thread can bring down the whole process, while the accidental termination of one process does not affect other processes, so multi-process programs are safer.
- In short: multi-process programs are safer, but process switching is expensive and efficiency is lower; multi-threaded programs cost more to maintain, but thread switching is cheap and efficiency is higher. (Multithreading in Python is pseudo-multithreading; more on this later.)
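Here is a minimal sketch of such synchronization using threading.Lock (my own example, assuming CPython, where even a simple `counter += 1` can lose updates when threads interleave):

```python
import threading

counter = 0
lock = threading.Lock()

def add():
    global counter
    for _ in range(100000):
        with lock:  # mutual exclusion: only one thread updates at a time
            counter += 1

threads = [threading.Thread(target=add) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; without it the total may come up short
```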

What is a coroutine

Coroutine (also known as a micro-thread) – an even more lightweight construct than a thread. Coroutines are not managed by the operating system kernel; they are controlled entirely by the program. The relationship between coroutines, threads, and processes is shown in the figure below.

Coroutines can be compared to subroutines, except that a coroutine can be suspended in the middle of its execution, switch to another coroutine, and resume where it left off at the appropriate time. Switching between coroutines involves no system calls and no blocking calls: coroutines run within a single thread, and the switch between them happens in user mode, whereas switching a blocked thread is performed by the operating system kernel in kernel mode. Coroutines therefore save the overhead of thread creation and switching. And because the coroutines in a thread never write to shared variables at the same time, there is no need for synchronization primitives such as mutexes or semaphores to guard critical sections, and no support from the operating system is required.
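To make the "interruptible subroutine" idea concrete, here is a minimal sketch built on Python generators (my own illustration, not the article's code): control ping-pongs between two routines entirely in user mode, inside one thread, with no OS involvement:

```python
def task(name, steps):
    for i in range(steps):
        print(name, "step", i)
        yield  # suspend here; control returns to the scheduler

# a tiny round-robin "scheduler" running entirely in one thread
tasks = [task("A", 3), task("B", 3)]
while tasks:
    current = tasks.pop(0)
    try:
        next(current)          # resume the routine where it last yielded
        tasks.append(current)  # put it back in the queue
    except StopIteration:
        pass  # this routine finished; drop it
```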

Coroutines are well suited to I/O-bound scenarios that require large numbers of concurrent tasks. When an operation would block on I/O, the coroutine scheduler yields control and saves the state of the current stack; once the I/O completes, the coroutine's stack is restored on the thread and execution resumes with the blocked operation's result.

Below, we analyze how to choose among processes, threads, and coroutines in Python for different application scenarios.

How to choose?

Before comparing the scenarios, it is worth introducing Python's multithreading (long criticized by programmers as "fake" multithreading).

So why is multithreading in Python “pseudo” multithreading?

Reuse the multiprocessing example above, replacing `p = multiprocessing.Process(target=count, args=(i,))` with `p = threading.Thread(target=count, args=(i,))`:

(To reduce code redundancy and keep the article short, the threaded version reuses the same code; please forgive the naming and print irregularities: the output still says "Process" even though these are threads.)

```
Process 0:n=5756690257,id(n)=140103573185600
Process 2:n=10819616173,id(n)=140103573185600
Process 1:n=11829507727,id(n)=140103573185600
Process 4:n=17812587459,id(n)=140103573072912
Process 3:n=14424763612,id(n)=140103573185600
Main:n=17812587459,id(n)=140103573072912
Total time:0.1056210994720459
```

n is a global variable, and the value printed by Main equals the final value computed by the threads, showing that threads share data.

But why does multithreading take longer than multiprocessing here? That flatly contradicts what we said above (thread overhead << process overhead). This is where CPython's Global Interpreter Lock (GIL) comes in.

What is the GIL

The GIL is a design decision in CPython, made for data safety (because of the reference counting in its memory management). A thread must first acquire the GIL before it can execute, so you can think of the GIL as a "pass": there is only one GIL in a Python process, and a thread that does not hold the pass is not allowed onto the CPU. The CPython interpreter uses reference counting in its memory management: when an object's reference count drops to zero, the object is garbage collected. (For more on memory management in Python, see "Interview Requirements: Memory Management in Python".) Consider this scenario:

A process contains two threads, thread 0 and thread 1, and both threads reference object a.

Suppose the two threads operate on a at the same time (they do not modify the object itself, so at first glance no synchronization primitive seems necessary). The reference counter of object a is nevertheless updated concurrently, and the count can end up lower than the actual number of references; garbage collection would then free the object while it is still in use, causing a memory error. A global lock (the GIL) is therefore needed to keep object reference counts correct and safe.
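To make reference counting concrete, CPython exposes the counter through sys.getrefcount (a small sketch; the exact numbers are CPython implementation details and may vary):

```python
import sys

a = []                     # one reference held by the name a
b = a                      # a second reference to the same list object
print(sys.getrefcount(a))  # typically 3: a, b, and the temporary call argument
del b
print(sys.getrefcount(a))  # typically 2 after b is removed
```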

Whether on a single core or multiple cores, a process can execute only one thread at a time (only the thread holding the GIL can run, as shown in the figure below). This is the root cause of Python's poor multithreading performance on multi-core CPUs.

So if you have concurrency requirements in Python, can you just use multiple processes and be done with it? As the saying goes in software engineering: there is no silver bullet!

When to use?

There are three common application scenarios:

- CPU-intensive: the program needs a large amount of CPU computation and data processing.
- I/O-intensive: the program performs frequent I/O operations, such as network socket transmission and file reads.
- CPU-intensive + I/O-intensive: a combination of the two.

For the CPU-intensive case, compare the multiprocessing and threading examples above: multi-process performance > multi-thread performance.

Here is a brief explanation of the I/O-intensive case. The operating system's most common mechanism for interacting with I/O devices is DMA.

What is DMA

Direct Memory Access (DMA) is a special device in the system that coordinates data transfers between memory and devices without CPU intervention. Take writing a file as an example:

1. Process P1 issues a request to write data to a disk file.
2. The CPU handles the write request and programs the DMA engine, telling it where the data sits in memory, how much data to write, and which target device to write it to.
3. While the DMA engine transfers the data, the CPU context-switches to another process, P2.
4. When the transfer completes, the DMA engine interrupts the CPU; the CPU switches context from P2 back to P1, and P1 continues.

Python multithreading performance (I/O-intensive)

1. Thread0 issues an I/O request and hands it off to the DMA engine.
2. While DMA executes the request, Thread1 takes over the CPU and continues running.
3. When the interrupt from DMA arrives, the CPU switches back to Thread0, which continues execution with the result.

This is similar to the multi-process execution pattern and compensates for the handicap imposed by the GIL; and because thread overhead is much smaller than process overhead, multithreading performs better in I/O-intensive scenarios.

Practice is the sole criterion for testing truth, so next we test the I/O-intensive scenario.

Test

Execute the code

```python
import multiprocessing
import threading
import time

def count(num):
    time.sleep(1)  # simulate an I/O operation
    print("Process {0} End".format(num))

if __name__ == '__main__':
    start_time = time.time()
    process = list()
    for i in range(5):
        p = multiprocessing.Process(target=count, args=(i,))
        # p = threading.Thread(target=count, args=(i,))  # swap in to test multithreading
        process.append(p)
    for p in process:
        p.start()
    for p in process:
        p.join()
    end_time = time.time()
    print("Total time:{0}".format(end_time - start_time))
```

The results

Multi-process:

```
Process 0 End
Process 3 End
Process 4 End
Process 1 End
Total time:1.383193016052246
```

Multi-threaded:

```
Process 0 End
Process 1 End
Process 2 End
Total time:1.003425121307373
```

As mentioned above, for I/O-intensive programs coroutines are even more efficient, because they are controlled by the program itself, which saves the overhead of creating and switching threads.

In Python, coroutines are created and used with the async/await syntax, backed by the asyncio concurrency library. The code:

```python
import time
import asyncio

async def coroutine():
    await asyncio.sleep(1)

start_time = time.time()
loop = asyncio.get_event_loop()
tasks = []
for i in range(5):
    task = loop.create_task(coroutine())
    tasks.append(task)
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
end_time = time.time()
print("total time:", end_time - start_time)
```

The results

```
total time: 1.001854419708252
```

Coroutines perform slightly better than multithreading.
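As an aside, on Python 3.7 and later the manual get_event_loop / create_task dance is usually replaced by asyncio.run and asyncio.gather; an equivalent sketch of the same test:

```python
import time
import asyncio

async def coroutine():
    await asyncio.sleep(1)  # simulate an I/O operation

async def main():
    # schedule five coroutines concurrently and wait for them all
    await asyncio.gather(*(coroutine() for _ in range(5)))

start_time = time.time()
asyncio.run(main())
print("total time:", time.time() - start_time)
```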

Conclusion

This article has explained processes, threads, and coroutines and the relationships among them from the perspective of operating system principles and code practice. It also summarized how to choose the corresponding solution for different scenarios in Python:

- CPU-intensive: multi-process
- I/O-intensive: multi-threaded (coroutines have higher maintenance costs and do not significantly improve the efficiency of file reads and writes)
- CPU-intensive + I/O-intensive: multi-process + coroutines (see the sketch below)
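For the mixed case, one common pattern (a sketch of my own, not from the original article) keeps coroutines on the event loop for the I/O parts and pushes CPU-heavy work to a process pool through loop.run_in_executor:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # CPU-bound work runs in a separate process, sidestepping the GIL
    return sum(i * i for i in range(n))

async def fetch_and_compute(executor, n):
    await asyncio.sleep(1)  # simulate an I/O wait, e.g. a network call
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, cpu_heavy, n)

async def main():
    with ProcessPoolExecutor() as executor:
        results = await asyncio.gather(
            *(fetch_and_compute(executor, 100000) for _ in range(4)))
    print(results)

if __name__ == '__main__':
    asyncio.run(main())
```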
