Based on operating system principles and code practice, this article explains the following:
- What are processes, threads and coroutines?
- What is the relationship between them?
- Why is multithreading in Python pseudo-multithreading?
- How to choose a technical solution for different application scenarios?
What is a process
Process – an abstract concept provided by the operating system. It is the basic unit of resource allocation and scheduling, and the foundation of the operating system's structure. A program is a description of instructions, data, and their organization; a process is the running entity of a program. A program by itself has no life cycle, being just instructions on disk, but once the program is run it becomes a process.
When a program needs to run, the operating system loads its code and all of its static data into memory, into the process's address space (each process has a unique address space, as shown below), creates and initializes the stack (local variables, function parameters, and return addresses), allocates heap memory, and performs I/O-related setup. With this preparation complete, the program starts: the OS transfers control of the CPU to the newly created process, and the process begins to run.
The operating system controls and manages processes through a Process Control Block (PCB). A PCB is a contiguous storage area in system memory that holds all the information the operating system needs to describe a process and control its running (including the process ID, process state, process priority, file system pointers, and the contents of various registers). A process's PCB is the only entity through which the system knows of the process's existence.
A process has at least five basic states: initial state, ready state, wait (blocked) state, execution state, and termination state.
- Initial state: the process has just been created and cannot yet execute because other processes are occupying CPU resources
- Ready state: only processes in the ready state can be scheduled onto the CPU and move to the execution state
- Wait (blocked) state: the process is waiting for some event to complete
- Execution state: only one process (on a single-core CPU) can be executing at any given time
- Termination state: the process has finished
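The five states and the transitions between them can be sketched as a tiny state machine. This is an illustrative sketch only; the state names and the transition set below are not taken from any OS API:

```python
from enum import Enum, auto

class ProcessState(Enum):
    NEW = auto()         # initial: just created, not yet runnable
    READY = auto()       # waiting to be scheduled onto a CPU
    RUNNING = auto()     # currently executing on a CPU
    WAITING = auto()     # blocked, waiting for an event (e.g. I/O)
    TERMINATED = auto()  # finished

# Allowed transitions between the five basic states
TRANSITIONS = {
    ProcessState.NEW: {ProcessState.READY},
    ProcessState.READY: {ProcessState.RUNNING},
    ProcessState.RUNNING: {ProcessState.READY,        # preempted by the scheduler
                           ProcessState.WAITING,      # blocked on an event
                           ProcessState.TERMINATED},  # finished
    ProcessState.WAITING: {ProcessState.READY},       # the awaited event completed
    ProcessState.TERMINATED: set(),
}

def can_transition(src, dst):
    return dst in TRANSITIONS[src]
```

Note in particular that a waiting process cannot go straight back to running; it must first become ready and be scheduled again.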
Switching between processes
On both multi-core and single-core systems, a single CPU appears to execute multiple processes concurrently; this is achieved by switching the processor between processes.
The mechanism the operating system uses to pass CPU control between processes is called a context switch: it saves the context of the current process, restores the context of the new process, and then transfers control of the CPU to the new process, which resumes where it last stopped. Processes thus take turns using the CPU, and the CPU is shared among several processes, with a scheduling algorithm deciding when to stop one process and serve another instead.
- Dual-process single-core CPU
Processes take turns using CPU resources by context switching based on specific scheduling mechanisms and situations such as I/O interrupts
- Dual-process, dual-core CPU
Each process monopolizes one CPU core resource, and the CPU is blocked while processing I/O requests
Data sharing between processes
Processes share the CPU and main-memory resources with other processes in the system. To manage main memory better, the operating system provides an abstraction of it called virtual memory (VM), which gives each process the illusion that it is using main memory exclusively.
Virtual memory provides three main capabilities:
- Using main memory more efficiently: it is treated as a cache for an address space stored on disk, holding only the active areas in main memory and moving data back and forth between disk and main memory as needed
- Simplify memory management by providing a consistent address space for each process
- Protect the address space of each process from being damaged by other processes
Since each process has its own virtual address space, the CPU translates virtual addresses into real physical addresses via address translation, and each process can access only its own address space. Data therefore cannot be shared between processes without an additional mechanism: inter-process communication.
- Take Python's multiprocessing module as an example:
```python
import multiprocessing
import threading
import time

n = 0

def count(num):
    global n
    for i in range(100000):
        n += i
    print("Process {0}:n={1},id(n)={2}".format(num, n, id(n)))

if __name__ == '__main__':
    start_time = time.time()
    process = list()
    for i in range(5):
        p = multiprocessing.Process(target=count, args=(i,))
        # p = threading.Thread(target=count, args=(i,))  # swap in to test multithreading
        process.append(p)
    for p in process:
        p.start()
    for p in process:
        p.join()
    print("Main:n={0},id(n)={1}".format(n, id(n)))
    end_time = time.time()
    print("Total time:{0}".format(end_time - start_time))
```
- Results
```
Process 1:n=4999950000,id(n)=139854202072440
Process 0:n=4999950000,id(n)=139854329146064
Process 2:n=4999950000,id(n)=139854202072400
Process 4:n=4999950000,id(n)=139854201618960
Process 3:n=4999950000,id(n)=139854202069320
Main:n=0,id(n)=9462720
Total time:0.03138256072998047
```
The variable n lives at a different address in each of the five child processes and in the main process: every process holds its own copy in its own address space.
What is a thread
Thread – also an abstraction provided by the operating system. A thread is a single sequential flow of control within a running program; it is the smallest unit of the program's execution flow and the basic unit of processor scheduling and dispatch. A process can have one or more threads, and the threads of one process share all of the process's system resources, such as the virtual address space, file descriptors, and signal handlers. However, each thread in a process has its own call stack and thread-local storage (as shown in the figure below).
Just as the system uses a PCB to control and manage processes, it allocates a Thread Control Block (TCB) to each thread and records in it all the information used to control and manage the thread. A TCB usually includes:
- Thread identifier
- A set of registers
- Thread running state
- Priority
- Thread private storage
- Signal mask
Threads, like processes, also have at least five states: initial, ready, wait (blocked), execute, and terminate
Switching between threads, like switching between processes, also requires a context switch; we will not expand on it here.
There are many similarities between processes and threads, but what is the difference between them?
Process VS Thread
- A process is an independent unit of resource allocation and scheduling and has a complete virtual address space. When a process switch occurs, different processes have different virtual address spaces, whereas multiple threads of the same process share one address space (threads of different processes cannot share one)
- Threads are the basic unit of CPU scheduling, and a process contains several threads (at least one thread).
- Threads are "lighter" than processes and own essentially no system resources; creating and destroying a thread takes much less time than a process
- Because threads share an address space, synchronization and mutual exclusion must be considered
- The accidental termination of one thread affects the normal running of the entire process, but the accidental termination of one process does not affect the running of other processes. Therefore, multi-process programs are more secure.
In short, multi-process programs offer higher safety but costly process switches and lower efficiency, while multithreaded programs have a higher maintenance cost but cheap thread switches and higher efficiency. (Multithreading in Python is pseudo-multithreading; more on this later.)
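The synchronization point above can be illustrated with threading.Lock. In this minimal sketch (the names and iteration counts are illustrative), the lock serializes the read-modify-write on a shared global, which all threads see because they share one address space:

```python
import threading

n = 0
lock = threading.Lock()

def count():
    global n
    for i in range(10000):
        with lock:  # mutex: serialize the read-modify-write on n
            n += i

threads = [threading.Thread(target=count) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(n)  # 5 * sum(range(10000)) = 249975000
```

Without the lock, the final value of n would be unpredictable, since interleaved updates can be lost.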
What is a coroutine
A coroutine (also known as a micro-thread) is an even more lightweight construct than a thread. Coroutines are not managed by the operating system kernel; they are controlled entirely by the program. The relationship of coroutines to threads and processes is shown in the figure below.
- A coroutine can be compared to a subroutine, except that during execution it can be suspended internally, switch to executing other subroutines, and return at the appropriate time to continue where it left off. Switching between coroutines involves no system calls and no blocking calls
- Coroutines execute within a single thread; the switches are between subroutines and happen in user mode. Thread blocking, by contrast, is handled by the operating system kernel and happens in kernel mode, so compared with threads, coroutines save the overhead of thread creation and switching
- Within one thread, coroutines never write the same variable simultaneously, so there is no need for synchronization primitives such as mutexes and semaphores to guard critical sections, and no need for operating system support
Coroutines suit I/O-bound scenarios that require large amounts of concurrency. When an operation would block on I/O, the scheduler yields control, saving the state on the current coroutine's stack; once the blocking operation completes, the coroutine's stack is restored on the thread and execution resumes with the result from where it left off.
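A plain Python generator already demonstrates the user-mode switching described above: yield suspends the subroutine and hands control back to the caller without any system call. A toy sketch (the round-robin "scheduler" below is illustrative, not a real event loop):

```python
def worker(name, steps):
    # A generator is the simplest form of coroutine: `yield`
    # suspends the function and hands control back to the caller.
    for step in range(steps):
        trace.append("{0}-{1}".format(name, step))
        yield  # user-mode switch: no system call, no kernel involvement

trace = []
a, b = worker("A", 2), worker("B", 2)
# A toy round-robin scheduler, running entirely in user space
for coro in (a, b, a, b):
    next(coro, None)
print(trace)  # ['A-0', 'B-0', 'A-1', 'B-1']
```

The two "workers" interleave their steps within one thread, which is exactly the switching-between-subroutines behavior the bullet points describe.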
Below, we analyze how to choose among processes, threads, and coroutines in Python for different application scenarios.
How to choose?
Before comparing the options for different scenarios, we need to introduce Python's multithreading, which programmers have long derided as "fake" multithreading.
So why is multithreading in Python “pseudo” multithreading?
Reusing the multiprocessing example above, replace the line p = multiprocessing.Process(target=count, args=(i,)) with p = threading.Thread(target=count, args=(i,)) to run the same workload with threads. (To keep the code and the article short, please excuse the loose naming and printing.)
```
Process 0:n=5756690257,id(n)=140103573185600
Process 2:n=10819616173,id(n)=140103573185600
Process 1:n=11829507727,id(n)=140103573185600
Process 4:n=17812587459,id(n)=140103573072912
Process 3:n=14424763612,id(n)=140103573185600
Main:n=17812587459,id(n)=140103573072912
Total time:0.1056210994720459
```
- n is a global variable, and Main prints the same value as the last thread to finish, showing that data is shared between the threads
But why does multithreading take longer here than multiprocessing? That flatly contradicts what we said above (thread overhead << process overhead). This is where CPython's Global Interpreter Lock (GIL) comes in.
What is the GIL
The GIL is a design decision in CPython, made for data safety (because of the reference counting in its memory management). A thread must first acquire the GIL before it can execute. You can therefore think of the GIL as a "pass": within one Python process there is only one GIL, and a thread that does not hold the pass is not allowed onto the CPU.
The CPython interpreter uses reference counting for memory management: when an object's reference count drops to zero, the object is garbage collected. (For more on memory management in Python, see "Interview Requirements: Memory Management in Python".) Consider this scenario:
A process contains two threads, thread 0 and thread 1. Both threads refer to object A.
If the two threads update a's reference counter at the same time (they are only reading a, so no synchronization primitive seems necessary), the concurrent updates can leave the counter lower than the actual number of references; garbage collection may then free an object that is still in use, causing a memory error. A global lock, the GIL, is therefore needed to keep object reference counts correct and safe.
Whether on a single core or multiple cores, a Python process can execute only one thread at a time (only the thread holding the GIL can run, as shown in the figure below). This is the root cause of Python's poor multithreading performance on multi-core CPUs.
So if you have concurrency requirements in Python, can you just use multiple processes and be done with it? As software engineering tells us: there is no silver bullet!
When to use?
There are three common application scenarios:
- CPU-intensive: the program does a lot of CPU computation and data processing;
- I/O-intensive: the program performs frequent I/O operations, such as transmitting and reading data over network sockets;
- CPU-intensive + I/O-intensive: a combination of the two
For the CPU-intensive case, compare the Python multiprocessing and threading examples above: multi-process performance > multi-thread performance.
Here is a brief explanation of the I/O-intensive case. The operating system's most common mechanism for interacting with I/O devices is DMA.
What is DMA
Direct Memory Access (DMA) is a special device in the system that can coordinate data transfers between memory and I/O devices without CPU intervention.
Take file writing as an example:
- Process P1 makes a request to write data to a disk file
- The CPU handles the write request by programming the DMA engine with the location of the data in memory, the amount of data to write, and the target device
- The CPU moves on to requests from another process, P2, while the DMA engine takes care of writing the memory data to the device
- The DMA engine completes the data transfer and interrupts the CPU
- The CPU context-switches from P2 back to P1 and resumes P1
Python multithreading (I/O-intensive case)
- Thread0 executes first; Thread1 waits (because of the GIL)
- Thread0 receives an I/O request and hands it to the DMA engine, which carries it out
- Thread1 takes over the CPU resources and continues executing
- The CPU receives the DMA interrupt and switches back to Thread0 to continue executing
This mirrors the multi-process execution pattern and compensates for the defect introduced by the GIL; and because thread overhead is much smaller than process overhead, multithreading performs better in I/O-intensive scenarios.
Practice is the sole criterion for testing truth, so next we test the I/O-intensive scenario.
Test
- Test code
```python
import multiprocessing
import threading
import time

def count(num):
    time.sleep(1)  # simulate an I/O operation
    print("Process {0} End".format(num))

if __name__ == '__main__':
    start_time = time.time()
    process = list()
    for i in range(5):
        p = multiprocessing.Process(target=count, args=(i,))
        # p = threading.Thread(target=count, args=(i,))  # swap in to test multithreading
        process.append(p)
    for p in process:
        p.start()
    for p in process:
        p.join()
    end_time = time.time()
    print("Total time:{0}".format(end_time - start_time))
```
- Results
```
# multiprocessing
Process 1 End
Process 2 End
...
Total time:1.383193016052246

# threading
Process 4 End
Process 3 End
...
Total time:1.003425121307373
```
- Multithreading outperforms multiprocessing here
As mentioned above, coroutines execute I/O-intensive programs even more efficiently because they are controlled by the program itself, saving the overhead of thread creation and switching.
In Python, coroutines are created and used with the async/await syntax, which relies on the asyncio concurrency library.
- Test code
```python
import time
import asyncio

async def coroutine():
    await asyncio.sleep(1)

start_time = time.time()
loop = asyncio.get_event_loop()
tasks = []
for i in range(5):
    task = loop.create_task(coroutine())
    tasks.append(task)
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
end_time = time.time()
print("total time:", end_time - start_time)
```
- Results
```
total time: 1.001854419708252
```
- Coroutines perform slightly better than multithreading
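Note that get_event_loop()/run_until_complete() is the older event-loop API; since Python 3.7 the same measurement can be written with asyncio.run and asyncio.gather. An equivalent sketch:

```python
import asyncio
import time

async def coroutine(i):
    await asyncio.sleep(1)  # simulate a non-blocking I/O wait

async def main():
    # all five coroutines are scheduled concurrently on one thread
    await asyncio.gather(*(coroutine(i) for i in range(5)))

start_time = time.time()
asyncio.run(main())
elapsed = time.time() - start_time
print("total time:", elapsed)  # roughly 1 second, not 5
```

asyncio.run creates the event loop, runs the coroutine, and closes the loop for you, which is less error-prone than managing the loop by hand.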
Conclusion
This article has explained processes, threads, and coroutines and the relationships among them from the perspective of operating system principles and code practice. It has also summarized how to choose the corresponding solution for different scenarios in Python practice:
- CPU intensive: Multiple processes
- I/O-intensive: multithreading (coroutines carry a maintenance cost and bring no significant improvement for plain file reads and writes)
- CPU intensive and IO intensive: multi-process + coroutine
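For the mixed case, one common pattern (a sketch under assumed names, not from the text above) is to offload CPU-bound work to a process pool from inside the event loop, so CPU work bypasses the GIL while the loop stays free for I/O-bound coroutines:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(k):
    # CPU-intensive work runs in a separate process, bypassing the GIL
    return sum(range(k))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Each call is offloaded to a worker process; awaiting the futures
        # keeps the event loop free to serve I/O-bound coroutines meanwhile.
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, cpu_bound, 10000) for _ in range(4)))
    return results

if __name__ == '__main__':
    print(asyncio.run(main()))  # [49995000, 49995000, 49995000, 49995000]
```

run_in_executor bridges the two worlds: the blocking, CPU-heavy function becomes an awaitable, so multi-process and coroutine code compose cleanly.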
Welcome to follow me so we can study and progress together.
If you have any questions, please discuss them in the comments section.
And if this post helped you, don't forget to like and bookmark it.