This article first appeared on my personal public account, TechFlow. Original writing is not easy; a follow is appreciated.


Today is the 20th installment of our Python series: multithreading in Python.

There are many other uses for metaclasses, such as setting parameters in a metaclass, along with some conventions. But these are niche, infrequently used features, so we won’t elaborate on them here; we can go into more detail when we actually need them. Once you understand how metaclasses work and how to use them, the rest should be easy to pick up. So today we’re starting a new topic — multithreading and concurrency.

Processes and threads

Let’s talk briefly about processes and threads for the benefit of beginners. These two concepts belong to the operating system; we hear about them a lot, but few of us have looked into what they actually mean. It is important for engineers to understand the definitions of the two and the difference between them.

Let’s start with processes, which can be thought of as specific tasks executed by the CPU. In a computer, the CPU runs much faster than the other components, such as memory and disk. If the CPU executed only one task at a time, it would spend most of its time waiting on these devices, which would be very inefficient. To increase efficiency and squeeze as much performance out of the machine as possible, the CPU works in a polling fashion: it executes one task for a short time slice, then switches to another task.

So even in the early days of single-core machines, computers seemed to work concurrently. We could listen to music while surfing the Internet without any noticeable lag. In reality, this was the result of CPU polling. In this example, the music player and the browser are separate processes to the CPU. We can loosely understand a process as a running application. For example, on an Android phone, when an app is started, it corresponds to a process in the system. Of course, this is not entirely accurate; one application can also start multiple processes.

Processes are a CPU-level concept; threads are more of a program-level concept. Even while the CPU is executing the current process, the program’s work is divided into tasks. For example, a music player needs to display the lyrics, play the audio, and respond to user actions such as skipping a track or adjusting the volume. Therefore, the CPU’s work needs to be split up further, so that while executing the current process it keeps polling to do several things at once.

A task within a process is a thread, so processes and threads have a containment relationship: a process can contain multiple threads. To the CPU, a thread cannot be executed on its own; every thread must belong to a process. So when the CPU switches processes, it switches which application or piece of software is executing, whereas switching threads inside a process switches which specific task within that software is executing.
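We can see this containment directly from Python. This is a minimal sketch: `os.getpid()` reports the process, and `threading.current_thread()` identifies the thread within it, so both the main thread and a worker thread report the same process id.

```python
import os
import threading

info = []

def report():
    # Every thread in this program shares the same process id,
    # but each has its own thread name.
    info.append((os.getpid(), threading.current_thread().name))

report()  # runs in the main thread
t = threading.Thread(target=report, name='WorkerThread')
t.start()
t.join()

for pid, name in info:
    print('pid=%s, thread=%s' % (pid, name))
```

Both lines print the same pid with different thread names, illustrating that the two threads live inside one process.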

A classic model of the relationship between processes and threads imagines the CPU as a factory with multiple workshops. Different workshops handle different production tasks: some produce car tires, some produce car frames. But the factory’s power supply is limited and can power only one workshop at a time.

To keep everyone’s progress coordinated, the factory supplies power to each workshop in turn. The workshop here corresponds to the process.


Although a workshop produces only one product, production involves more than one task: a workshop may have several assembly lines. The actual production work is done by the assembly lines, and each line corresponds to a specific task to be executed. However, the workshop can power only one assembly line at a time, so it must switch power between the lines to keep every line’s production schedule even.


The assembly line in the workshop naturally corresponds to the thread, and this analogy illustrates the relationship between CPU, process, and thread well. It holds in practice, although the situation inside a CPU is far more complicated than in a real workshop: for both processes and CPUs, the situation changes in real time. A workshop may have x assembly lines at one moment and y the next.

Once you understand the concepts of threads and processes, it also helps you make sense of a computer’s specifications. When buying a computer, we often see the CPU described as so-many cores and so-many threads. For example, the first laptop I bought had four cores and eight threads: the CPU had four physical computing cores, but it used hyper-threading technology, so that each physical core could be presented as two logical cores. It looks like eight cores executing eight threads at the same time, but four of those cores are simulated, virtual cores.
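You can query the number of logical cores from Python’s standard library; on the four-core, eight-thread machine described above, this would report 8. A small sketch:

```python
import os

# Number of logical cores the OS exposes
# (physical cores x hyper-threads per core).
logical_cores = os.cpu_count()
print('logical cores:', logical_cores)
```

Note that `os.cpu_count()` reports logical cores, not physical ones, so a hyper-threaded CPU reports double its physical core count.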

One question: why four cores with eight threads, and not four cores with eight processes? Because the CPU does not execute a process directly; it executes one thread within a process, just as the workshop cannot produce parts directly — only the assembly lines can. The workshop is mainly responsible for allocating resources, hence the classic textbook formulation: the process is the smallest unit of resource allocation; the thread is the smallest unit of CPU scheduling.
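Because the process is the unit of resource allocation, all threads in one process share its memory. A minimal sketch showing two threads appending to the same list that lives in the process:

```python
import threading

shared = []  # lives in the process, visible to every thread in it

def worker(tag):
    # Each thread writes into the same process-wide list.
    for i in range(3):
        shared.append((tag, i))

t1 = threading.Thread(target=worker, args=('a',))
t2 = threading.Thread(target=worker, args=('b',))
t1.start(); t2.start()
t1.join(); t2.join()

print(len(shared))  # both threads wrote into the same list
```

Separate processes, by contrast, each get their own copy of memory and cannot share a list this way without explicit inter-process communication.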

Starting a thread

Python provides the threading library, which makes it very easy to create threads for multithreaded execution.

First, we import Thread from threading. Thread is a thread class; by creating an instance of Thread and starting it, we can run code in a new thread.

from threading import Thread
t = Thread(target=func, name='thread', args=(x, y))
t.start()

To explain how this works: we pass in three parameters — target, name, and args — whose purposes we can guess from their names. target is the function we want to run in the new thread. name is the name we give the newly created thread; it can be omitted, in which case the system assigns a default name. When we run Python, the interpreter itself starts in a thread named MainThread, which we can identify by its name. args holds the arguments passed to the target function.
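A quick sketch of these parameters: when name is omitted, Python assigns a default name like Thread-1, and the thread we start from is named MainThread. (The function name `greet` here is just an example.)

```python
import threading

def greet(who):
    print('hello from %s to %s' % (threading.current_thread().name, who))

named = threading.Thread(target=greet, name='Greeter', args=('world',))
unnamed = threading.Thread(target=greet, args=('world',))

print(threading.current_thread().name)  # the interpreter's own thread: MainThread
print(named.name)                       # the name we chose: Greeter
print(unnamed.name)                     # a system-assigned default starting with Thread-

named.start(); unnamed.start()
named.join(); unnamed.join()
```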

Let’s take a classic example:

import time, threading

# code executed by the new thread:
def loop(n):
    print('thread %s is running...' % threading.current_thread().name)
    for i in range(n):
        print('thread %s >>> %s' % (threading.current_thread().name, i))
        time.sleep(5)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread', args=(10,))
t.start()
print('thread %s ended.' % threading.current_thread().name)

We created a very simple loop function that prints numbers in a loop. After printing each number, the thread sleeps for 5 seconds, so we should see a new number appear on the screen every 5 seconds.

Let’s do it in Jupyter:


On the face of it there’s nothing wrong, but there is a problem: the output is out of order. Why does the main thread’s “ended” line appear right after we print the first digit, 0? And a second question: if the main thread has ended, why hasn’t the Python process exited, and why is it still printing results?

Because threads run independently, the main thread does not stop after t.start(); it continues straight to the end. If we don’t want the main thread to end at this point, but instead to block and wait for the child thread to finish, we can add a t.join() line to our code.

t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)

Join allows the main thread to wait at the join until the child thread finishes executing before continuing. When we add the join, the result looks like this:


This is what we expect, waiting for the child thread to finish executing before continuing.

Now the second question: when the main thread ends, why does the child thread keep running and the Python process not exit? This is because, by default, the threads we create are user-level threads, and a process waits for all of its user-level threads to finish before exiting. Here’s the problem: if we create a thread that tries to fetch data from an API, and the API never returns, won’t the current process wait forever?
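We can inspect which threads the process is still tracking with `threading.enumerate()`. A sketch where a user-level thread is still alive after the main code falls through (the 0.5-second sleep just stands in for slow work):

```python
import threading
import time

def slow():
    time.sleep(0.5)  # stands in for a slow task, e.g. a network call

t = threading.Thread(target=slow, name='SlowThread')  # user-level (non-daemon) by default
t.start()

# The new thread is not a daemon, so the process would wait for it at exit.
print(t.daemon)  # False
alive = [th.name for th in threading.enumerate()]
print(alive)     # includes both MainThread and SlowThread
t.join()
```

If `slow` never returned, the interpreter would never exit — which is exactly the problem daemon threads solve.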

This is obviously not reasonable, so to solve this problem, we can set the created thread as a daemon thread.

Daemon thread

A daemon thread is a background thread. Daemon threads differ from user threads: the process does not wait for daemon threads to finish — it exits once all user threads have completed, and any daemon threads still running are killed when the process exits.

We pass the daemon=True argument to make the created thread a background thread:

t = threading.Thread(target=loop, name='LoopThread', args=(10, ), daemon=True)

This is what we see when we execute:


One thing to note: if you run this in Jupyter, you won’t see this result. Jupyter itself is a process, and its cells are kept alive by user-level threads, so the process never exits. To see this effect, you have to run the Python file from the command line.

If we want to wait for the child thread to finish, we must use the join method. In addition, to prevent a stuck child thread from blocking us forever, we can pass a timeout to join: the maximum time to wait. When the timeout is reached, we stop waiting.

For example, when I set timeout to 5 in the Join, only 5 numbers will be printed on the screen.
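Note that after a join with a timeout returns, the thread may still be running; is_alive() tells you whether the timeout expired before the thread finished. A minimal sketch with a hypothetical slow worker:

```python
import threading
import time

def work():
    time.sleep(1)  # stands in for a task that outlives our patience

t = threading.Thread(target=work, daemon=True)
t.start()

t.join(timeout=0.1)       # give up waiting after 0.1 seconds
timed_out = t.is_alive()  # True if the thread is still running
print('timed out' if timed_out else 'finished in time')
```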


In addition, note that if the thread is not set as a background thread, join with a timeout still returns, but the process will nevertheless wait for all child threads to finish before exiting. So the output on the screen looks something like this:


Although the main thread continues and terminates, the child thread keeps running until it, too, completes.

There is a pitfall with join’s timeout. It’s fine when we wait for only one thread, but if we have multiple threads and call join on them in a loop, the main thread can end up waiting for N × timeout, where N is the number of threads. That’s because each thread’s timeout only starts counting when the previous thread’s join returns, so in the worst case we wait through each timeout in turn before giving up on all the threads.

For example, I create 3 threads like this:

ths = []
for i in range(3):
    t = threading.Thread(target=loop, name='LoopThread' + str(i), args=(10,), daemon=True)
    ths.append(t)

for t in ths:
    t.start()

for t in ths:
    t.join(2)

The final output on the screen looks like this:


All three threads survived for six seconds — which, I have to say, is a bit of a mess and not at all what we expected.
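A common way around this pit is to share one deadline across all the joins, so the total wait is bounded by a single timeout rather than N of them. A sketch under that assumption (the 2-second budget is just an example, and the 60-second sleep stands in for threads that never finish in time):

```python
import threading
import time

def loop_forever():
    time.sleep(60)  # stands in for a thread that never finishes in time

ths = [threading.Thread(target=loop_forever, daemon=True) for _ in range(3)]
for t in ths:
    t.start()

start = time.time()
deadline = start + 2  # one shared budget for all three threads
for t in ths:
    remaining = deadline - time.time()
    if remaining > 0:
        t.join(remaining)  # each join only gets what's left of the budget
elapsed = time.time() - start
print('waited %.1f seconds in total' % elapsed)
```

With the shared deadline, the total wait stays close to 2 seconds instead of growing to 2 seconds per thread.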

conclusion

In today’s article, we took a brief look at the concepts of threads and processes in the operating system, how to create a thread in Python, and how to use it after creating a thread. Today’s introduction is just the most basic usage and concept, but there are many more advanced uses of threads that we will share in a future article.

Multithreading is critical in many languages and is used in many scenarios: web back ends, crawlers, game development, and anything involving a UI. With a UI, one thread is typically dedicated to rendering the page while another prepares data and executes logic. Multithreading is therefore a topic professional programmers cannot avoid and one of the things that must be mastered.

This is the end of today’s article. If you liked it, please give me a follow — it’s a little encouragement for me, and it makes it easier for you to get more articles.
