Multithreading in Python concurrent programming


Preface

This article introduces another important part of concurrent programming: threads.

Introduction to threads

We know that a running program is a process. Each process has its own address space in the operating system, and each process has one controlling thread by default. As an analogy, picture a workshop that turns raw materials into products on assembly lines: the thread is an assembly line, the workshop is the process, and the raw materials are the data in memory. Every workshop has at least one assembly line.

A process therefore only gathers resources (the raw materials) in one place: it is the unit of resources, while the thread is the unit that the CPU actually executes. A process can contain multiple threads, and they share all the resources of that process. As a result, multiple threads inside the same process compete for those resources.
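
To make the "one workshop, several assembly lines" picture concrete, here is a minimal sketch. The helper name run_workers is made up for this illustration; it only relies on os.getpid() and the standard threading module. Every thread reports the same process ID and writes to the same list object.

```python
import os
from threading import Thread, current_thread

def run_workers(count=3):
    """Start `count` threads and record which process each one ran in."""
    pids = []    # shared list: every thread appends to the same object
    names = []

    def worker():
        pids.append(os.getpid())             # the process this thread lives in
        names.append(current_thread().name)  # the name of this thread

    threads = [Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pids, names

if __name__ == '__main__':
    pids, names = run_workers()
    print(set(pids) == {os.getpid()})  # True: all threads live in one process
    print(len(names))                  # 3: all threads wrote to the same list
```

The sketch deliberately uses list.append only, so the threads do not need a lock here; read-modify-write updates do, as discussed later in the article.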

If multiple processes can already achieve concurrency on a single core, why do we need multiple threads? A process is like a workshop, so creating a process means building a new workshop. A thread is an assembly line inside the workshop, so creating a thread only adds an assembly line to an existing workshop and does not require a new memory space. Creating a thread is therefore much cheaper than creating a process.

Starting threads: the threading module

Starting a thread works almost the same way as starting a process; only the module differs. Threads do not have to be started under the if __name__ == '__main__': branch, but doing so is still generally recommended.

from threading import Thread


def task():
    print('i am sub_thread')

# Method 1: instantiate a Thread object directly
if __name__ == '__main__':
    t = Thread(target=task)
    t.start()
    print('i am main-thread')


# Method 2: inherit from Thread, define your own thread class, and override run()
class MyThread(Thread):
    def run(self):
        task()

if __name__ == '__main__':
    t = MyThread()
    t.start()
    print('i am main-thread')

Now that we can start threads, the TCP server from the earlier socket article can be made concurrent:

import socket
from threading import Thread

server = socket.socket()
server.bind(('127.0.0.1', 8080))
server.listen(5)

# Task of each thread: the communication loop
def task(conn):
    while True:
        try:
            data = conn.recv(1024)
            if not data:
                break
            conn.send(data.upper())
        except Exception as e:
            print(e)
            break
    conn.close()


while True:
    conn, addr = server.accept()
    # When a connection request arrives, start a thread to handle that connection.
    # Drawback: the number of threads is not limited, so it is easy to run out of memory.
    t = Thread(target=task, args=(conn,))
    t.start()

The join method

Threads also have a join method: the main thread waits for the child thread to finish before it continues.

from threading import Thread
import time

class MyThread(Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def run(self):
        print(f'{self.name} is run')
        time.sleep(2)
        print(f'{self.name} is over')

if __name__ == '__main__':
    t = MyThread('tank')
    t.start()
    t.join()       # block here until the child thread has finished
    print('main')  # always printed after 'tank is over'

Sharing data between threads

Data is not shared between processes, but threads are created inside a process. If a process contains multiple threads, they all share the data of that process, as follows:

from threading import Thread

money = 100
def foo():
    global money
    money = 666
    print('child', money)

if __name__ == '__main__':
    t = Thread(target=foo)
    t.start()
    t.join()  # make sure the child has modified money before the main thread reads it
    print('main', money)


# Program running results
# child 666
# main 666

Daemon threads

The main thread does not end as soon as its own code has finished. It waits for all other non-daemon child threads in the process to finish, because the end of the main thread means the end of the process it runs in. When the process ends, its memory is reclaimed and the remaining child threads can no longer work. So if a child thread is not a daemon thread, the main thread waits for it to complete; if the child thread is a daemon thread, it ends as soon as the main thread ends.

from threading import Thread
import time

def foo():
    print('The eunuch is free and easy!')
    time.sleep(3)
    print('Old man dies.')


if __name__ == '__main__':
    t = Thread(target=foo)
    t.daemon = True  # must be set before start()
    t.start()
    print("My Lord is dead.")

# Run result: 'Old man dies.' is never printed, because the daemon thread
# ends together with the main thread
# The eunuch is free and easy!
# My Lord is dead.

Thread mutex

Multiple threads share the resources of the same process, so they compete for data, and the data can end up corrupted. To prevent this, we can add a lock and let the threads compete for the lock instead, for example:

from threading import Thread, Lock
import time

money = 100
mutex = Lock()  # Instantiate to get the lock

# Task that each thread needs to perform
def task():
    global money
    mutex.acquire()  # The thread acquires the lock
    tmp = money
    time.sleep(0.01)
    money = tmp - 1
    mutex.release()  # The task is done: release the lock so other threads can compete for it

if __name__ == '__main__':
    t_list = []
    for i in range(100):
        t = Thread(target=task)
        t.start()
        t_list.append(t)
    for t in t_list:
        t.join()  # Wait for every child thread to complete before the main thread continues
    print(money)  # 0
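
The explicit acquire()/release() pair leaves the lock held forever if the code in between raises an exception. As a hedged sketch of the same idea, a Lock can also be used as a context manager with the with statement, which releases the lock automatically. The function name run_withdrawals and the dictionary holding the balance are assumptions made for this illustration, not part of the example above.

```python
from threading import Thread, Lock
import time

def run_withdrawals(thread_count=100, start_balance=100):
    """Decrement a shared balance once per thread, guarded by a lock."""
    balance = {'money': start_balance}
    mutex = Lock()

    def task():
        with mutex:                    # acquire here, release on exit, even on error
            tmp = balance['money']
            time.sleep(0.001)          # widen the race window on purpose
            balance['money'] = tmp - 1

    threads = [Thread(target=task) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance['money']

if __name__ == '__main__':
    print(run_withdrawals())  # 0
```

Because every read-modify-write of the balance happens inside the with block, each of the 100 threads subtracts exactly 1.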

Deadlocks and recursive locks

As mentioned in the article on process locks, locks should not be used casually, because deadlocks arise easily, with process locks and thread locks alike. A deadlock is a situation in which two or more processes or threads wait on each other for resources during execution and, without outside intervention, none of them can make progress. The system is then said to be in a deadlock state, and the processes or threads that wait on each other forever are said to be deadlocked.

from threading import Thread, Lock
import time
mutexA = Lock()
mutexB = Lock()

class MyThread(Thread):
    def run(self):
        self.func1()
        self.func2()

    def func1(self):
        mutexA.acquire()
        # self.name is the name of the current thread
        print(f'{self.name} grabbed lock A')
        mutexB.acquire()
        print(f'{self.name} grabbed lock B')
        mutexB.release()
        mutexA.release()

    def func2(self):
        mutexB.acquire()
        print(f'{self.name} grabbed lock B')
        time.sleep(2)
        mutexA.acquire()
        print(f'{self.name} grabbed lock A')
        mutexA.release()
        mutexB.release()

if __name__ == '__main__':
    for i in range(10):
        t = MyThread()
        t.start()

# Run result
# Thread-1 grabbed lock A
# Thread-1 grabbed lock B
# Thread-1 grabbed lock B
# Thread-2 grabbed lock A
# ... the program blocks here: it is deadlocked

The result above can be explained as follows:

1. Ten threads are started, and each one automatically executes run().
2. run() calls func1 first.
3. In func1, one of the ten threads grabs lock A first; the other nine must wait until lock A is released before they can grab it.
4. The thread holding lock A grabs lock B without contention, then releases lock B and lock A in turn.
5. Once lock A is released, the first thread moves on to func2 and grabs lock B again. Meanwhile, one of the other nine threads, still in func1, grabs lock A.
6. The second thread holds lock A and tries to grab lock B, while the first thread, in func2, holds lock B and tries to grab lock A.
7. Each thread waits for a lock that the other one holds, which is a deadlock.

To solve this deadlock we can use a recursive lock. To let the same thread request the same resource multiple times, Python provides the reentrant lock RLock. An RLock internally maintains a Lock and a counter. The counter records how many times the owning thread has called acquire, so the same thread can acquire the lock repeatedly. No other thread can acquire it until the owning thread has released it as many times as it acquired it. In the example above, the deadlock disappears if both names are bound to one shared RLock instead of two separate Lock objects. Note that two separate RLock objects would still deadlock; it is sharing a single reentrant lock that removes the circular wait:

# Replace
#     mutexA = Lock()
#     mutexB = Lock()
# with one shared reentrant lock:
from threading import RLock
mutexA = mutexB = RLock()
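
The difference between Lock and RLock can be checked directly. The sketch below is only an illustration (the helper name can_reacquire is made up here): it takes a lock, then tries to take it a second time from the same thread with acquire(blocking=False), which returns immediately instead of hanging.

```python
from threading import Lock, RLock

def can_reacquire(lock):
    """Return True if the thread already holding `lock` can acquire it again."""
    lock.acquire()
    try:
        # blocking=False returns True/False at once instead of waiting forever
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()  # undo the inner acquire
        return got_it
    finally:
        lock.release()      # undo the outer acquire

if __name__ == '__main__':
    print(can_reacquire(Lock()))   # False: a plain Lock is not reentrant
    print(can_reacquire(RLock()))  # True: the owning thread may acquire again
```

With a plain Lock and a blocking acquire, the second call would wait on itself forever, which is the single-thread version of the deadlock described above.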

GIL

Python has several interpreter implementations, such as CPython, Jython and PyPy. The most commonly used one is CPython. The GIL (global interpreter lock) is a mutex in the CPython interpreter.

The GIL exists because CPython's memory management is not thread-safe. Code can only run by being handed to the interpreter, and all threads of a process share the resources of that process, so they compete for them. The interpreter's garbage collection works on the same process data and therefore competes with the other threads as well. To keep that data safe, only the thread that currently holds the GIL can run at any given moment.

If the GIL already ensures that only one thread runs at a time, why do we still need Lock? The purpose of a lock is to allow only one thread at a time to modify a piece of shared data, and different data should be protected by different locks. The GIL and Lock are two locks that protect different data. The GIL works at the interpreter level and protects interpreter-level data, such as the data used by garbage collection. Lock protects the data of the application the user develops. The GIL is not responsible for that data: a thread can lose the GIL between reading a shared value and writing it back, so only the user can define the locking that is needed, i.e. with Lock.

Locks are commonly used to synchronize access to shared resources. Create a Lock object for each shared resource. When you need to access the resource, call acquire to obtain the lock (if another thread already holds it, the current thread waits until it is released). When you have finished with the resource, call release to release the lock.

If calling join immediately after start also makes the threads run serially, why do we need a lock at all? Calling join right after start does make execution serial and therefore safe. The problem is that it serializes all the code in the task, whereas a lock serializes only the part that modifies the shared data. For data safety alone, either approach works, but the lock is clearly more efficient. The following code measures the join approach, and you can see that the total execution time is very long.

from threading import current_thread, Thread
import time

def task():
    time.sleep(3)
    print('%s start to run' % current_thread().name)
    global n
    temp = n
    time.sleep(0.5)
    n = temp - 1


if __name__ == '__main__':
    n = 100
    start_time = time.time()
    for i in range(100):
        t = Thread(target=task)
        t.start()
        t.join()  # join right after start: the threads run one after another
    stop_time = time.time()
    print('main:%s n:%s' % (stop_time - start_time, n))

# Run result
# Thread-1 start to run
# Thread-2 start to run
# ......
# Thread-100 start to run
# main:350.6937336921692 n:0
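
For comparison, here is a minimal sketch of the lock-based variant, with all threads started first and only the update of the shared counter serialized. The function name run_with_lock, its parameters, and the much shorter sleep times are assumptions chosen so the sketch finishes quickly; they are not the author's measurements.

```python
from threading import Thread, Lock
import time

def run_with_lock(thread_count=100, work=0.05, critical=0.001):
    """Start all threads first; serialize only the update of the shared counter."""
    state = {'n': thread_count}
    lock = Lock()

    def task():
        time.sleep(work)          # this part runs concurrently in every thread
        with lock:                # only this block is serialized
            temp = state['n']
            time.sleep(critical)
            state['n'] = temp - 1

    start_time = time.time()
    threads = [Thread(target=task) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # join only after every thread has been started
    return time.time() - start_time, state['n']

if __name__ == '__main__':
    elapsed, n = run_with_lock()
    print('main:%s n:%s' % (elapsed, n))
```

The counter still ends at 0, but the waiting outside the lock overlaps across threads, so the total time stays well below thread_count * (work + critical), which is what join right after start would cost.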

Python multithreading applications

We know that, because of the GIL, threads in CPython cannot take advantage of multiple cores. Does that mean multithreading in CPython is useless? To answer this, we need to distinguish whether the CPU is mainly used for computation or whether the program mainly performs I/O. Multiple CPUs mean that multiple cores can compute in parallel, so more cores speed up computation. When a CPU hits blocking I/O, however, it still has to wait, so more cores do little for I/O.

As an analogy, a worker is a CPU, computing is the worker doing the job, and blocking I/O is the delivery of the raw materials the worker needs. If the raw materials run out, the worker has to stop until more arrive. If most of the tasks in your factory involve waiting for raw materials (I/O-intensive), hiring more workers makes little difference; it is better to have one worker who does other work while waiting for materials. Conversely, if your factory always has a full stock of raw materials, then the more workers you have, the higher the efficiency.

So for compute-intensive programs, more CPUs are better, but for I/O-intensive programs, extra CPUs bring little benefit, and multiple threads on a single core are enough.
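
The I/O-intensive case can be sketched with time.sleep standing in for a blocking I/O call (an assumption for illustration; real network or disk calls behave similarly because a blocked thread gives up the GIL while it waits). The helper names fake_io, timed, run_sequential and run_threaded are made up for this example.

```python
from threading import Thread
import time

def fake_io(delay):
    """Stand-in for a blocking I/O call; a sleeping thread does not hold the GIL."""
    time.sleep(delay)

def timed(run):
    """Return how many seconds the callable `run` takes."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start

def run_sequential(count, delay):
    for _ in range(count):
        fake_io(delay)            # each wait starts after the previous one ends

def run_threaded(count, delay):
    threads = [Thread(target=fake_io, args=(delay,)) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # the waits overlap

if __name__ == '__main__':
    print('sequential:', timed(lambda: run_sequential(4, 0.2)))  # roughly 4 * 0.2 s
    print('threaded:  ', timed(lambda: run_threaded(4, 0.2)))    # roughly 0.2 s
```

The threaded version finishes in about the time of one wait, even though the GIL allows only one thread to execute Python code at a time. A compute-intensive task would not see this speedup from threads in CPython.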

Conclusion

This article was first published on the WeChat public account Program Yuan Xiaozhuang and is also posted on Juejin and Zhihu.
