Source: official account [Jie Ge’s IT Journey]

Author: Alaska

ID: Jake_Internet

Article link: Implementing multitasking with Python

I. Process introduction

Process: a program in execution. It consists of the program code, its data, and a process control block (PCB). A process is one run of a program and is the basic unit of resource allocation and scheduling.

Program: static code that is not being executed.

II. Comparison between threads and processes

As the figure shows, there are 9 application processes running on the computer at this moment, and each process contains multiple threads. From this we can conclude:

Process: enables multitasking; for example, a computer can run several QQ instances at the same time.

Thread: also enables multitasking; for example, a single QQ instance can have several chat windows open.

Fundamental difference: a process is the basic unit of resource allocation in the operating system, while a thread is the basic unit of task scheduling and execution.

Advantages of using multiple processes:

1. Each process has its own GIL:

In CPython, multithreading cannot take advantage of multiple cores because of the GIL (Global Interpreter Lock): within one process, only one thread can execute Python bytecode at a time. With multiple processes, each process has its own interpreter and its own GIL, so on a multi-core processor the processes do not block each other. Multiprocessing therefore makes much better use of multiple cores.

2. Higher efficiency

For I/O-bound tasks such as web crawlers, there is little difference between multithreading and multiprocessing. For CPU-bound tasks, however, multiple processes can be much faster than multiple threads, because the work is spread across several cores; at best, the speedup approaches the number of available cores.
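To check the GIL claim on your own machine, here is a minimal sketch that runs the same CPU-bound function first in four threads and then in four processes. It uses concurrent.futures from the standard library rather than the multiprocessing API covered below; count_primes, timed_run, and the workload size are illustrative choices, not part of the original article. No timings are claimed here, because they depend on your core count and Python build.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work (hypothetical helper): count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        # n is prime if no divisor from 2 up to sqrt(n) divides it evenly
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_run(executor_cls, workloads, workers=4):
    """Run count_primes on each workload with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(count_primes, workloads))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    workloads = [50_000] * 4  # arbitrary size; increase it on a fast machine
    thread_results, thread_secs = timed_run(ThreadPoolExecutor, workloads)
    process_results, process_secs = timed_run(ProcessPoolExecutor, workloads)
    assert thread_results == process_results  # same answers, different timing
    print(f'threads:   {thread_secs:.2f} s')
    print(f'processes: {process_secs:.2f} s')
```

On a standard (GIL-enabled) CPython build with several cores, the process version typically finishes in a fraction of the thread version's time; for I/O-bound work the gap mostly disappears.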

III. Implementing multiple processes in Python

Let’s get a feel for it with an example:

3.1 Using the Process class

import multiprocessing


def process(index):
    # Runs inside the child process
    print(f'Process: {index}')


if __name__ == '__main__':
    for i in range(5):
        # target: the function to run; args: a tuple of its arguments
        p = multiprocessing.Process(target=process, args=(i,))
        p.start()

This is the most basic way to implement multiple processes: create a new child process by instantiating Process. The target argument is the function to run, and args is a tuple containing the arguments passed to that function, here process.

Note: args must be a tuple. If there is only one argument, add a comma after it, as in (i,); without the comma, (i) is simply i in parentheses, not a tuple. After creating a process, start it by calling its start method.

The running results are as follows:

Process: 0 
Process: 1 
Process: 2
Process: 3 
Process: 4

As you can see, we started five child processes, each of which called the process function. Its index parameter was passed in through args, took the values 0 to 4, and was printed by each child before the five child processes exited. Because the children run concurrently, the order of the lines is not guaranteed.
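The note about args, and the fact that the parent in the example above never explicitly waits for its children, can both be seen in this minimal sketch. It extends the article's process function with a hypothetical label keyword parameter and a return value (so the function can be checked directly), passes that keyword through Process's kwargs parameter, and calls join() so the parent waits for every child before reading their exitcode values.

```python
import multiprocessing


def process(index, label='Process'):
    """Target function; label is a hypothetical extra keyword parameter."""
    message = f'{label}: {index}'
    print(message)
    return message  # returned only so the function can be tested directly


if __name__ == '__main__':
    # (0) is just the int 0; only (0,) is a one-element tuple
    print(type((0)), type((0,)))

    children = []
    for i in range(5):
        # args must be a tuple; kwargs passes keyword arguments to the target
        p = multiprocessing.Process(target=process, args=(i,), kwargs={'label': 'Child'})
        p.start()
        children.append(p)

    for p in children:
        p.join()  # block until this child has finished

    # exitcode is 0 for a child whose target returned normally
    print('exit codes:', [p.exitcode for p in children])
```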

3.2 Inheriting from the Process class

from multiprocessing import Process
import time


class MyProcess(Process):
    def __init__(self, loop):
        super().__init__()  # initialize the Process base class first
        self.loop = loop    # number of iterations for this process

    def run(self):
        # run() holds the process's logic; start() calls it in the child process
        for count in range(self.loop):
            time.sleep(1)
            print(f'Pid:{self.pid} LoopCount: {count}')


if __name__ == '__main__':
    for i in range(2, 5):
        p = MyProcess(i)
        p.start()

We first define a constructor that takes a loop parameter, the number of iterations, and stores it as the instance attribute self.loop. The run method loops that many times, sleeping one second per iteration and printing the current process ID and the iteration count.

When launching the processes, we use range(2, 5) to get the numbers 2, 3, and 4, create a MyProcess instance for each, and call its start method to start the process.

Note: the process's logic must be implemented in the run method. To start the process, call start; the run method is then executed in the new child process.

The running results are as follows:

Pid:12976 LoopCount: 0
Pid:15012 LoopCount: 0
Pid:11976 LoopCount: 0
Pid:12976 LoopCount: 1
Pid:15012 LoopCount: 1
Pid:11976 LoopCount: 1
Pid:15012 LoopCount: 2
Pid:11976 LoopCount: 2
Pid:11976 LoopCount: 3

Note that Pid is the process ID. The output will differ between machines and between runs. Each process printed as many lines as its loop value: 2, 3, and 4 lines respectively.
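The rule that run must be triggered through start can be verified by comparing process IDs. In this minimal sketch, the hypothetical PidReporter subclass puts the PID of whichever process executes its run method onto a multiprocessing.Queue. Calling run() directly is an ordinary method call and reports the parent's own PID, while start() reports the PID of a newly created child.

```python
import multiprocessing
import os


class PidReporter(multiprocessing.Process):
    """Hypothetical subclass whose run() reports the PID it executes in."""

    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        # Whatever process executes run() puts its own PID on the queue
        self.queue.put(os.getpid())


if __name__ == '__main__':
    q = multiprocessing.Queue()
    print('parent PID:', os.getpid())

    PidReporter(q).run()  # plain method call: no new process is created
    print('run() executed in PID', q.get())

    child = PidReporter(q)
    child.start()  # creates a child process, which then calls run()
    print('start() executed in PID', q.get())
    child.join()
```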

IV. Communication between processes

4.1 Queue: first in, first out

from multiprocessing import Queue
import multiprocessing


def download(p):
    # Producer: put every item into the queue
    lst = [11, 22, 33, 44]
    for item in lst:
        p.put(item)
    print('Data downloaded successfully....')


def savedata(p):
    # Consumer: take items until the queue is empty
    lst = []
    while True:
        data = p.get()
        lst.append(data)
        if p.empty():
            break
    print(lst)


def main():
    p1 = Queue()
    t1 = multiprocessing.Process(target=download, args=(p1,))
    t2 = multiprocessing.Process(target=savedata, args=(p1,))
    t1.start()
    # Wait until all data is queued, because savedata stops as soon as the queue looks empty
    t1.join()
    t2.start()


if __name__ == '__main__':
    main()

Running results:

Data downloaded successfully....
[11, 22, 33, 44]
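The savedata function above stops as soon as empty() returns True, but the multiprocessing documentation warns that Queue.empty() is not reliable across processes, so a fast consumer can quit before the producer has finished. A common alternative, shown in this minimal sketch, is to have the producer send a sentinel value marking the end of the data and let the consumer block on get() until it arrives. SENTINEL and collect are hypothetical names, and using None as the marker assumes None is never a real data item.

```python
import multiprocessing

SENTINEL = None  # hypothetical end-of-data marker; must never be a real item


def download(q, items):
    """Producer: put every item, then the sentinel to signal 'no more data'."""
    for item in items:
        q.put(item)
    q.put(SENTINEL)
    print('Data downloaded successfully....')


def collect(q):
    """Consumer: block on get() until the sentinel arrives, instead of polling empty()."""
    received = []
    while True:
        data = q.get()
        if data is SENTINEL:
            break
        received.append(data)
    return received


def savedata(q):
    print(collect(q))


if __name__ == '__main__':
    q = multiprocessing.Queue()
    producer = multiprocessing.Process(target=download, args=(q, [11, 22, 33, 44]))
    consumer = multiprocessing.Process(target=savedata, args=(q,))
    # Start order no longer matters: the consumer simply waits on get()
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
```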

4.2 Global variables cannot be shared between processes

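import multiprocessing

a = 1  # module-level global variable


def demo1():
    # Changes only the copy of a that lives in this child process
    global a
    a += 1


def demo2():
    # Reads this child's own copy of a, which is still 1
    print(a)


def main():
    t1 = multiprocessing.Process(target=demo1)
    t2 = multiprocessing.Process(target=demo2)

    t1.start()
    t2.start()


if __name__ == '__main__':
    main()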

Running results:

1

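Global variables are not shared: each child process works on its own copy of a, so the increment made in demo1 is invisible to demo2, which still prints 1.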
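If processes genuinely need to share a number, multiprocessing offers shared-memory objects instead of plain globals. This minimal sketch uses multiprocessing.Value together with the lock it carries, so that four processes can safely increment one counter; increment and the counts used are illustrative choices.

```python
import multiprocessing


def increment(counter, times):
    """Add 1 to the shared counter `times` times (hypothetical helper)."""
    for _ in range(times):
        # += on .value is a read-modify-write, so it must run under the lock
        with counter.get_lock():
            counter.value += 1


if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0)  # 'i' = C int, initial value 0
    workers = [
        multiprocessing.Process(target=increment, args=(counter, 1000))
        for _ in range(4)
    ]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(counter.value)  # 4000: all four workers updated the same shared integer
```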

V. Process pools

5.1 Introducing process pools

When only a few child processes are needed, you can create them directly with multiprocessing.Process. But when there are hundreds or even thousands of tasks, creating each process by hand is a huge amount of work; instead, use the Pool class provided by the multiprocessing module.

from multiprocessing import Pool
import os
import random
import time


def worker(a):
    t_start = time.time()
    print('%s starts, process ID is %d' % (a, os.getpid()))
    # Simulate work that takes a random 0 to 2 seconds
    time.sleep(random.random() * 2)
    t_stop = time.time()
    print(a, 'finishes, took %0.2f s' % (t_stop - t_start))


if __name__ == '__main__':
    po = Pool(3)  # define a pool with at most 3 worker processes
    for i in range(0, 10):
        # Add a task to the pool; args must be a tuple, so (i,) rather than (i)
        po.apply_async(worker, (i,))
    print('--start--')
    po.close()  # no more tasks can be added after close()
    po.join()   # wait for all tasks to finish; must be called after close()
    print('--end--')

Running results:

--start--
0 starts, process ID is 6664
1 starts, process ID is 4772
2 starts, process ID is 13256
0 finishes, took 0.18 s
3 starts, process ID is 6664
2 finishes, took 0.16 s
4 starts, process ID is 13256
…
4 finishes, took 0.87 s
6 starts, process ID is 13256
3 finishes, took 1.59 s
7 starts, process ID is 6664
… finishes, took 1.15 s
8 starts, process ID is 4772
7 finishes, took 0.40 s
9 starts, process ID is 6664
6 finishes, took 1.80 s
8 finishes, took 1.49 s
9 finishes, took 1.36 s
--end--

This pool holds at most three processes: a new task starts only when a running task finishes and frees its worker, so the same workers are reused over and over. That is why only three process IDs (6664, 4772, 13256) appear in the output.
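apply_async returns immediately with an AsyncResult object; the worker's return value, or the exception it raised, only surfaces when you call get() on that object. The worker function above returns nothing, so its results are never collected. This minimal sketch shows collecting results in order, the shorter pool.map equivalent, and how get() re-raises a failing task's exception in the parent. square and submit_all are hypothetical names.

```python
from multiprocessing import Pool


def square(x):
    """Task run in a worker process (hypothetical example)."""
    return x * x


def submit_all(pool, values):
    """Submit one apply_async task per value and collect the results in order."""
    async_results = [pool.apply_async(square, (v,)) for v in values]
    # get() waits for the task and returns its value, or re-raises the worker's exception
    return [r.get(timeout=10) for r in async_results]


if __name__ == '__main__':
    with Pool(3) as pool:
        print(submit_all(pool, range(10)))
        # For one function over many inputs, map() is a shorter equivalent
        print(pool.map(square, range(10)))

        failing = pool.apply_async(square, (None,))  # None * None raises TypeError
        try:
            failing.get(timeout=10)
        except TypeError as exc:
            print('worker raised:', exc)
```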

VI. Case: batch copying files

Approach:

  • Get the name of the folder to copy
  • Create a new folder
  • Get all file names in the folder to be copied
  • Create a process pool
  • Add tasks to the process pool

The code is as follows:

Import the modules

import multiprocessing
import os
import time

Define the file copy function

def copy_file(q, oldfolderName, newfolderName, file_name):
    # Copy one file; nothing needs to be returned
    time.sleep(0.5)  # simulate a slower copy so the progress bar is visible
    # print('\rCopying from folder %s to folder %s: %s' % (oldfolderName, newfolderName, file_name), end='')
    # Read the original file in binary mode
    with open(os.path.join(oldfolderName, file_name), 'rb') as old_file:
        content = old_file.read()
    # Write the content into the new folder
    with open(os.path.join(newfolderName, file_name), 'wb') as new_file:
        new_file.write(content)
    q.put(file_name)  # report the finished file through the queue

Define the main function

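def main():
    # 1. Get the name of the folder to copy
    oldfolderName = input('Please enter the name of the folder you want to copy: ')
    # 2. Create a new folder (assumed naming scheme: add a '[copy]' suffix)
    newfolderName = oldfolderName + '[copy]'
    if not os.path.exists(newfolderName):
        os.mkdir(newfolderName)
    # 3. Get all file names in the folder to be copied
    filenames = os.listdir(oldfolderName)
    # print(filenames)
    # 4. Create a process pool with 5 worker processes
    pool = multiprocessing.Pool(5)
    # Pool workers need a Manager queue; a plain multiprocessing.Queue cannot be passed to them
    manager = multiprocessing.Manager()
    q = manager.Queue()
    # 5. Add one copy task per file to the process pool
    for file_name in filenames:
        pool.apply_async(copy_file, args=(q, oldfolderName, newfolderName, file_name))
    pool.close()  # no more tasks will be added
    copy_file_num = 0
    file_count = len(filenames)
    # We don't know when copying will finish, so loop until every file is reported
    while True:
        file_name = q.get()
        copy_file_num += 1
        time.sleep(0.2)
        # '\r' returns to the start of the line, turning the output into a progress bar
        print('\rCopy progress %.2f %%' % (copy_file_num * 100 / file_count), end='')
        if copy_file_num >= file_count:
            break
    pool.join()
    print()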

Run the program

if __name__ == '__main__':
    main()

The running results are as follows:

Comparison of file directory structure before and after running

Before running

After running

The above shows the overall result. Since the test simply used some arbitrary sample files, I won't go through them in detail here.
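For comparison, here is a minimal non-interactive sketch of the same job built on different standard-library pieces: shutil.copyfile does the copying, and Pool.imap_unordered drives the progress display by yielding each result as soon as its task finishes, which removes the need for a Manager queue. This is an alternative design, not the author's method; copy_one, copy_folder, the '[copy]' suffix, and the generated demo files are assumptions for illustration.

```python
import os
import shutil
import tempfile
from multiprocessing import Pool


def copy_one(job):
    """Copy a single file (hypothetical helper); job is (src_dir, dst_dir, file_name)."""
    src_dir, dst_dir, file_name = job
    shutil.copyfile(os.path.join(src_dir, file_name), os.path.join(dst_dir, file_name))
    return file_name


def copy_folder(src_dir, dst_dir, processes=4):
    """Copy every regular file in src_dir into dst_dir, printing progress."""
    os.makedirs(dst_dir, exist_ok=True)
    names = [n for n in os.listdir(src_dir) if os.path.isfile(os.path.join(src_dir, n))]
    jobs = [(src_dir, dst_dir, n) for n in names]
    done = 0
    with Pool(processes) as pool:
        # imap_unordered yields each result as soon as its worker finishes
        for _ in pool.imap_unordered(copy_one, jobs):
            done += 1
            print(f'\rCopy progress {done * 100 / len(jobs):.2f} %', end='')
    print()
    return done


if __name__ == '__main__':
    # Build a throwaway demo folder instead of asking for input()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'demo')
        os.mkdir(src)
        for i in range(10):
            with open(os.path.join(src, f'file{i}.txt'), 'w') as f:
                f.write(f'sample {i}\n')
        copied = copy_folder(src, src + '[copy]')
        print(copied, 'files copied')
```

Skipping subdirectories and an empty job list are deliberate here: an empty folder simply copies nothing instead of waiting forever for progress reports.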

That's the end of this article.


Writing original content takes effort. If you found this article useful, please like, comment on, or share it; that is what motivates me to keep writing high-quality articles. Thank you!

And please follow me (it's free!), so you don't lose track of me next time.

See you next time!