Multiprocessing module

The multiprocessing package is Python's process management package. It is similar to threading.Thread in that you create a process with the multiprocessing.Process object, which can run any function written in your Python program. The Process object has the same usage as the Thread object, with is_alive(), join([timeout]), run(), start(), terminate() and other methods. Its attributes are: authkey; daemon (which must be set before start() is called); exitcode (None while the process is running; a value of -N means it was terminated by signal N); name; and pid. In addition, the multiprocessing package also has Lock/Event/Semaphore/Condition classes, used for synchronization between processes, with the same usage as their threading counterparts. A large portion of multiprocessing uses the same API as threading, just in a multi-process context.
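
For instance, a Lock can be shared between processes to protect a resource such as standard output. Here is a minimal sketch (our illustration, not from the original examples), with the lock passed to each child as an argument:

#coding=utf-8
import multiprocessing

def worker(lock, n):
    # acquire the shared lock so only one process prints at a time
    lock.acquire()
    try:
        print 'worker', n
    finally:
        lock.release()

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    for i in xrange(3):
        multiprocessing.Process(target=worker, args=(lock, i)).start()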

Managing processes the way threading manages threads is at the heart of multiprocessing. It is similar to threading, but because each process runs in its own interpreter, it makes far better use of a multi-core CPU than threading can.

Take a look at the constructor of the Process class:

__init__(self, group=None, target=None, name=None, args=(), kwargs={})

Parameter description:

group: the group the process belongs to; rarely used.
target: the callable object to be invoked.
name: an alias (the process name).
args: the tuple of positional arguments for the callable.
kwargs: the dictionary of keyword arguments for the callable.

A simple example of creating processes:

#coding=utf-8
import multiprocessing

def do(n):
    # get the name of the current process
    name = multiprocessing.current_process().name
    print name, 'starting'
    print "worker ", n
    return

if __name__ == '__main__':
    numList = []
    for i in xrange(5):
        p = multiprocessing.Process(target=do, args=(i,))
        numList.append(p)
        p.start()
        p.join()
        print "Process end."

Execution Result:

Process-1 starting
worker  0
Process end.
Process-2 starting
worker  1
Process end.
Process-3 starting
worker  2
Process end.
Process-4 starting
worker  3
Process end.
Process-5 starting
worker  4
Process end.

Creating a child process this way is simpler than calling fork() directly: pass in the function to execute and its arguments, create a Process instance, and launch it with its start() method. The join() method waits for the child process to finish before the parent continues, and is typically used for synchronization between processes.
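
Note that the example above calls join() inside the loop, so each child finishes before the next one starts. A small variation (ours, reusing the do() function from the example above) that starts all the processes first and only then joins them lets them actually run in parallel:

if __name__ == '__main__':
    numList = []
    for i in xrange(5):
        p = multiprocessing.Process(target=do, args=(i,))
        numList.append(p)
        p.start()            # start every process first
    for p in numList:
        p.join()             # then wait for all of them to finish
    print "Process end."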

Note: to use the process module on Windows, the process-creating code must be placed under the if __name__ == '__main__': statement in the current .py file, because Windows has no fork() and re-imports the module in each child process. This is not required under Unix/Linux.

The Pool class

When using Python for system administration, especially when operating on many files and directories at once or remotely controlling multiple hosts, parallel operations can save a lot of time. If the number of objects is small, you can use the Process class to generate the processes directly; a dozen is fine, but with hundreds or more, manually limiting the number of processes becomes cumbersome, and this is where a process pool is useful. The Pool class provides a specified number of processes for the user to invoke. When a new request is submitted to the Pool and the pool is not full, a new process is created to execute the request; if the pool is full, the request waits until one of the processes in the pool finishes, and a process then becomes available to execute it. Here are some methods of the Pool class of the multiprocessing module.
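
To see this queuing behaviour, here is a minimal sketch (our illustration, assuming four 1-second tasks and a pool of only two workers): the third and fourth tasks must wait for the first two to finish, so the whole batch takes about 2 seconds rather than 1.

import time
from multiprocessing import Pool

def task(n):
    time.sleep(1)
    return n

if __name__ == "__main__":
    pool = Pool(2)                     # only two worker processes
    s = time.time()
    pool.map(task, [1, 2, 3, 4])       # four tasks queue up for two workers
    pool.close()
    pool.join()
    print "elapsed:", time.time() - s  # roughly 2 seconds, not 1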

apply()

Function prototype:

apply(func[, args=()[, kwds={}]])

This function passes the given arguments to func and blocks the main process until the function returns (not recommended; it exists mainly for compatibility with the old built-in apply(), which was removed in Python 3.x).
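
As a minimal sketch of how apply() behaves (our illustration, with a hypothetical square() function, not code from the original article): each call blocks the main process until its result is ready, so the tasks run one at a time even though the pool has several workers.

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(3)
    results = []
    for i in [1, 2, 3]:
        # each apply() blocks until square(i) has returned
        results.append(pool.apply(square, (i,)))
    pool.close()
    pool.join()
    print results   # [1, 4, 9]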

apply_async()

Function prototype:

apply_async(func[, args=()[, kwds={}[, callback=None]]])

Same usage as apply(), but non-blocking: it returns immediately with an AsyncResult object, and an optional callback receives each result as it becomes ready.
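
For contrast, a minimal sketch of apply_async() under the same assumptions (square() and the collect() callback are hypothetical names of ours): the calls return AsyncResult objects immediately, the callback fires in the main process as each result becomes ready, and get() can fetch a result later.

from multiprocessing import Pool

def square(x):
    return x * x

def collect(result):
    # runs in the main process when a result is ready
    print 'got', result

if __name__ == "__main__":
    pool = Pool(3)
    asyncResults = []
    for i in [1, 2, 3]:
        # returns immediately with an AsyncResult
        asyncResults.append(pool.apply_async(square, (i,), callback=collect))
    pool.close()
    pool.join()
    print [r.get() for r in asyncResults]  # [1, 4, 9]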

map()

Function prototype:

map(func, iterable[, chunksize=None])

The map method of the Pool class behaves roughly the same as the built-in map function, except that it blocks the main process until all results are returned. Note that although the second argument accepts any iterable, in practice it is consumed in full before the work is dispatched to the child processes.
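
If the input should be consumed lazily instead, the Pool class also offers imap(), which returns an iterator and yields the results one by one, in order. A minimal sketch, assuming the same hypothetical square() function as above:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(3)
    # imap returns an iterator; results arrive one by one, in input order
    for result in pool.imap(square, xrange(5)):
        print result
    pool.close()
    pool.join()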

close()

Close the process pool so that it does not accept new tasks.

terminate()

Terminate the worker processes immediately, without finishing outstanding tasks.

join()

The main process blocks and waits for the child processes to exit; the join() method must be called after close() or terminate().

An example of the multiprocessing.Pool class:

import time
from multiprocessing import Pool

def run(fn):
    # fn: one element of the data to process
    time.sleep(1)
    return fn * fn

if __name__ == "__main__":
    testFL = [1, 2, 3, 4, 5, 6]

    print 'sequential:'
    s = time.time()
    for fn in testFL:
        run(fn)
    e1 = time.time()
    print "sequential execution time:", int(e1 - s)

    print 'concurrent:'
    # create a pool with 5 processes
    # testFL: the list of data to process; run: the function to execute
    pool = Pool(5)
    rl = pool.map(run, testFL)
    pool.close()
    pool.join()
    e2 = time.time()
    print "parallel execution time:", int(e2 - e1)
    print rl

Execution Result:

sequential:
sequential execution time: 6
concurrent:
parallel execution time: 2
[1, 4, 9, 16, 25, 36]

The above example compares the time it takes to process the same data concurrently and sequentially. As the results show, concurrent execution is significantly faster than sequential execution. However, processes consume resources, so the number of processes should not be too large in normal work. rl is the overall result set after all the processes in the program have executed: the run function returns a value, so each process contributes one result. Internally a queue is used to collect them, and once all the processes have finished, the results are returned as a list that preserves the order of the input. Calling join() on a Pool object waits for all child processes to complete, and close() must be called before join() so the pool stops accepting new tasks.

Here’s another example:

import time
from multiprocessing import Pool

def run(fn):
    time.sleep(2)
    print fn

if __name__ == "__main__":
    startTime = time.time()
    testFL = [1, 2, 3, 4, 5]
    pool = Pool(5)   # a pool large enough for all five tasks
    pool.map(run, testFL)
    pool.close()
    pool.join()
    endTime = time.time()
    print "time :", endTime - startTime

Execution Result:

2
1
3
4
5
time : 2.51999998093

Running it again, the result is as follows:

1
34

2
5
time : 2.48600006104

Why are there blank lines and numbers run together in the output? This is related to process scheduling. When multiple processes execute in parallel, each process gets a different time slice, and which process handles which request, and when it finishes, are both uncertain, so the output comes out of order. And why does a line end up merged or empty? Because the switch to another process can happen just as the first process is about to print its newline, two numbers may well be printed on the same line; when execution switches back to the first process, it prints its newline, which produces an empty line.
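
If tidy output matters, the Lock mentioned at the beginning can serialize the prints. A minimal sketch (our illustration; init() is a hypothetical initializer of ours that hands the shared lock to each worker in the pool):

import time
from multiprocessing import Pool, Lock

def init(l):
    # make the shared lock visible inside every worker process
    global lock
    lock = l

def run(fn):
    time.sleep(2)
    lock.acquire()
    try:
        print fn   # the number and its newline now print atomically
    finally:
        lock.release()

if __name__ == "__main__":
    l = Lock()
    pool = Pool(5, initializer=init, initargs=(l,))
    pool.map(run, [1, 2, 3, 4, 5])
    pool.close()
    pool.join()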

A practical example of processes

Count the number of lines and the number of characters of the files in a directory in parallel, and store the results in a res.txt file, one line per file, in the format: filename: lineNumber, charNumber

import os
import time
from multiprocessing import Pool

def getFile(path):
    # collect the target files under the given directory
    fileList = []
    for root, dirs, files in list(os.walk(path)):
        for i in files:
            if i.endswith('.txt') or i.endswith('.10w'):
                fileList.append(root + "\\" + i)
    return fileList

def operFile(filePath):
    # count the number of lines and characters of one file and return them
    fp = open(filePath)
    content = fp.readlines()
    fp.close()
    lines = len(content)
    alphaNum = 0
    for i in content:
        alphaNum += len(i.strip('\n'))
    return lines, alphaNum, filePath

def out(list1, writeFilePath):
    # write the per-file results and print the totals
    fileLines = 0
    charNum = 0
    fp = open(writeFilePath, 'a')
    for i in list1:
        fp.write(i[2] + " line number: " + str(i[0]) + " " + str(i[1]) + "\n")
        fileLines += i[0]
        charNum += i[1]
    fp.close()
    print fileLines, charNum

if __name__ == "__main__":
    startTime = time.time()
    filePath = "C:\\wcx\\a"
    fileList = getFile(filePath)
    pool = Pool(5)
    resultList = pool.map(operFile, fileList)
    pool.close()
    pool.join()
    writeFilePath = "c:\\wcx\\res.txt"
    print resultList
    out(resultList, writeFilePath)
    endTime = time.time()
    print "used time is ", endTime - startTime

Execution Result:



It takes less than 1 second, which shows how fast concurrent execution across multiple processes can be.