With the multiprocessing module, a simple multithreaded crawler takes only a few lines of code:
import requests
from multiprocessing.dummy import Pool

def get(url):
    # Fetch the page and print its body
    print(requests.get(url).text, '\n')

url_list = [
    'http://exercise.kingname.info/exercise_middleware_ip/1',
    'http://exercise.kingname.info/exercise_middleware_ip/2',
    'http://exercise.kingname.info/exercise_middleware_ip/3',
    'http://exercise.kingname.info/exercise_middleware_ip/4',
]

pool = Pool(3)  # a pool of three worker threads
result = pool.map(get, url_list)
The result is shown in the figure below:
(If you haven’t read my book, you may wonder: isn’t multiprocessing a multi-process module? Why call this multithreading? Readers of the book won’t be confused, because I explain the reason there.)
Now suppose you have a function that takes no arguments, and you still want to run it with multiple threads. Copying the code above, you write:
import requests
from multiprocessing.dummy import Pool

def test():
    print('Function runs successfully!')

pool = Pool(3)
result = pool.map(test, ())  # the iterable is an empty tuple
Running it, you find that nothing is printed; that is, the test() function is never run at all.
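One way to confirm this without relying on printed output is to count the calls. The sketch below is only an illustration; the names calls and record are made up for this example and are not part of the original code.

```python
from multiprocessing.dummy import Pool

calls = []  # every invocation of record() appends an entry here

def record():
    # Takes no arguments, like test() above
    calls.append('called')

with Pool(3) as pool:
    result = pool.map(record, ())  # empty iterable: nothing to map over

print(result)      # the list of return values collected by map
print(len(calls))  # how many times record() actually ran
```

Both the result list and the call log stay empty, and no TypeError is raised for the missing argument, because the function is never invoked.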
If you force a dummy argument into the function and pass a non-empty iterable, it works as expected again:
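import requests
from multiprocessing.dummy import Pool

def test(_):
    # The argument is ignored; it only exists so that map has something to pass
    print('Function runs successfully!\n')

pool = Pool(3)
result = pool.map(test, (0,) * 3)  # three dummy items, so test runs three times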
The result is shown in the figure below.
So you have a hunch that if the second argument to pool.map is an empty iterable, the function won’t run.
(Of course, those of you who have used Python’s built-in map function already know this, but this article uses it as an example to illustrate how to read source code.)
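For comparison, the built-in map behaves the same way. This is a small sketch with made-up names (calls, record), not code from the original article:

```python
calls = []  # records every argument record() is called with

def record(x):
    calls.append(x)
    return x * 2

# map() is lazy in Python 3, so wrap it in list() to force evaluation
empty_result = list(map(record, ()))
full_result = list(map(record, (1, 2, 3)))

print(empty_result, full_result, calls)
```

Only the three items from the second call show up in calls; the empty iterable never triggers the function.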
To verify this, we open the file lib/multiprocessing/pool.py under the Python installation directory and find the line def map(self, func, iterable, chunksize=None), as shown in the figure below:
(This article uses Python 3.7.3 as a demonstration. If your Python version is not 3.7.3, the code may differ a little.)
As you can see from the code, map calls self._map_async(), passes the arguments in, and then calls the .get() method of the returned object.
So we continue into the self._map_async() method:
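The same relationship can be observed from the outside through the public map_async() method, without touching the private _map_async(). This is a minimal sketch; square is a hypothetical helper used only for illustration.

```python
from multiprocessing.dummy import Pool

def square(x):
    return x * x

with Pool(3) as pool:
    direct = pool.map(square, [1, 2, 3])

    # map_async() returns a result object immediately;
    # .get() blocks until all results are available
    async_result = pool.map_async(square, [1, 2, 3])
    via_async = async_result.get()

print(direct, via_async)
```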
In this method, the iterable we passed in is empty, so the following two conditions hold:
len(iterable) == 0
chunksize == 0
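The chunk size logic can be tried out in isolation. The function below is a standalone paraphrase of what I understand the Python 3.7 code to do, not the library's own function; compute_chunksize and its parameters are hypothetical names.

```python
def compute_chunksize(n_items, n_workers, chunksize=None):
    """Paraphrase of the chunk size calculation (assumed from Python 3.7)."""
    if chunksize is None:
        # Aim for roughly four chunks per worker, rounding up
        chunksize, extra = divmod(n_items, n_workers * 4)
        if extra:
            chunksize += 1
    if n_items == 0:
        # An empty iterable always ends up with chunksize 0
        chunksize = 0
    return chunksize

print(compute_chunksize(0, 3))
print(compute_chunksize(4, 3))
print(compute_chunksize(100, 3))
```

With three workers, 4 items give a chunk size of 1 and 100 items give 9, while an empty iterable gives 0.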
The first argument to map, the function name, is passed in the following line:
task_batches = Pool._get_tasks(func, iterable, chunksize)
Looking at the Pool._get_tasks static method, you can see:
Since its argument it is the empty iterable and size is 0, the following line returns an empty tuple:
tuple(itertools.islice(it, size))
The generator will terminate immediately, and the last line yield (func, x) will not execute at all.
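To see this behavior without opening the library, here is a standalone generator written to mirror the logic described above. It is a sketch, not the library's own code; get_tasks is a hypothetical name, and print is used only as a placeholder function that is never called.

```python
import itertools

def get_tasks(func, it, size):
    """Yield (func, batch) pairs, each batch holding up to `size` items."""
    it = iter(it)
    while True:
        x = tuple(itertools.islice(it, size))
        if not x:
            # Nothing left (or nothing to begin with): stop the generator
            return
        yield (func, x)

empty_batches = list(get_tasks(print, (), 0))
some_batches = list(get_tasks(print, [1, 2, 3], 2))

print(empty_batches)
print(some_batches)
```

For the empty iterable, the generator ends before reaching yield, so no batch containing func is ever produced.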
Back in self._map_async(), the code then initializes a result object using the MapResult class and returns it.
Enter the MapResult class, as shown below:
In __init__, we get the following values:
self._success = True
self._value = [] # because [None] * 0 results in []
self._event.set()
For more about self._event.set(), see my other article:
One Skill a Day: Event Monitoring for Python Multithreading
Next, the .get() method of the returned result object is called. Because MapResult itself doesn’t define a .get() method, the call goes to the .get() method of its parent class, ApplyResult.
Go to ApplyResult and look at the .get() method:
self.ready() is True because self._event.set() was called earlier, and the method returns self._value because self._success was set to True above. That is, it returns an empty list.
At this point, we have traced the whole flow for the case where the second argument of pool.map is an empty iterable. No call to func is involved anywhere, so the original function is never executed.
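The interplay between the event and .get() can be sketched with a tiny stand-in class. TinyMapResult is hypothetical and heavily simplified; it only mimics the attributes discussed here and uses the built-in TimeoutError rather than the one from multiprocessing.

```python
import threading

class TinyMapResult:
    """Hypothetical, simplified stand-in for the result object."""

    def __init__(self, chunksize, length):
        self._event = threading.Event()
        self._success = True
        self._value = [None] * length  # [None] * 0 gives []
        if chunksize <= 0:
            # No work will ever arrive, so mark the result as ready now
            self._event.set()

    def ready(self):
        return self._event.is_set()

    def get(self, timeout=None):
        self._event.wait(timeout)  # returns at once if the event is set
        if not self.ready():
            raise TimeoutError
        if self._success:
            return self._value
        # In this sketch _success is never False; kept to mirror the flow
        raise RuntimeError('task failed')

empty = TinyMapResult(chunksize=0, length=0)
print(empty.ready(), empty.get())
```

Because the event is already set in __init__, get() does not block and simply hands back the empty list.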
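If the goal is simply to run a zero-argument function on the pool's threads, one possible alternative to a dummy argument is apply_async(), whose args parameter defaults to an empty tuple. This is a sketch of that option, not something the original article uses; here test returns its message instead of printing it so the results can be collected.

```python
from multiprocessing.dummy import Pool

def test():
    return 'Function runs successfully!'

with Pool(3) as pool:
    # Submit the zero-argument function three times
    pending = [pool.apply_async(test) for _ in range(3)]
    # .get() waits for each call to finish and returns its value
    results = [p.get() for p in pending]

print(results)
```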
Some of you may ask: we imported Pool from multiprocessing.dummy, so why are we reading the code in multiprocessing/pool.py?
This is because, if we open Lib/multiprocessing/dummy/__init__.py under the Python installation path, we can see that its Pool actually returns a ThreadPool object. The code for that class also lives in Lib/multiprocessing/pool.py under the Python installation path, and the class inherits from Pool. So the code for their map methods is exactly the same.
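These relationships can also be checked at runtime. The sketch below imports the process-based class under the alias ProcessPool purely for readability; that alias is my own naming, not something from the library.

```python
from multiprocessing.dummy import Pool
from multiprocessing.pool import Pool as ProcessPool, ThreadPool

pool = Pool(3)
try:
    # The object handed back by multiprocessing.dummy.Pool
    is_thread_pool = isinstance(pool, ThreadPool)
finally:
    pool.close()
    pool.join()

# ThreadPool is a subclass and does not override map
inherits_from_pool = issubclass(ThreadPool, ProcessPool)
shares_map = ThreadPool.map is ProcessPool.map

print(is_thread_pool, inherits_from_pool, shares_map)
```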