Concurrent programming is an essential skill. In Python, three approaches can achieve concurrency or parallelism: multithreading, coroutines, and multiprocessing. Multithreading and coroutines are concurrent; multiprocessing is parallel. So what exactly is concurrency, and what is parallelism?
The difference between concurrency and parallelism
An answer borrowed from a Zhihu user:

You are in the middle of a meal when the phone rings, and you do not answer it until after you have finished eating: you support neither concurrency nor parallelism.

You are in the middle of a meal when the phone rings, you stop eating to answer it, then continue eating: you support concurrency.

You are in the middle of a meal when the phone rings, and you keep eating while you talk: you support parallelism.

The key to concurrency is the ability to handle multiple tasks, though not necessarily all at the same instant. The key to parallelism is the ability to handle multiple tasks at the same instant.
Multithreading: In Python, because of the Global Interpreter Lock (GIL), multiple threads take turns using the CPU. Only one thread runs at a time, and the operating system switches between them at appropriate moments. Threads switch so quickly that it appears multiple tasks are running simultaneously. In I/O-intensive scenarios this pays off: while thread 1 is blocked on an I/O operation, thread 2 can use the CPU for computation. The switching adds overhead but improves overall efficiency.

Coroutines: A coroutine is a lightweight thread of execution running inside a single OS thread. It achieves concurrency by giving the programmer control over where execution switches. Coroutines can handle tens of thousands of concurrent tasks, while multithreading cannot: the switching cost is too high and system resources would be exhausted. Search for the "C10K problem" for background.

Multiprocessing: true parallelism, where multiple tasks execute at the same time. If you want to use multiple CPU cores, choose multiprocessing.
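The coroutine model described above can be sketched with `asyncio` (mentioned again below as Python's coroutine standard library). This is a minimal illustration, with a hypothetical `fetch` coroutine standing in for a real I/O task:

```python
import asyncio

async def fetch(name, delay):
    # Simulate an I/O-bound task; `await` yields control to the event loop,
    # which is exactly the "programmer-chosen switch point" of coroutines
    await asyncio.sleep(delay)
    return name

async def main():
    # Schedule both coroutines concurrently on a single thread
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))

print(asyncio.run(main()))  # ['a', 'b']
```

Both `fetch` calls sleep at the same time, so the whole run takes roughly 0.1 s, not 0.2 s.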
Python has only one coroutine standard library, asyncio, but two standard libraries supporting multithreading and multiprocessing: concurrent.futures and multiprocessing. This article covers the differences between the two. Let's start with the basics.
Multiprocessing
The multiprocessing package provides both a thread pool and a process pool.
The thread pool:
```python
from multiprocessing.dummy import Pool as ThreadPool

with ThreadPool(processes=100) as executor:
    executor.map(func, iterable)
```
Process pool:
```python
from multiprocessing import Pool as ProcessPool

with ProcessPool(processes=10) as executor:
    executor.map(func, iterable)
```
Concurrent.futures
The thread pool:
```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(function, iterable)
```
Process pool:
```python
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(max_workers=5) as executor:
    executor.map(function, iterable)
```
Doesn't the usage look exactly the same? So why does the standard library provide both?

The differences
In essence the difference is small; only the calling conventions differ slightly. multiprocessing came first; concurrent.futures was created later to make concurrent code easier to write and cheaper to learn.

In terms of speed, neither is inherently faster or slower. How much speedup (if any) you get depends on the hardware, the operating system, and especially on how much interprocess communication a particular task requires. Behind the scenes, all processes rely on the same OS primitives, and the high-level APIs wrapping those primitives are not the main speed factor. Now let's look at the details of how to use each.
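To see how close the two APIs really are, here is a small side-by-side sketch (using thread pools so it runs anywhere, with a hypothetical `square` function): concurrent.futures' `submit()` returns a `Future`, while multiprocessing's `apply_async()` returns an `AsyncResult` — two names for essentially the same idea.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.dummy import Pool as ThreadPool

def square(x):
    return x * x

# concurrent.futures style: submit() returns a Future
with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(square, 3)
    print(future.result())  # 9

# multiprocessing style: apply_async() returns an AsyncResult
with ThreadPool(processes=2) as pool:
    async_result = pool.apply_async(square, (3,))
    print(async_result.get())  # 9
```

`Future.result()` and `AsyncResult.get()` both block until the task finishes and then return its value.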
About concurrent.futures
The concurrent.futures module is officially the higher-level interface, mainly because it makes concurrent and parallel code simpler. The module provides the following objects and functions:
- Future objects: concurrent.futures.Future
- Module functions: concurrent.futures.wait
- Executor objects: concurrent.futures.{Executor, ThreadPoolExecutor, ProcessPoolExecutor}
For example, with the Executor class, when we call executor.submit(func), it schedules the function func() for execution and returns a Future instance that you can query later.

Here are some of the more commonly used functions. The done() method of a Future indicates whether the corresponding operation has completed: True means it has, False means it has not. Note that done() is non-blocking and returns immediately. add_done_callback(fn) registers a callback: when the Future completes, the function fn is called with the Future as its argument. There is also the important result() method, which returns the result, or raises the exception, once the Future completes. as_completed(fs) takes an iterable of Futures fs and returns an iterator that yields each Future as it completes.
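A small sketch tying these methods together, using a hypothetical `slow_add` task:

```python
import concurrent.futures
import time

def slow_add(a, b):
    time.sleep(0.1)  # simulate work
    return a + b

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(slow_add, 1, 2)
    print(future.done())       # usually False: non-blocking check right after submit
    future.add_done_callback(lambda f: print("finished:", f.result()))
    print(future.result())     # 3 -- blocks until the result is ready
    print(future.done())       # True

    # as_completed() yields futures in completion order, not submission order
    futures = [executor.submit(slow_add, i, i) for i in range(3)]
    for f in concurrent.futures.as_completed(futures):
        print(f.result())
```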
A ThreadPoolExecutor example:
```python
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
```
Please note: ProcessPoolExecutor is an Executor subclass that uses a pool of processes to execute calls asynchronously. It builds on the multiprocessing module, which lets it sidestep the Global Interpreter Lock, but it also means that only picklable objects can be passed to and returned from worker functions, and that the __main__ module must be importable by the worker processes. Consequently, ProcessPoolExecutor does not work in the interactive interpreter.
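A minimal sketch of these constraints in practice (the function name `cube` is made up for illustration): the worker function lives at module level so it can be pickled, and the pool is created under an `if __name__ == '__main__'` guard so that worker processes can safely re-import the module:

```python
from concurrent.futures import ProcessPoolExecutor

def cube(x):
    # Must be a module-level function: arguments and return values
    # are pickled to cross the process boundary
    return x ** 3

if __name__ == '__main__':
    # The guard prevents child processes from re-executing this block
    # when they import the module
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(cube, [1, 2, 3])))  # [1, 8, 27]
```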
About multiprocessing
multiprocessing is a package for spawning processes with an API similar to the threading module. It offers both local and remote concurrency, using subprocesses instead of threads to avoid the effects of the Global Interpreter Lock, so it lets the programmer take full advantage of multiple cores on a machine. It runs on both Unix and Windows.

The multiprocessing module also introduces APIs not found in the threading module. A prime example is the Pool object, which offers a convenient way to parallelize a function across a series of input values, distributing the input data among processes (data parallelism). The following example demonstrates the common practice of defining such a function at module level so that child processes can successfully import it. This basic example of data parallelism uses Pool:
```python
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))  # [1, 4, 9]
```
Conclusion
In short: use concurrent.futures for simple concurrent applications, and multiprocessing for more complex, custom ones. Beginners can start by learning concurrent.futures directly.
Reference Documents:
Docs.python.org/zh-cn/3/lib…
Docs.python.org/zh-cn/3/lib…