Threads, processes, and coroutines are topics you cannot avoid when writing concurrent programs, and learning them well also requires a certain grasp of the underlying fundamentals: computer organization, operating systems, networking, and so on. That is why I wrote this column on threads, processes, and coroutines: it organizes this knowledge into a coherent whole, serves as a summary for myself, and will hopefully help readers as well.
This fifth installment focuses on Python coroutines, and the code for this installment is located in the github directory.
What is a coroutine
Coroutines are also known as micro-threads or fibers; the English name is coroutine.
A subroutine, or function, is called hierarchically in every language: A calls B, B calls C during its execution, C returns when it finishes, then B returns, and finally A finishes.
Subroutine calls are therefore implemented through a stack, and a thread executes one subroutine at a time.
A subroutine call always has one entry and one return, and the call order is unambiguous. Coroutine calls are different.
A coroutine still looks like a subroutine, but during execution it can be suspended inside the subroutine, switch to executing another subroutine, and later resume where it left off at the appropriate time.
Coroutines vs. multithreading
The biggest advantage of coroutines is execution efficiency. Switching between subroutines is not a thread switch but is controlled by the program itself, so there is no thread-switching overhead; the more concurrent tasks there are, the greater the performance advantage of coroutines over multithreading.
The second advantage is that no locking mechanism is needed. Because everything runs in a single thread, there are no conflicting concurrent writes to shared variables; shared resources inside coroutines can be managed without locks, just by checking state, so execution is much more efficient than with multiple threads.
Since coroutines run in a single thread, how do you take advantage of a multi-core CPU? The simplest approach is multi-process + coroutine: it makes full use of every core while keeping the high efficiency of coroutines, which can yield very high performance. A rough sketch of this pattern follows.
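The snippet below is a minimal sketch of the multi-process + coroutine idea, written for this article rather than taken from it: each worker process runs its own event loop and handles a batch of I/O-bound jobs with coroutines. The names `fetch`, `crawl_batch`, and `worker` are illustrative, and `asyncio.sleep` stands in for real I/O.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

async def fetch(name):
    # Stand-in for an I/O-bound coroutine, e.g. a network request.
    await asyncio.sleep(1)
    return 'done: {}'.format(name)

async def crawl_batch(names):
    # Run all fetches of one batch concurrently inside this process.
    return await asyncio.gather(*(fetch(n) for n in names))

def worker(names):
    # Each process starts its own event loop.
    return asyncio.run(crawl_batch(names))

if __name__ == '__main__':
    batches = [['a_1', 'a_2'], ['b_1', 'b_2']]  # one batch per process
    with ProcessPoolExecutor() as pool:
        for result in pool.map(worker, batches):
            print(result)
```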
Generator-based coroutines
An example based on generator coroutines is as follows:
```python
def consumer():
    r = ''
    while True:
        n = yield r
        if not n:
            return
        print('[CONSUMER] Consuming %s...' % n)
        r = '200 OK'

def produce(c):
    c.send(None)
    n = 0
    while n < 5:
        n = n + 1
        print('[PRODUCER] Producing %s...' % n)
        r = c.send(n)
        print('[PRODUCER] Consumer return: %s' % r)
    c.close()

c = consumer()
produce(c)
```
The output is:
```
[PRODUCER] Producing 1...
[CONSUMER] Consuming 1...
[PRODUCER] Consumer return: 200 OK
[PRODUCER] Producing 2...
[CONSUMER] Consuming 2...
[PRODUCER] Consumer return: 200 OK
[PRODUCER] Producing 3...
[CONSUMER] Consuming 3...
[PRODUCER] Consumer return: 200 OK
[PRODUCER] Producing 4...
[CONSUMER] Consuming 4...
[PRODUCER] Consumer return: 200 OK
[PRODUCER] Producing 5...
[CONSUMER] Consuming 5...
[PRODUCER] Consumer return: 200 OK
```
Notice that consumer is a generator. We pass it into produce:
- First, `c.send(None)` is called to start the generator;
- Then, once produce has produced something, it switches to the consumer via `c.send(n)`;
- The consumer receives the message through `yield`, processes it, and yields the result back;
- produce takes the result of the consumer's processing and produces the next message;
- When produce decides to stop producing, it closes the consumer with `c.close()`, and the whole process ends.
The entire process is lock-free and runs in a single thread; produce and consumer cooperate to finish the work, which is why it is called a "coroutine" rather than the preemptive multitasking of threads.
Coroutines based on the asyncio library
Generator-based coroutines are the old way of implementing coroutines, dating back to the Python 2 era. Python 3.7 provides a new approach based on the asyncio library and the async/await syntax.
An ordinary synchronous crawler
```python
import time

def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    time.sleep(sleep_time)
    print('OK {}'.format(url))

def main(urls):
    for url in urls:
        crawl_page(url)

begin_time = time.time()
main(['url_1', 'url_2', 'url_3', 'url_4'])
end_time = time.time()
run_time = end_time - begin_time
print("Program takes {}s".format(run_time))
```
The program above simulates a simple crawler: the time each URL takes to crawl is the number at the end of its name. Because the URLs are crawled one after another in a loop, the 4 URLs take about 10 seconds (1 + 2 + 3 + 4) in total.
Converting it to coroutines
Start with `import asyncio`; this library contains most of the tools we need to implement coroutines.
The `async` keyword declares an asynchronous function, so `crawl_page` and `main` both become asynchronous functions. Calling an asynchronous function does not run its body; it returns a coroutine object.
Once you have a coroutine object, you can wrap it in a task with `asyncio.create_task`. A task is scheduled to run shortly after it is created, and creating it does not block the caller; to wait for all tasks to finish, we then write `for task in tasks: await task`.
You can also pass the tasks to `asyncio.gather` to achieve the same effect.
```python
import time
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    for task in tasks:
        await task
    # await asyncio.gather(*tasks)

begin_time = time.time()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
end_time = time.time()
run_time = end_time - begin_time
print("Program takes {}s".format(run_time))
```
After this change the program takes about 4 seconds, the time of the slowest URL, because all four pages are crawled concurrently.
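As a small illustration of the commented-out `asyncio.gather` alternative, here is a variation of my own (not from the original article) that collects the return values of the coroutines instead of awaiting the tasks one by one:

```python
import asyncio

async def crawl_page(url):
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    return 'OK {}'.format(url)

async def main(urls):
    # gather runs the coroutines concurrently and returns their results in order.
    results = await asyncio.gather(*(crawl_page(url) for url in urls))
    print(results)

asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
# Expected after ~4s: ['OK url_1', 'OK url_2', 'OK url_3', 'OK url_4']
```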
Exploring the execution order
```python
import time
import asyncio

async def worker_1():
    print('worker_1 start')
    await asyncio.sleep(1)
    print('worker_1 done')

async def worker_2():
    print('worker_2 start')
    await asyncio.sleep(2)
    print('worker_2 done')

async def main():
    task1 = asyncio.create_task(worker_1())
    task2 = asyncio.create_task(worker_2())
    print('before await')
    await task1
    print('awaited worker_1')
    await task2
    print('awaited worker_2')

begin_time = time.time()
asyncio.run(main())
end_time = time.time()
run_time = end_time - begin_time
print("Program takes {}s".format(run_time))
```
The output of the above program is:
```
before await
worker_1 start
worker_2 start
worker_1 done
awaited worker_1
worker_2 done
awaited worker_2
```
The execution process is as follows:
- `asyncio.run(main())`: the program enters `main()` and the event loop is started;
- task1 and task2 are created and enter the event loop, waiting to run;
- The print statement runs and outputs 'before await';
- `await task1` executes; the main task voluntarily gives up control, and the event loop starts scheduling worker_1;
- worker_1 starts running, prints 'worker_1 start', then reaches `await asyncio.sleep(1)` and yields control, so the event loop schedules worker_2;
- worker_2 starts running, prints 'worker_2 start', then reaches `await asyncio.sleep(2)` and yields control;
- Everything up to this point takes only a few milliseconds or less; the event loop then has nothing ready to run and simply waits;
- After one second, worker_1's sleep finishes and the event loop passes control back to task1, which prints 'worker_1 done', completes, and leaves the event loop;
- `await task1` completes; the event loop hands control back to the main task, which prints 'awaited worker_1' and then waits at `await task2`;
- After two seconds, worker_2's sleep finishes and the event loop passes control back to task2, which prints 'worker_2 done', completes, and leaves the event loop;
- The main task prints 'awaited worker_2'; all coroutine tasks have finished and the event loop ends.
Readers can adjust the sleep times to observe how the coroutines are scheduled.
Experience sharing
The developer needs to know in advance which part of a task will block on I/O, make that part asynchronous, and mark it with await so that execution can be suspended there and the event loop can move on to the next task. This makes full use of the CPU instead of wasting it while waiting on I/O. When the I/O of the suspended task completes, the thread picks up the return value at the await and continues executing the rest of that task's code.
The key to coroutines is understanding the await keyword. async marks a function as a coroutine, that is, a task; await means that when execution reaches that point, the task is suspended there while the scheduler goes on to run other tasks. When the awaited operation finishes, a callback notifies the scheduler, which comes back and runs the rest of the suspended task.
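To make the "mark the blocking I/O with await" idea concrete, here is a minimal sketch of my own (not from the original article): when a library only offers a blocking call, one common pattern is to push that call onto a worker thread via `run_in_executor`, so the event loop can await it like any other coroutine and keep running other tasks in the meantime.

```python
import asyncio
import time

def blocking_io():
    # A blocking call from a library that knows nothing about asyncio.
    time.sleep(1)
    return 'blocking result'

async def tick():
    # Keeps printing while the blocking call runs in a worker thread.
    for _ in range(3):
        print('event loop is still responsive')
        await asyncio.sleep(0.3)

async def main():
    loop = asyncio.get_running_loop()
    # Wrap the blocking call so it can be awaited without freezing the loop.
    result, _ = await asyncio.gather(loop.run_in_executor(None, blocking_io), tick())
    print(result)

asyncio.run(main())
```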
Multithreading and asyncio
Limitations of multithreading
Multithreading has many advantages and is widely used, but it also has some limitations:
For example, threads can be interrupted at almost any point, so race conditions are possible. For another, thread switching itself has a cost, and the number of threads cannot grow without bound, so if your program does a lot of I/O, multithreading alone may not deliver the efficiency and quality you need.
A race condition is a form of resource contention: while multiple threads are running, they may compete for the same resource or data, so the final result may differ from what was expected. A short demonstration follows.
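As a concrete illustration (my own example, not from the original article), the sketch below performs an unsynchronized read-modify-write on shared state from several threads. Because `counter += 1` is not atomic, updates can be lost, so the final count is not guaranteed to equal the expected total; how often this shows up depends on the interpreter and timing.

```python
import threading

counter = 0

def add_one_many_times():
    global counter
    for _ in range(100000):
        counter += 1  # read-modify-write: threads can interleave between read and write

threads = [threading.Thread(target=add_one_many_times) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# May print less than 400000: lost updates are the race condition.
print(counter)
```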
How asyncio works
asyncio, like other Python programs, is single-threaded: it has only one main thread, yet it can run multiple different tasks. A task here is a special Future object, and all of these tasks are driven by an object called the event loop. You can think of these tasks as playing the role that threads play in the multithreaded version.
To simplify things, assume a task has only two states: the ready state and the waiting state. A task in the ready state is idle but can run at any moment; a task in the waiting state has started but is waiting for some external operation, such as an I/O operation, to complete.
The event loop maintains two task lists, one for each state. It picks a task from the ready list (which one depends on factors such as how long it has waited and what resources it holds) and runs it until the task returns control to the event loop.
When a task returns control to the event loop, the event loop places it on the ready or waiting list according to whether it has finished, and then walks through the waiting list to check whether those tasks are done:
- If a task is done, it is moved to the ready list;
- If not, it stays on the waiting list.
Tasks that were already on the ready list keep their positions, because they have not run yet. Once every task has been placed on the right list, a new round begins: the event loop picks another task from the ready list to run... and so on, until all tasks are finished.
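To make the ready/waiting description more tangible, here is a deliberately simplified toy loop of my own; it is not how asyncio is actually implemented. Tasks are plain generators, and yielding a number plays the role of await: the task hands control back to the loop and asks to wait that many seconds.

```python
import time
from collections import deque

def toy_loop(tasks):
    ready = deque(tasks)   # tasks that can run right now
    waiting = []           # (wake_up_time, task) pairs

    while ready or waiting:
        # Move tasks whose wait has finished back onto the ready list.
        now = time.time()
        still_waiting = []
        for wake_up, task in waiting:
            if wake_up <= now:
                ready.append(task)
            else:
                still_waiting.append((wake_up, task))
        waiting = still_waiting

        if not ready:
            time.sleep(0.01)   # nothing ready yet; poll again shortly
            continue

        task = ready.popleft() # pick a ready task and run it until it yields
        try:
            delay = next(task)
            waiting.append((time.time() + delay, task))
        except StopIteration:
            pass               # the task finished; drop it

def worker(name, delay):
    print(name, 'start')
    yield delay                # "await": give control back to the loop
    print(name, 'done')

toy_loop([worker('worker_1', 1), worker('worker_2', 2)])
```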
It is also worth mentioning that an asyncio task is never interrupted by external factors while it is running, so operations inside asyncio are free of race conditions and you do not need to worry about thread safety.
The drawbacks of asyncio
In practice, to use asyncio well, and especially to exploit its full power, you often need third-party libraries that are compatible with asyncio. This was a big problem in the early days of Python 3, but as the ecosystem has developed it is gradually being solved.
In addition, because asyncio gives you more autonomy over task scheduling, you have to be more careful when writing code, otherwise it is easy to make mistakes.
Multithreading or asyncio?
```python
if io_bound:
    if io_slow:
        print('Use asyncio')
    else:
        print('Use multi-threading')
elif cpu_bound:
    print('Use multi-processing')
```
The pseudocode above summarizes the principles to follow:
- If the workload is I/O bound and the I/O operations are slow, requiring many tasks/threads, then asyncio is more appropriate;
- If the workload is I/O bound but the I/O operations are fast and only a limited number of tasks/threads are needed, then multithreading is fine;
- If the workload is CPU bound, multiple processes are required to make the program run efficiently.
Multithreading versus asyncio coroutines
What they have in common is that both are forms of concurrency: with multithreading only one thread executes at any given moment (in CPython, because of the GIL), and with coroutines only one task executes at any given moment.
I have summarized the differences between Python multithreading and asyncio coroutines in the following table:
| | multithreading | asyncio |
| --- | --- | --- |
| Number of threads | Multiple threads | A single thread |
| Scheduling | By the operating system | By the user, with more autonomous control |
| Race conditions | Threads can be interrupted at any point, which can cause thread-safety problems | Tasks are not interrupted arbitrarily, so no thread-safety worries |
| Switching cost | Thread switching is relatively expensive | Task switching costs far less than thread switching |
| Number of tasks that can be run | Relatively few | Far more than the practical number of threads |
| Execution efficiency | Works well when I/O operations are fast and not heavy | More efficient than multithreading when I/O is heavy |
| Third-party library support | Mature | Still maturing |
References
www.liaoxuefeng.com/wiki/101695…
time.geekbang.org/column/arti…