Introduction
Threads are often described as lightweight processes. By analogy, coroutines are often described as lightweight threads, or microthreads.
Concurrent programming in Python is usually implemented with multiple threads or multiple processes. For CPU-bound tasks we usually use multiple processes because of the Global Interpreter Lock (GIL). For IO-bound tasks we can use threads: a thread releases the GIL while it waits on IO, so other threads can be scheduled and the program appears to run concurrently. For IO-bound tasks there is also another option, coroutines, which run "concurrently" inside a single thread. One major advantage of coroutines over multithreading is that they avoid the overhead of switching between threads and therefore run more efficiently.
A coroutine is also known as a microthread or fiber. The idea is that while function A is executing, it can be interrupted at any point to execute function B, and B can in turn be interrupted to resume A (optionally). This is not a function call. The whole process looks like multithreading, but a coroutine is executed by only one thread.
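The switching described above can be sketched with plain generators. This is only an illustration, not how any particular library works: the names task_a, task_b, and run_round_robin are hypothetical, and the tiny scheduler merely stands in for what a real event loop does.

```python
def task_a(log):
    for step in range(3):
        log.append(('A', step))
        yield  # pause A and hand control back to the scheduler


def task_b(log):
    for step in range(3):
        log.append(('B', step))
        yield  # pause B and hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each generator in turn until all of them have finished."""
    pending = list(tasks)
    while pending:
        for task in list(pending):
            try:
                next(task)
            except StopIteration:
                pending.remove(task)


log = []
run_round_robin([task_a(log), task_b(log)])
print(log)
```

The log shows A and B taking turns, even though everything runs in one thread and neither function calls the other.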
So what are the advantages of coroutines?
- Execution is extremely efficient, because switching between subroutines (functions) is not a thread switch. It is controlled by the program itself, so there is no thread-switching overhead. Compared with multithreading, the more threads there would be, the more pronounced the performance advantage of coroutines.
- No multithreaded locking mechanism is needed. Because there is only one thread, there are no conflicts from writing a variable simultaneously, and shared resources can be used without locks, which also makes execution more efficient.
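The second point above can be illustrated with a small asyncio sketch (Python 3.7+ assumed for asyncio.run). The names worker, main, and state are hypothetical. The update is safe without a lock only because there is no await between reading and writing the counter; placing an await in the middle of the update would reintroduce a race.

```python
import asyncio


async def worker(state, times):
    for _ in range(times):
        # No await between the read and the write, so no other coroutine
        # can run in the middle of this update.
        state['count'] += 1
        await asyncio.sleep(0)  # explicit switch point to other coroutines


async def main():
    state = {'count': 0}
    # Ten coroutines share one counter and interleave at every sleep(0).
    await asyncio.gather(*(worker(state, 1000) for _ in range(10)))
    return state['count']


total = asyncio.run(main())
print(total)
```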
Coroutines make IO-bound programs efficient, but CPU-bound work is not their strength. To make full use of multiple CPUs, combine multiple processes with coroutines.
Coroutines in Python have come a long way. They went through roughly three stages:
- The original generator-based variant, yield/send
- The introduction of @asyncio.coroutine and yield from
- The introduction of the async/await keywords
Python 2.x has limited support for coroutines: generators built on yield provide part of the functionality but not all of it, and the gevent module offers a more complete implementation. Python 3.4 added the asyncio module. Python 3.5 added the async/await syntax. Python 3.6 made the asyncio module more complete and stable. Let's look at each of these in turn.
Python 2.x coroutines
Python 2.x implements coroutines in the following ways:
- yield + send
- gevent (see the Gevent section below)
yield + send (implementing coroutines with generators)
Let's look at coroutines through a producer-consumer model. After the producer produces a message, execution switches directly to the consumer, and when the consumer has handled the message, execution switches back to the producer to continue producing.
#-*- coding:utf8 -*-
def consumer():
    r = ''
    while True:
        # Hand r back to the producer and wait for the next message
        n = yield r
        if not n:
            return
        print('[CONSUMER]Consuming %s... ' % n)
        r = '200 OK'

def producer(c):
    # start generator
    c.send(None)
    n = 0
    while n < 5:
        n = n + 1
        print('[PRODUCER]Producing %s... ' % n)
        # Switch to the consumer and receive its reply
        r = c.send(n)
        print('[PRODUCER]Consumer return: %s' % r)
    c.close()

if __name__ == '__main__':
    c = consumer()
    producer(c)
The difference between send(msg) and next() is that send can pass a value into the generator. The value passed in becomes the value of the yield expression at which the generator is paused, whereas the operand of yield is the value returned to the caller. In other words, send sets the value of the most recently reached yield expression. For example, given the assignment a = yield 5, the first iteration returns 5 to the caller, and a has not been assigned yet. If the second iteration uses send(10), the expression yield 5 evaluates to 10 inside the generator, so a becomes 10. Both send(msg) and next() return a value: the operand of the next yield that the generator reaches. The first call must be send(None) (or next()). Sending any other value raises a TypeError, because the generator has not yet reached a yield expression that could receive it.
Running the example above produces the following output:
[PRODUCER]Producing 1...
[CONSUMER]Consuming 1...
[PRODUCER]Consumer return: 200 OK
[PRODUCER]Producing 2...
[CONSUMER]Consuming 2...
[PRODUCER]Consumer return: 200 OK
[PRODUCER]Producing 3...
[CONSUMER]Consuming 3...
[PRODUCER]Consumer return: 200 OK
[PRODUCER]Producing 4...
[CONSUMER]Consuming 4...
[PRODUCER]Consumer return: 200 OK
[PRODUCER]Producing 5...
[CONSUMER]Consuming 5...
[PRODUCER]Consumer return: 200 OK
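The send semantics described above can be checked in isolation with a short sketch. The generator gen and the variable names are hypothetical and exist only for this illustration.

```python
def gen():
    a = yield 5        # the first next() returns 5 to the caller
    b = yield a * 2    # send(10) makes the previous yield evaluate to 10
    yield (a, b)


g = gen()
first = next(g)        # runs to the first yield; same as g.send(None)
second = g.send(10)    # a = 10; runs to the second yield
third = g.send('x')    # b = 'x'; runs to the third yield
print(first, second, third)

# Sending a non-None value to a generator that has not started is an error.
fresh = gen()
try:
    fresh.send(1)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)
```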
Python 3.x coroutines
In addition to the approaches available in Python 2.x, Python 3.x provides the following ways to implement coroutines:
- asyncio + yield from (Python 3.4+)
- asyncio + async/await (Python 3.5+)
Python 3.4 introduced the asyncio module, which supports coroutines well.
asyncio + yield from
asyncio is a standard library module introduced in Python 3.4 with built-in support for asynchronous IO. Asynchronous operations in asyncio are invoked from a coroutine with yield from. Look at the following code (it requires Python 3.4 or later; note that @asyncio.coroutine was deprecated in Python 3.8 and removed in Python 3.11, so this version does not run on current releases):
#-*- coding:utf8 -*-
import asyncio

@asyncio.coroutine
def test(i):
    print('test_1', i)
    # Suspend this coroutine for 1 second without blocking the thread
    r = yield from asyncio.sleep(1)
    print('test_2', i)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    tasks = [test(i) for i in range(3)]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
@asyncio.coroutine marks a generator as a coroutine, which is then handed to the event loop for execution. test() first prints test_1, and the yield from syntax then lets it conveniently call another generator. Because asyncio.sleep() is also a coroutine, the thread does not wait for it: the current coroutine is suspended and the event loop moves on to whatever else is ready. When asyncio.sleep() completes, the coroutine receives its return value (None in this case) from yield from and continues with the next line. Think of asyncio.sleep(1) as an IO operation that takes 1 second. During that time the main thread does not wait but runs any other runnable coroutine in the event loop, which is how concurrent execution is achieved.
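How yield from hands control to another generator and picks up its return value can be sketched without asyncio at all. The generators inner and outer below are hypothetical names used only for this illustration.

```python
def inner():
    received = yield 'inner ready'
    # The return value becomes the value of the `yield from` expression.
    return 'inner got %s' % received


def outer():
    # Values yielded by inner() go straight to the caller, and values
    # sent by the caller go straight to inner().
    result = yield from inner()
    yield result


g = outer()
first = next(g)           # produced by inner()
second = g.send('ping')   # delivered to inner(), whose return value outer() yields
print(first)
print(second)
```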
asyncio + async/await
To simplify asynchronous IO code and make it easier to recognize, Python 3.5 introduced the async and await syntax, which makes coroutine code more concise and readable. async and await are new syntax for coroutines, and switching to them requires only two simple substitutions:
- Replace @asyncio.coroutine with async
- Replace yield from with await
Look at the following code (for Python 3.5 and above):
#-*- coding:utf8 -*-
import asyncio

async def test(i):
    print('test_1', i)
    await asyncio.sleep(1)
    print('test_2', i)

if __name__ == '__main__':
    loop = asyncio.new_event_loop()
    # Wrap each coroutine in a Task: since Python 3.11, asyncio.wait()
    # no longer accepts bare coroutine objects.
    tasks = [loop.create_task(test(i)) for i in range(3)]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
The result is the same as before. Compared with the previous section, only the coroutine definition changes: @asyncio.coroutine becomes async and yield from becomes await.
Gevent
gevent is a network library built on greenlet that implements coroutines. The basic idea is that each greenlet is treated as a coroutine. When a greenlet encounters an IO operation, such as accessing the network, gevent automatically switches to another greenlet, and once the IO operation has completed it switches back at an appropriate time to continue execution. Because IO operations are time-consuming and often leave a program waiting, letting gevent switch coroutines automatically ensures that some greenlet is always running instead of waiting for IO.
greenlet is a C extension module that provides the lightweight coroutines themselves. gevent combines it with an event loop (libevent in early releases, libev or libuv in later ones), which lets developers write asynchronous IO code in a synchronous style without changing their programming habits.
#-*- coding:utf8 -*-
import gevent

def test(n):
    for i in range(n):
        print(gevent.getcurrent(), i)

if __name__ == '__main__':
    # Start three greenlets and wait for each of them to finish
    g1 = gevent.spawn(test, 3)
    g2 = gevent.spawn(test, 3)
    g3 = gevent.spawn(test, 3)
    g1.join()
    g2.join()
    g3.join()
Running results:
<Greenlet at 0x10a6eea60: test(3)> 0
<Greenlet at 0x10a6eea60: test(3)> 1
<Greenlet at 0x10a6eea60: test(3)> 2
<Greenlet at 0x10a6eed58: test(3)> 0
<Greenlet at 0x10a6eed58: test(3)> 1
<Greenlet at 0x10a6eed58: test(3)> 2
<Greenlet at 0x10a6eedf0: test(3)> 0
<Greenlet at 0x10a6eedf0: test(3)> 1
<Greenlet at 0x10a6eedf0: test(3)> 2
You can see that the three greenlets run one after another rather than alternating. To make the greenlets alternate, hand over control with gevent.sleep():
def test(n):
    for i in range(n):
        print(gevent.getcurrent(), i)
        # Hand control to the other greenlets
        gevent.sleep(1)
Running results:
<Greenlet at 0x10382da60: test(3)> 0
<Greenlet at 0x10382dd58: test(3)> 0
<Greenlet at 0x10382ddf0: test(3)> 0
<Greenlet at 0x10382da60: test(3)> 1
<Greenlet at 0x10382dd58: test(3)> 1
<Greenlet at 0x10382ddf0: test(3)> 1
<Greenlet at 0x10382da60: test(3)> 2
<Greenlet at 0x10382dd58: test(3)> 2
<Greenlet at 0x10382ddf0: test(3)> 2
Of course, real code does not use gevent.sleep() to switch coroutines. Instead, gevent switches automatically whenever IO is performed. To make that possible, gevent has to change parts of Python's standard library from blocking to cooperative execution, which is done at startup by monkey patching:
#-*- coding:utf8 -*-
# Patch the standard library before anything else imports it
from gevent import monkey; monkey.patch_all()
from urllib import request
import gevent

def test(url):
    print('Get: %s' % url)
    response = request.urlopen(url)
    content = response.read().decode('utf8')
    print('%d bytes received from %s.' % (len(content), url))

if __name__ == '__main__':
    gevent.joinall([
        gevent.spawn(test, 'http://httpbin.org/ip'),
        gevent.spawn(test, 'http://httpbin.org/uuid'),
        gevent.spawn(test, 'http://httpbin.org/user-agent')])
Running results:
Get: http://httpbin.org/ip
Get: http://httpbin.org/uuid
Get: http://httpbin.org/user-agent
53 bytes received from http://httpbin.org/uuid.
40 bytes received from http://httpbin.org/user-agent.
31 bytes received from http://httpbin.org/ip.
The output shows that the three network operations run concurrently and finish in a different order from the one in which they started, yet only one thread is used.
Conclusion
In the examples above, sleep stands in for asynchronous IO. In real projects, coroutines can be used for network reads and writes, file reads and writes, interface rendering, and so on. While one coroutine waits for its operation to complete, the CPU can do other work. So what is the difference between coroutines and multithreading? Switching between threads is performed by the operating system, and as the number of threads grows the switching cost becomes very high. Coroutines switch within a single thread, under the program's own control, so the cost is much lower. That is the fundamental difference between coroutines and multithreading.