Asynchronous programming is the third way to achieve concurrency, after multithreading and multiprocessing, and it is mainly used to speed up IO-intensive tasks. Asynchrony in Python grew out of the `yield` generator mechanism. Before explaining how it works, let's first look at how to use the asyncio library.

This article focuses on the general usage of the asyncio module; the details of individual functions are only touched on briefly.

This article is divided into the following parts:

  • Easiest to use
  • Another common use
  • A problem
  • Asynchrony under general functions
  • Understand asynchrony and coroutine
  • Asynchronous crawler for a single thread

Easiest to use

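```python
import asyncio

async def myfun(i):
    print('start {}th'.format(i))
    await asyncio.sleep(1)   # non-blocking wait: control goes back to the event loop
    print('finish {}th'.format(i))

# Note: calling get_event_loop() with no running loop is deprecated in recent
# Python versions; asyncio.run() is the modern entry point
loop = asyncio.get_event_loop()
myfun_list = (myfun(i) for i in range(10))
loop.run_until_complete(asyncio.gather(*myfun_list))
```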


Run this way, the ten 1-second waits overlap, so the whole program takes about 1 second rather than 10.

A few conventions in the code above are worth remembering:

  • To run a function asynchronously, put `async` in front of its definition (`async def`), and use `await` for the waits inside it
  • The last three lines are boilerplate: just remember them and pass in your own function
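On Python 3.7 and later, the same program is usually written with `asyncio.run()`, which creates an event loop, runs a coroutine to completion and closes the loop, so the `get_event_loop()` boilerplate is no longer needed. A minimal sketch, assuming Python 3.7+; the `main` wrapper, the returned values and the timing lines are additions for illustration:

```python
import asyncio
import time

async def myfun(i):
    print('start {}th'.format(i))
    await asyncio.sleep(1)        # non-blocking wait: other coroutines run meanwhile
    print('finish {}th'.format(i))
    return i                      # returned so that gather has something to collect

async def main():
    # gather runs the ten coroutines concurrently and returns their results in order
    return await asyncio.gather(*(myfun(i) for i in range(10)))

t0 = time.perf_counter()
results = asyncio.run(main())     # creates the loop, runs main(), closes the loop
elapsed = time.perf_counter() - t0
print(results)
print('elapsed: {:.1f}s'.format(elapsed))
```

The elapsed time stays close to 1 second because all ten waits overlap.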

Another common use

That was the first common usage; here is another:

```python
import asyncio

async def myfun(i):
    print('start {}th'.format(i))
    await asyncio.sleep(1)
    print('finish {}th'.format(i))

loop = asyncio.get_event_loop()
# ensure_future wraps each coroutine in a Task and schedules it on the loop
myfun_list = [asyncio.ensure_future(myfun(i)) for i in range(10)]
loop.run_until_complete(asyncio.wait(myfun_list))
```


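The difference from the previous version lies in the last lines: here each coroutine is first wrapped in a Task with `asyncio.ensure_future`, and the final call is `asyncio.wait` instead of `asyncio.gather`. For the purposes of this article the two can be treated as equivalent, so you can use either.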

These are the two most common patterns. Both are listed here so that readers are not confused when they meet either one in other articles.

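There are subtle differences, though:

  • `gather` is better at aggregating results: it returns the return values as a list, in the order the awaitables were passed in
  • `wait` is better at filtering by status: it returns two sets, `done` and `pending`, and can stop early through its `timeout` and `return_when` arguments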

See this answer for details
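A minimal sketch of the difference, assuming Python 3.7+ (`work` and the delays are made up for illustration). Since Python 3.11, `asyncio.wait` no longer accepts bare coroutines, which is one reason the second pattern wraps them with `ensure_future` first:

```python
import asyncio

async def work(i, delay):
    await asyncio.sleep(delay)
    return i

async def main():
    # gather: results come back as a list in the order the awaitables were passed,
    # regardless of which one finished first
    ordered = await asyncio.gather(work(0, 0.3), work(1, 0.1), work(2, 0.2))

    # wait: takes tasks and returns (done, pending) sets, so you can filter by status;
    # here it returns as soon as the first task completes
    tasks = [asyncio.ensure_future(work(i, d)) for i, d in [(0, 0.3), (1, 0.1), (2, 0.2)]]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    first = [t.result() for t in done]
    for t in pending:             # cancel the tasks that are still running
        t.cancel()
    return ordered, first, len(pending)

print(asyncio.run(main()))
```

Here `gather` returns `[0, 1, 2]` even though task 1 finishes first, while `wait` with `FIRST_COMPLETED` hands back only the finished task and leaves the other two in `pending`.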

A problem

The asyncio module differs from the multithreading and multiprocessing tools covered earlier in one big way: you cannot pass in just any function.

  • Suppose we replace `await asyncio.sleep(1)` in the `myfun` above with `time.sleep(1)`. The program then runs synchronously instead of asynchronously, and the waits take 10 seconds in total
  • Likewise, suppose we change `myfun` into the following function, which crawls a web page with `requests`:
```python
import asyncio
import requests
from bs4 import BeautifulSoup

async def get_title(i):
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i*25)
    r = requests.get(url)   # blocking call: the event loop cannot switch tasks here
    soup = BeautifulSoup(r.content, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

loop = asyncio.get_event_loop()
fun_list = (get_title(i) for i in range(10))
loop.run_until_complete(asyncio.gather(*fun_list))
```


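It still does not run asynchronously: the pages are fetched one after another.

asyncio can only switch to another task at points where a coroutine awaits an awaitable object, such as `await asyncio.sleep(1)`. Blocking calls like `time.sleep` or `requests.get` never hand control back to the event loop, so nothing else can run while they wait.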
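The effect is easy to measure. A minimal sketch (delays shortened to 0.2 s for illustration) that runs five coroutines with `gather`, once with the blocking `time.sleep` and once with `asyncio.sleep`:

```python
import asyncio
import time

async def blocking(i):
    time.sleep(0.2)               # blocks the whole thread: the loop cannot switch tasks
    return i

async def nonblocking(i):
    await asyncio.sleep(0.2)      # suspends only this coroutine; the loop runs the others
    return i

async def timed(coro_fn, n=5):
    t0 = time.perf_counter()
    await asyncio.gather(*(coro_fn(i) for i in range(n)))
    return time.perf_counter() - t0

async def main():
    return await timed(blocking), await timed(nonblocking)

blocking_s, nonblocking_s = asyncio.run(main())
print('time.sleep:    {:.2f}s'.format(blocking_s))     # roughly 5 x 0.2 s
print('asyncio.sleep: {:.2f}s'.format(nonblocking_s))  # roughly 0.2 s
```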

Asynchrony under general functions

For blocking functions like these, asyncio can only achieve asynchrony by running them in extra threads. Let's first do this for `time.sleep`:

```python
import asyncio
import time

def myfun(i):
    print('start {}th'.format(i))
    time.sleep(1)
    print('finish {}th'.format(i))

async def main():
    loop = asyncio.get_event_loop()
    futures = (
        loop.run_in_executor(
            None,   # None means: use the loop's default thread pool executor
            myfun,
            i)
        for i in range(10)
    )
    for result in await asyncio.gather(*futures):
        pass        # myfun returns None, so the results are ignored

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
```


`run_in_executor` hands the function to a thread pool, so each call runs in a separate thread, and the event loop coordinates those threads. The call chain underneath is fairly complex, but you can simply reuse this code as a template.

The ten calls above still do not print all at once but in batches. This is because the default executor may not have enough threads: its size depends on the number of CPUs and can be smaller than 10. To print everything at once, open 10 threads explicitly:

```python
import concurrent.futures as cf  # one more module than before
import asyncio
import time

def myfun(i):
    print('start {}th'.format(i))
    time.sleep(1)
    print('finish {}th'.format(i))

async def main():
    with cf.ThreadPoolExecutor(max_workers=10) as executor:  # set 10 threads
        loop = asyncio.get_event_loop()
        futures = (
            loop.run_in_executor(
                executor,  # execute in the 10-thread pool
                myfun,
                i)
            for i in range(10)
        )
        for result in await asyncio.gather(*futures):
            pass

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
```

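The thread-pool versions discard the return values (`for result in ...: pass`). A minimal sketch, assuming Python 3.7+ and a shortened 0.2 s sleep, showing that `gather` actually hands back each function's return value and that ten workers let all ten calls overlap:

```python
import asyncio
import concurrent.futures as cf
import time

def myfun(i):
    time.sleep(0.2)               # an ordinary blocking function
    return i * i

async def main(n=10):
    loop = asyncio.get_running_loop()
    with cf.ThreadPoolExecutor(max_workers=n) as executor:
        futures = [loop.run_in_executor(executor, myfun, i) for i in range(n)]
        # gather returns the functions' return values in submission order
        return await asyncio.gather(*futures)

t0 = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - t0
print(results)
print('elapsed: {:.2f}s'.format(elapsed))
```

Run sequentially, the ten calls would take about 2 seconds; with ten workers they finish in roughly the time of one call.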

The same approach turns the `requests` crawler into an asynchronous one:

```python
import concurrent.futures as cf
import asyncio
import requests
from bs4 import BeautifulSoup

def get_title(i):
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i*25)
    r = requests.get(url)   # still blocking, but now each call runs in its own thread
    soup = BeautifulSoup(r.content, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

async def main():
    with cf.ThreadPoolExecutor(max_workers=10) as executor:
        loop = asyncio.get_event_loop()
        futures = (
            loop.run_in_executor(
                executor,
                get_title,
                i)
            for i in range(10)
        )
        for result in await asyncio.gather(*futures):
            pass

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
```


This part draws on this article and this answer.

This way of starting multiple threads is also considered asynchronous, as explained in the next section.

Understand asynchrony and coroutine

Now that we have covered some uses of asynchrony, it is time to explain a few concepts:

  • First, we should clarify how synchronous/asynchronous relate to blocking/non-blocking
  • The two pairs are not the same thing and do not describe the same aspect, but they are similar and easy to confuse
  • Asynchronous programs are generally non-blocking, while synchronous programs can be either blocking or non-blocking
  • Non-blocking means that when a task has not finished, you do not have to stop and wait for it before starting the next one. Blocking means one thing must finish completely before the next begins
  • In the non-blocking case, a program can be either synchronous or asynchronous, because in both cases the next task can start before the current one finishes. The difference is this (call the program that is currently running the main program): when the first task finishes (for example, a network request finally gets its response), an asynchronous program automatically notifies the main program, which comes back to handle that task's result; a synchronous program instead requires the main program to keep asking whether the first task has finished
  • For the differences among these four terms, see this Zhihu answer
  • (The difference between coroutines and multithreading) In the non-blocking case, multithreading is the typical example of synchronization and coroutines the typical example of asynchrony. Both run several tasks at once, but coroutines do so with lightweight "microthreads" inside a single OS thread
  • In multithreading, the threads compete to run and are switched by the operating system, and the main program is not notified when a wait is over. This unordered switching wastes some resources
  • In coroutines, the calls and waits of the tasks (the microthreads) are arranged explicitly in the code. Coroutines move purposefully from one task to the next, whereas multithreading spends some time idling
  • Two kinds of asynchrony
  • The previous sections covered two kinds of asynchrony. One uses `await` to switch between tasks within a single thread; the other starts multiple threads and achieves asynchrony through thread scheduling
  • Switching back and forth between several functions in a single thread is typically built on the `yield` generator; see the producer-consumer example at the end of this article
  • What multiprocessing, multithreading and asynchrony are each good at
  • Asynchrony and multithreading are especially suited to IO-intensive tasks, because both essentially aim to avoid wasting resources while waiting on IO. Multiprocessing can use multiple cores, so it suits CPU-intensive tasks
  • Compared with multithreading, asynchrony suits programs with longer waits and more tasks waiting. Multithreading has to create new threads, and the more threads there are, the more they contend and the more resources are wasted. If each task waits a long time, many tasks pile up during those waits and would need many threads, so multithreading is not a wise choice. Asynchrony lets a single thread move smoothly between tasks, making full use of the CPU without hurting the program's efficiency
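To illustrate the generator-based switching mentioned above, here is a minimal single-threaded producer/consumer sketch using plain generators and `send()`. It shows only the underlying idea, not asyncio's actual implementation, and the function names are illustrative:

```python
def consumer():
    """A generator that receives items via send() and replies to each one."""
    received = []
    reply = ''
    while True:
        item = yield reply          # pause here until the producer sends something
        if item is None:            # sentinel: stop consuming
            return received
        received.append(item)
        reply = 'ok {}'.format(item)

def producer(c, n):
    next(c)                         # prime the generator: run it up to the first yield
    replies = []
    for i in range(1, n + 1):
        replies.append(c.send(i))   # switch into the consumer, then come back with its reply
    try:
        c.send(None)                # tell the consumer to finish
    except StopIteration as stop:
        return replies, stop.value  # a generator's return value travels in StopIteration

replies, consumed = producer(consumer(), 3)
print(replies)   # ['ok 1', 'ok 2', 'ok 3']
print(consumed)  # [1, 2, 3]
```

Control passes back and forth between the two functions at each `yield`/`send`, all within one thread and without any OS-level thread switching.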

Asynchronous crawler for a single thread

Above, we made `requests` asynchronous by opening multiple threads. To use only one thread (with `await`), we have to switch to a different web request library.

The reason is that the object after `await` must be awaitable, and `requests` does not provide awaitable calls, which is why it cannot be used here. We do not need to build awaitable objects ourselves: the aiohttp module already integrates web requests seamlessly with asyncio. Rewriting the code with this module gives:

```python
import asyncio
import aiohttp
from bs4 import BeautifulSoup

async def get_title(i):
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i*25)
    # One session per request keeps the example short; aiohttp's documentation
    # recommends reusing a single ClientSession across many requests
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            print(resp.status)
            text = await resp.text()   # awaitable: the loop runs other requests meanwhile
    print('start', i)
    soup = BeautifulSoup(text, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

loop = asyncio.get_event_loop()
fun_list = (get_title(i) for i in range(10))
loop.run_until_complete(asyncio.gather(*fun_list))
```

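The awaitable requirement can be made concrete without any network access. In this sketch `Delay` is a hypothetical class written only for illustration: any object whose `__await__` method returns an iterator can follow `await`, while an ordinary value such as an int (or the Response object returned by `requests.get`) cannot:

```python
import asyncio

class Delay:
    """Hypothetical awaitable: waits for `seconds`, then produces `value`."""
    def __init__(self, seconds, value):
        self.seconds = seconds
        self.value = value

    def __await__(self):
        # Delegate the actual waiting to an existing awaitable, then return our value
        yield from asyncio.sleep(self.seconds).__await__()
        return self.value

async def main():
    return await asyncio.gather(Delay(0.1, 'a'), Delay(0.1, 'b'))

print(asyncio.run(main()))  # ['a', 'b']

async def bad():
    await 42                 # an int has no __await__, so this raises TypeError

try:
    asyncio.run(bad())
except TypeError as exc:
    print('TypeError:', exc)
```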

Welcome to my zhihu column

Column home: Programming in Python


Version description: Software and package version description