I don’t know how you got on with the last article; this one uses the aiohttp library throughout, so if you don’t understand it or haven’t read it yet, go take a look at it first. Is your crawler a little too slow? Try asynchronous coroutines to speed things up! Remember to practice after reading this article so that you can really master it.

If you want a crawler’s speed to really show, downloading images is the way to go. Originally I wanted to download images from Jiandan, where the pictures are high quality, and my draft was almost finished, but when I checked today the sister-picture section had been taken down. As for why it was closed, you can look up the reason on Baidu yourself; I won’t say more here. This time I chose to download copyright-free HD pictures instead, because we-media writers are very wary of infringement, and hunting for copyright-free pictures is practically daily work, so I picked this site:

https://unsplash.com/

So what’s the difference between using asynchrony and not using asynchrony?

(The run on the right uses asynchrony, the one on the left does not; since this was just a test, only 12 images were downloaded.)

You can see that the asynchronous run takes roughly a sixth of the time of the synchronous one. So let’s figure out how to do it.
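If you want to reproduce a comparison like this yourself, the sketch below shows one way to time a plain synchronous loop against an asynchronous batch. It is a minimal illustration, not the exact benchmark used above, and the URL list is a placeholder you would swap for real image links.

import time
import asyncio
import aiohttp
import requests

URLS = ['https://unsplash.com/'] * 12  # placeholder list; use real image links

def sync_download(urls):
    # Plain synchronous loop: one request finishes before the next starts
    start = time.time()
    for url in urls:
        requests.get(url)
    return time.time() - start

async def fetch(session, url):
    async with session.get(url) as response:
        await response.read()

async def async_download(urls):
    # All requests run concurrently on one event loop
    start = time.time()
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session, url) for url in urls))
    return time.time() - start

print('sync:  %.2fs' % sync_download(URLS))
print('async: %.2fs' % asyncio.get_event_loop().run_until_complete(async_download(URLS)))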

1. Find the target page

The front page of this site is a wall of images that keeps loading more as you scroll down, so it is clearly Ajax loading. Dynamic loading is something we have covered before, so open the developer tools and see what the requests look like.

Scrolling down, it is easy to spot this request: a GET request that returns status code 200, to https://unsplash.com/napi/photos?page=3&per_page=12&order_by=latest. It has three parameters, and it is easy to see that page is the page number; it is the only parameter that changes, while the others stay the same.

The content returned is JSON, and the download field under links is the download link for each image. Now that everything is clear, here is the code.
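To check this yourself before writing the crawler, you can probe the endpoint with requests and inspect one item. This is a quick sketch; the only fields it relies on are the id and links/download fields described above.

import requests

params = {'page': 3, 'per_page': 12, 'order_by': 'latest'}
response = requests.get('https://unsplash.com/napi/photos', params=params)
photos = response.json()           # a list of photo objects
first = photos[0]
print(first['id'])                 # unique id, used as the file name later
print(first['links']['download'])  # the image download link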

2. Code

async def __get_content(self, link):
    async with aiohttp.ClientSession() as session:
        response = await session.get(link)
        content = await response.read()
        return content

This is how the image content is fetched. aiohttp.ClientSession is used in much the same way as requests.Session, except that the method for getting the raw binary content is read(), and it has to be awaited.
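For reference, aiohttp’s response object mirrors the familiar requests accessors, only every one of them is a coroutine. A minimal sketch (json() assumes the server actually returns JSON):

import aiohttp

async def demo(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            raw = await response.read()   # bytes, like requests' .content
            text = await response.text()  # decoded str, like .text
            data = await response.json()  # parsed JSON, like .json()
            return raw, text, data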

Here is the complete code:

import requests, os, time
import aiohttp, asyncio


class Spider(object):
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
        self.num = 1
        if 'images' not in os.listdir('.'):
            os.mkdir('images')
        self.path = os.path.join(os.path.abspath('.'), 'images')
        os.chdir(self.path)  # Enter the file download path

    async def __get_content(self, link):
        # Fetch the raw bytes of one image asynchronously
        async with aiohttp.ClientSession() as session:
            response = await session.get(link)
            content = await response.read()
            return content

    def __get_img_links(self, page):
        # Request one page of photo metadata from the Ajax endpoint
        url = 'https://unsplash.com/napi/photos'
        data = {
            'page': page,
            'per_page': 12,
            'order_by': 'latest'
        }
        response = requests.get(url, params=data, headers=self.headers)
        if response.status_code == 200:
            return response.json()
        else:
            print('Request failed with status code %s' % response.status_code)

    async def __download_img(self, img):
        # img is an (id, download_link) tuple
        content = await self.__get_content(img[1])
        with open(img[0] + '.jpg', 'wb') as f:
            f.write(content)
        print('Downloading image %s succeeded' % self.num)
        self.num += 1

    def run(self):
        start = time.time()
        for x in range(1, 101):  # Download 100 pages of images, or change the page count yourself
            links = self.__get_img_links(x)
            tasks = [asyncio.ensure_future(self.__download_img((link['id'], link['links']['download'])))
                     for link in links]
            loop = asyncio.get_event_loop()
            loop.run_until_complete(asyncio.wait(tasks))
            if self.num >= 10:  # For speed testing; comment this out to download more images
                break
        end = time.time()
        print('Ran in %s seconds' % (end - start))


def main():
    spider = Spider()
    spider.run()


if __name__ == '__main__':
    main()

As you can see, in under 50 lines of code you can download images from an entire website.
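One caveat: each page fires all of its downloads at once, which can hammer the server. A common refinement, sketched below and not part of the original program, is to cap concurrency with asyncio.Semaphore:

import asyncio
import aiohttp

semaphore = asyncio.Semaphore(5)  # at most 5 downloads in flight at a time

async def download(session, name, link):
    # Hypothetical helper: acquire the semaphore before each download
    async with semaphore:
        async with session.get(link) as response:
            content = await response.read()
        with open(name + '.jpg', 'wb') as f:
            f.write(content)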

In closing

I have also written the program that crawls Jiandan’s sister pictures; if you want it, come find me on the public account Daily Learning Python (first 40 readers only)!! Here’s a quick look at the result first!

If anything is unclear, you can leave a message below. If this article was useful to you, please share and like it so that more people can see it.

Daily learning python

Code is not just buggy, it’s beautiful and fun