The first article used the find method with CSS selectors to crawl data on the top 100 hospitals. This time we use the xpath method of requests-html to page through a gallery site automatically and download HD beauty images.

I. Analyze the web page elements

The entire feed is a list (ul/li) wrapped in a div with class TypeList, and each img sits inside an a tag. If you download one of these images and open it locally, you will find that it is only a thumbnail. Anyone with some development experience will recognize this as the most basic kind of optimization: the list page serves small thumbnails to stay light.

The thumbnail… The original…

You'll notice that the thumbnail URL differs from the original only by the extra word "small". Fortunately the rule is that simple; otherwise this would be a lot more trouble.
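The rule can be sketched as a small helper. The URL below is made up for illustration (the real ones are elided above), and the name to_original_url is hypothetical. Note that str.replace removes every occurrence of "small", so this only holds if the word appears nowhere else in the URL.

```python
def to_original_url(thumbnail_url: str) -> str:
    """Derive the full-size image URL by dropping the word 'small'."""
    # str.replace removes every occurrence, not only the one before the extension.
    return thumbnail_url.replace("small", "")


# Hypothetical thumbnail URL, for illustration only.
thumb = "https://img.example.com/pic/2019/cover01small.jpg"
original = to_original_url(thumb)
print(original)
```

If the site ever used "small" elsewhere in a path, a stricter rule (for example, only stripping it right before the file extension) would be safer.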

II. Script code

# coding=UTF-8
from requests_html import HTMLSession
import urllib.request
import time

session = HTMLSession()
page = 1  # starting page (assumed; the initial value is missing in the original)
while page <= 2:
    # The site's base URL is left blank in the original.
    url = '' + str(page) + '.htm'
    page += 1
    r = session.get(url)
    # All thumbnail img elements in the list
    data = r.html.xpath("//div[@class='TypeList']/ul/li/a/img")
    # Title text of each entry (variable name assumed; not used below)
    titles = r.html.xpath("//div[@class='TypeList']/ul/li/a/span/text()")
    for items in data:
        smallImgUrl = items.find("img", first=True).attrs['src']
        # Drop the word "small" to get the original image URL
        bigImgUrl = smallImgUrl.replace("small", "")
        # Millisecond timestamp used as the file name
        t = int(round(time.time() * 1000))
        path = '/Users/lsr/Documents/GJProject/py/girls/' + str(t) + ".jpg"
        urllib.request.urlretrieve(bigImgUrl, path)
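The paging and file-naming parts of the script can be tried out offline. The sketch below is only an illustration: the base URL, the directory name, and the helper names page_urls and target_path are all hypothetical. It also deviates from the script in one respect: it appends a running index to the millisecond timestamp, on the assumption that two downloads finishing in the same millisecond should not overwrite each other.

```python
import os
import time


def page_urls(base_url, first_page, last_page):
    """Yield one listing URL per page: <base_url>1.htm, <base_url>2.htm, ..."""
    for page in range(first_page, last_page + 1):
        yield base_url + str(page) + ".htm"


def target_path(directory, index):
    """Build a file path from a millisecond timestamp plus a running index."""
    t = int(round(time.time() * 1000))
    return os.path.join(directory, "{}_{}.jpg".format(t, index))


# Hypothetical base URL and directory, for illustration only.
urls = list(page_urls("https://example.com/list_", 1, 2))
print(urls)
print(target_path("images", 0))
```

Each URL from page_urls would then be passed to session.get, and each image saved to the path returned by target_path.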

Locate the whole list node by its TypeList class, then extract every img under it by path:

data = r.html.xpath("//div[@class='TypeList']/ul/li/a/img")
smallImgUrl = items.find("img", first=True).attrs['src']
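To see what this path selects without hitting the network, here is a minimal sketch on a made-up HTML fragment. It uses the standard library's ElementTree as a stand-in for requests-html so that it runs anywhere. ElementTree only supports a subset of XPath, so the path starts with ".//" and the span text is read through .text instead of text(). The markup and URLs are assumptions that mirror the structure described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the div.TypeList > ul > li > a > img structure.
SAMPLE = """
<html><body>
<div class="TypeList">
  <ul>
    <li><a href="/a/1.htm"><img src="/pic/1small.jpg"/><span>First</span></a></li>
    <li><a href="/a/2.htm"><img src="/pic/2small.jpg"/><span>Second</span></a></li>
  </ul>
</div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Same path as in the script, written in ElementTree's XPath subset.
imgs = root.findall(".//div[@class='TypeList']/ul/li/a/img")
small_urls = [img.get("src") for img in imgs]

# The titles live in the span next to each img.
titles = [span.text for span in root.findall(".//div[@class='TypeList']/ul/li/a/span")]

print(small_urls)
print(titles)
```

Real pages are rarely well-formed XML, which is why the script itself relies on the more forgiving parser behind requests-html.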

III. Crawl results