The first article used find CSS selector to crawl the data of the top 100 hospitals. Today, the xpath method of Request-HTML is used to automatically page-load the hd beauty image.

Analyze web page elements

The entire feed stream is a Table enclosed by div(class =TypeList), img is in the a tag. If you download an image and look at it locally, you will find that the image is a thumbnail. If you have any development experience, you will know that this is the most basic optimization method

The thumbnail kr.shanghai-jiuxin.com/file/2021/0… The original kr.shanghai-jiuxin.com/file/2021/0…

You’ll notice that thumbnails have just one more small word, but the rules are simple, or you’ll get into trouble.

Ii Script Code

# coding=UTF-8 from requests_html import HTMLSession import urllib.request import time # Coding =UTF-8 from requests_html import HTMLSession import urllib.request import time While page <= 2; url = 'https://www.umei.cc/bizhitupian/meinvbizhi/'+ str(page) + '.htm' page += 1 r = session.get(url) data = XML ("//div[@class='TypeList']/ul/li/a/img" r.html.xpath("//div[@class='TypeList']/ul/li/a/span/text()") for items in data: SmallImgUrl = items.find("img",first=True). Attrs [' SRC ' Smallimgurl.replace ("small","") t = int(round(time.time() * 1000)) # '/Users/lsr/Documents/GJProject/py/girls/' + str(t) + ".jpg" urllib.request.urlretrieve(bigImgUrl,path)Copy the code

Find the entire list node according to TypeList and extract all img according to path

Data = r.html. Xpath ("//div[@class='TypeList']/ul/li/a/img") smallImgUrl = items.find("img",first=True).attrs['src']Copy the code

Three crawl results