
Crawl target

URL: Absolute Domain (https://www.jdlingyu.com)

Tools used

Development environment: Windows 10, Python 3.7
Development tools: PyCharm, Chrome
Libraries: requests, lxml

Project approach

Choose the image category you want to crawl.

From the category listing page, extract the hyperlink of each gallery: the href (jump address) of its <a> tag, along with the gallery title if you need it.

def get_url(start_url):
    response = requests.get(start_url, headers=headers).text
    data = etree.HTML(response)
    new_url = data.xpath('//div[@class="post-module-thumb"]/a/@href')
    for url in new_url:
        yield url

Open each detail page and extract all of its image URLs with XPath.
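The detail-page extraction can be tried offline before touching the real site. The sketch below runs the same XPath expression used in the full source on a small, made-up HTML string. The sample markup, the example.com URLs, and the helper name extract_img_urls are assumptions for illustration only; the real page layout may differ.

```python
from lxml import etree

# Hypothetical, simplified detail-page markup (not copied from the real site).
SAMPLE_DETAIL_HTML = """
<html><body>
  <div class="entry-content">
    <img src="https://example.com/a/001.jpg">
    <img src="https://example.com/a/002.jpg">
  </div>
  <div class="sidebar"><img src="https://example.com/ad.png"></div>
</body></html>
"""


def extract_img_urls(html_text):
    """Return the src of every <img> that is a direct child of div.entry-content."""
    tree = etree.HTML(html_text)
    return tree.xpath('//div[@class="entry-content"]/img/@src')


img_urls = extract_img_urls(SAMPLE_DETAIL_HTML)
print(img_urls)
```

Note that /img only matches images that are direct children of the div. If a page wraps its images in other tags such as <p>, the expression would need //img instead.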

Request each image URL and save the returned image data to disk. Super simple, isn't it? Hey hey (*╹▽╹*)
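The saving step might look something like the sketch below, which only uses the standard library. The helper names filename_from_url and save_image and the default folder "images" are assumptions, not part of the original code. The function takes the already-downloaded bytes, so it can be checked without any network access.

```python
import os
from urllib.parse import urlsplit


def filename_from_url(img_url):
    """Use the last path segment of the URL as the file name, ignoring any query string."""
    return os.path.basename(urlsplit(img_url).path)


def save_image(content, img_url, save_dir="images"):
    """Write raw image bytes into save_dir and return the path that was written.

    save_dir="images" is a hypothetical default, not the folder used by the author.
    """
    os.makedirs(save_dir, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(save_dir, filename_from_url(img_url))
    with open(path, "wb") as f:  # "wb" because image data is binary
        f.write(content)
    return path
```

In the crawler it could be called as save_image(requests.get(img_url, headers=headers).content, img_url). Creating the folder up front avoids the FileNotFoundError that open() raises when the target directory does not exist.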

Complete source code

import requests
from lxml import etree

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
}


def get_url(start_url):
    # Fetch a category listing page and yield the detail-page link of each gallery.
    response = requests.get(start_url, headers=headers).text
    data = etree.HTML(response)
    new_url = data.xpath('//div[@class="post-module-thumb"]/a/@href')
    for url in new_url:
        yield url


def get_img(url):
    # Fetch a detail page, then download every image in its content area.
    response = requests.get(url, headers=headers).text
    img_data = etree.HTML(response)
    img_urls = img_data.xpath('//div[@class="entry-content"]/img/@src')
    for img_url in img_urls:
        # File name built from the last URL segment (concatenated twice, as in the source).
        name = img_url.split("/")[-1] + img_url.split("/")[-1]
        result = requests.get(img_url).content
        # The folder name before "/" was lost in the source; put an existing directory there.
        with open(" /" + name, "wb") as f:
            f.write(result)
        print("Downloading", name)


if __name__ == "__main__":
    for i in range(1, 3):  # listing pages 1 and 2
        start_url = "https://www.jdlingyu.com/tuji/hentai/gctt/page/{}".format(i)
        html_url = get_url(start_url)
        for url in html_url:
            get_img(url)

I'm White and White I, a programmer who loves sharing knowledge ❤️

If you are new to programming and find you can't get this working after reading the post, feel free to leave a comment. [Thank you very much for your likes, favorites, follows, and comments!]