“This is the first day of my participation in the Gwen Challenge in November. See details of the event: The last Gwen Challenge in 2021”.

preface

Use Python to crawl back to the desktop wallpaper, without further ado.

Let’s have a good time

The development tools

Python version: 3.6.4

Related modules:

Requests module;

Re module

And some modules that come with Python.

Environment set up

Install Python and add it to the environment variables. PIP installs the required related modules.

Thought analysis

Target site mm.enterdesk.com/dalumeinv/1…

Enter the site and drop down to see the following:Click on any image to enter the image details page, which contains a group of images, including large images and thumbnails:This page prohibits the right mouse button, press CTRL + U to view the source code of the page, found that the picture link can be obtained in the source code of the page; There are two links in each image, comparing the two links, one of them has the parameter _360_360, the link without this parameter is the original HD image, and the other one is the standard HD image!The details page is entered by the home page link, we go back to the home page, press CTRL + U to view the page source code; Page source code is found to enter the details of the link, which can be inferred that the home page and details page are static loaded pages!When you drop down the homepage, you find that it keeps loading data, but the URL does not change:However, if you click the bottom to turn the page, the url will change:Therefore, we only need to change the parameters of the url for page turning:

https://mm.enterdesk.com/dalumeinv/1.html
https://mm.enterdesk.com/dalumeinv/2.html
https://mm.enterdesk.com/dalumeinv/3.html
Copy the code

The core code

def main(html_url) : Pass in the home page URL
    response = get_response(html_url) The request function receives the home page URL and requests data
    urls = re.findall('.*? ', response.text)[31:47] Extract the detail page URL
    for link in urls:
        response_ = get_response(link)The requesting function receives the detail page URL and requests data
        image_url = re.findall('src="(https://up.enterdesk.com/edpic/.*?)"', response_.text)[1:] Extract the image URL
        url_data(image_url) # return image URL
Copy the code

Delete selected data

Complete source code can be seen in the profile of the personal home page for access

Data stored locally