This is day 1 of my participation in the November Gwen Challenge. Event details: The Last Gwen Challenge of 2021.
Preface
Without further ado, let's use Python to scrape desktop wallpapers.
Let's get started.
Development tools
Python version: 3.6.4
Related modules:
requests module;
re module;
and some modules that come with Python.
Environment setup
Install Python and add it to the environment variables, then use pip to install the required modules.
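Before running the crawler, it can help to confirm that the modules are actually importable. The following is only a minimal sketch; the helper name `missing_modules` is made up for illustration and is not part of the original script.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # requests is third-party (installed with pip); re ships with Python.
    missing = missing_modules(["requests", "re"])
    if missing:
        print("Missing modules, install with: pip install " + " ".join(missing))
    else:
        print("All required modules are available.")
```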
Approach
Target site: mm.enterdesk.com/dalumeinv/1…
Open the site and scroll down to see the list of image sets. Click any image to open its detail page, which contains a group of images, both large images and thumbnails.

The page disables the right mouse button, so press CTRL + U to view the page source. The image links can be found there. Each image has two links; comparing them, one carries the parameter _360_360 and is the standard image, while the link without this parameter is the original HD image.

The detail page is reached through a link on the home page, so go back to the home page and press CTRL + U again. The links to the detail pages appear in the page source, which suggests that both the home page and the detail pages are statically loaded.

Scrolling down the home page keeps loading more data without changing the URL. Clicking the pager at the bottom, however, does change the URL. So to turn pages we only need to change the page number in the URL:
https://mm.enterdesk.com/dalumeinv/1.html
https://mm.enterdesk.com/dalumeinv/2.html
https://mm.enterdesk.com/dalumeinv/3.html
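The two observations from the analysis (pages differ only by the number in the URL, and the smaller images carry `_360_360` in their link) can be sketched as small helpers. This is an illustrative sketch, not the author's code: the names `page_urls`, `is_thumbnail` and `originals_only` are hypothetical, and the marker check assumes `_360_360` appears literally somewhere in the link.

```python
BASE_URL = "https://mm.enterdesk.com/dalumeinv/{page}.html"


def page_urls(first_page, last_page):
    """Build the listing-page URLs for first_page..last_page (inclusive)."""
    return [BASE_URL.format(page=n) for n in range(first_page, last_page + 1)]


def is_thumbnail(image_url):
    """True if the link carries the _360_360 marker seen on the smaller images."""
    return "_360_360" in image_url


def originals_only(image_urls):
    """Keep only the links without the _360_360 marker (the original HD images)."""
    return [url for url in image_urls if not is_thumbnail(url)]


if __name__ == "__main__":
    # Print the first three listing pages.
    for url in page_urls(1, 3):
        print(url)
```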
Core code
import re


def main(html_url):
    # Pass in the home page URL; get_response requests it and returns the response
    response = get_response(html_url)
    urls = re.findall('.*? ', response.text)[31:47]  # Extract the detail page URLs (pattern incomplete as published)
    for link in urls:
        response_ = get_response(link)  # Request the detail page
        # Extract the image URLs, skipping the first match
        image_url = re.findall('src="(https://up.enterdesk.com/edpic/.*?)"', response_.text)[1:]
        url_data(image_url)  # Pass the image URLs on to url_data
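The core code hands the image links to `url_data`, whose body is not shown here. As a rough idea of what such a step could look like, the sketch below writes each image to a local folder. Everything in it is an assumption rather than the author's implementation: the names `filename_from_url` and `save_images`, the `fetch` parameter, and the `wallpapers` folder are hypothetical. `fetch` is any callable that takes a URL and returns the image bytes, which keeps the sketch independent of how the request is made.

```python
import os
import posixpath
from urllib.parse import urlsplit


def filename_from_url(image_url):
    """Use the last path segment of the URL as the local file name."""
    return posixpath.basename(urlsplit(image_url).path)


def save_images(image_urls, fetch, folder="wallpapers"):
    """Download every URL with `fetch` and write it into `folder`.

    `fetch` is a caller-supplied function: URL in, bytes out.
    Returns the list of file paths that were written.
    """
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in image_urls:
        path = os.path.join(folder, filename_from_url(url))
        with open(path, "wb") as f:
            f.write(fetch(url))
        saved.append(path)
    return saved
```

With the requests module, `fetch` could be something like `lambda url: requests.get(url, timeout=10).content`.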
The complete source code is available through the profile on my personal home page.