Pet cats with code! This post is participating in the [Cat Essay Campaign].

First, website analysis

The URL of the website we will crawl images from: www.ivsky.com/

1. First, open the website and enter a keyword in the search box. We can see that the image results are paginated, so we can crawl multiple pages dynamically.

2. Look at the URL in the browser address bar. We can see that replacing "cat" in the URL with any other keyword (e.g. changing https://www.ivsky.com/search.php?q=cat to https://www.ivsky.com/search.php?q=dog) gives us pictures of whatever theme we want, provided, of course, that the site actually has pictures of that theme.

3. Press Ctrl+Shift+I to open the browser's developer tools.

4. Then right-click any image and choose Inspect. We can see that the src attribute of each img tag contains the real download address of the image, which is what we ultimately need to match.

Second, code practice

1. Fetch the search-result pages for the keyword "cat".

(1) Set the request headers

headers = {
    "Connection": "close",  # close the TCP connection after each request
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"  # pose as a normal browser
}

(2) Set keywords

kw = {
    "q": picDes,  # search keyword, e.g. "cat"
    "page": 1     # which page of the search results to fetch
}
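Note that picDes is not defined anywhere in the excerpt above; presumably it holds the keyword typed into the search box, for example:

picDes = "cat"  # assumed: the search keyword variable used in kw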

(3) Send the request

url = "https://www.ivsky.com/search.php" 
response = requests.get(url, headers=headers, params=kw)
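The matching step below works on a variable named html that the excerpt never defines; presumably it is the page source, along the lines of:

response.encoding = response.apparent_encoding  # ivsky pages may not be UTF-8; let requests guess (assumption)
html = response.text  # the HTML that the regular expressions in step 2 run against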

2. Use regular expressions to match the download address of every picture on each page against the web content obtained in the previous step.

(1) The regular expression matching the download address of each picture

import re

# The original pattern was mangled when the post was extracted. At the time of writing,
# thumbnails in ivsky search results looked roughly like
#   <img src="//img.ivsky.com/img/tupian/t/.../example.jpg">
# so a plausible reconstruction (an assumption, not necessarily the author's exact pattern) is:
picUrls = re.findall(r'<img src="(//img\.ivsky\.com.*?)"', html)

(2) Regular expression for the next-page index

# This pattern was also lost in extraction. The split("=")[-1] below implies it captures
# a link whose href ends in "page=N"; a plausible reconstruction (assumption) is:
nextUrl = re.findall(r'<a href="([^"]*?page=\d+)">下一页</a>', html)
nextPage = nextUrl[0].split("=")[-1]  # keep only the page number after the last "="
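Combining steps 1 and 2, the multi-page crawl described in the website analysis might be wired together like this; the loop bound and the function name crawl_pages are assumptions, since the post only shows fragments, and the headers dict from step 1(1) is reused:

def crawl_pages(picDes, max_pages=5):
    # Collect image addresses from up to max_pages search-result pages (sketch).
    allUrls = []
    for page in range(1, max_pages + 1):
        resp = requests.get("https://www.ivsky.com/search.php",
                            headers=headers,
                            params={"q": picDes, "page": page})
        resp.encoding = resp.apparent_encoding
        # Same (reconstructed) pattern as in step 2(1).
        allUrls += re.findall(r'<img src="(//img\.ivsky\.com.*?)"', resp.text)
    return allUrls

picUrls = crawl_pages("cat")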

3. Download each image locally using the list of download addresses obtained in the previous step.

with open(filePath + "/" + str(i + 1) + ".jpg", "wb") as f:  # save as 1.jpg, 2.jpg, ...
    print(filePath + "/" + str(i + 1) + ".jpg")              # log the local save path
    f.write(res.content)                                     # res: the response for this image URL
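For completeness, here is a sketch of the whole download step, assuming the matched addresses in picUrls are protocol-relative (they start with //) and using a hypothetical output directory cat_pics:

import os

filePath = "cat_pics"            # hypothetical output directory
os.makedirs(filePath, exist_ok=True)

for i, picUrl in enumerate(picUrls):
    if picUrl.startswith("//"):  # the matched addresses omit the scheme
        picUrl = "https:" + picUrl
    res = requests.get(picUrl, headers=headers)
    with open(filePath + "/" + str(i + 1) + ".jpg", "wb") as f:
        print(filePath + "/" + str(i + 1) + ".jpg")
        f.write(res.content)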

Third, data crawling results

The results are shown below: