1. Foreword
Hello everyone, I'm Charlie, an office worker who faces a computer every day.
Every day my wallpaper is the built-in Windows sky blue, and it looks really boring, boring, boring ~
As everyone knows, I'm a blogger who loves quality wallpapers, so of course I want a whole batch of high-quality ones. Nothing more to it.
All right, enough chatter, let's start today's quality journey ~
2. Preparation
Get all of these set up first:
Python 3.6, PyCharm, requests, parsel
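Before going further, it can help to confirm that the two third-party packages are importable. This is only a minimal sketch: the helper name `missing_packages` is made up for this example, and it assumes the packages are imported under the same names they are installed under (`requests`, `parsel`).

```python
from importlib.util import find_spec

# Third-party packages this post relies on.
REQUIRED = ["requests", "parsel"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```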
3. Crawler process
1) Data source analysis:
1. Determine the target: crawl HD wallpaper images from netbian (the "other shore" site).
2. Use the developer tools (F12, or right-click and choose Inspect) to trace where the image URL comes from. Requesting a wallpaper's details page and reading its page source gives the image URL (one per page); requesting the list page gives the details page URL and title of each wallpaper.
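To make the two-level flow concrete (list page gives details URL and title, details page gives image URL), here is a rough sketch that runs offline. Everything in it is an assumption for illustration: the two HTML snippets are invented stand-ins shaped after the `.list li` and `.pic img` structure that the full script targets, the function names are hypothetical, and it uses the standard-library `re` module rather than the parsel CSS selectors used in the full script.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.netbian.com"

# Invented sample markup, NOT copied from the real site.
SAMPLE_LIST_HTML = """
<div class="list"><ul>
  <li><a href="http://example.com/ad"><img src="ad.jpg"></a></li>
  <li><a href="/desk/23397.htm"><img src="thumb.jpg"><b>Sample wallpaper</b></a></li>
</ul></div>
"""
SAMPLE_DETAIL_HTML = """
<div class="pic"><p><a href="#">
  <img src="http://img.example.com/sample.jpg" alt="Sample wallpaper">
</a></p></div>
"""

LI_RE = re.compile(r"<li\b[^>]*>(.*?)</li>", re.S)
HREF_RE = re.compile(r'<a\b[^>]*\bhref="([^"]+)"')
TITLE_RE = re.compile(r"<b>([^<]+)</b>")
IMG_RE = re.compile(
    r'<div\b[^>]*\bclass="pic"[^>]*>.*?<img\b[^>]*\bsrc="([^"]+)"', re.S
)


def parse_list_page(html):
    """Return (title, absolute details URL) for every <li> that has a title."""
    results = []
    for item in LI_RE.findall(html):
        title = TITLE_RE.search(item)
        href = HREF_RE.search(item)
        # Items without a <b> title (ads, for example) are skipped.
        if title and href:
            results.append(
                (title.group(1).strip(), urljoin(BASE_URL, href.group(1)))
            )
    return results


def parse_detail_page(html):
    """Return the image URL inside the "pic" block, or None if absent."""
    match = IMG_RE.search(html)
    return match.group(1) if match else None


print(parse_list_page(SAMPLE_LIST_HTML))
print(parse_detail_page(SAMPLE_DETAIL_HTML))
```

Regular expressions are brittle against real-world HTML, so treat this only as a way to see the data flow; a proper selector library is the sturdier choice for actual crawling.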
2) Code implementation:
1. Send the request
Wallpaper list page url: www.netbian.com/1920×1080/i…
2. Get the data
Page source: response.text (the page's text data)
3. Parse the data
Parsing options: CSS selectors, XPath, BS4, re. Extract two things: (1) the wallpaper details page URL, e.g. /desk/23397.htm; (2) the wallpaper title.
4. Save the data
Images are saved as binary data.
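The non-network parts of these steps can be sketched with the standard library alone: building a list page URL and request headers, turning raw bytes into text, and writing binary image data to disk. The helper names (`list_page_url`, `decode_page`, `safe_filename`, `save_image`) are made up for this example. The GBK default is an assumption taken from the commented-out `decode('GBK')` hint in the full script, and the filename cleanup is an extra precaution that the original script does not include.

```python
import re
import tempfile
from pathlib import Path

# URL pattern and User-Agent as used in the full script of this post.
LIST_URL_TEMPLATE = "http://www.netbian.com/1920x1080/index_{page}.htm"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
    )
}


def list_page_url(page):
    """Step 1: build the list page URL that the request is sent to."""
    return LIST_URL_TEMPLATE.format(page=page)


def decode_page(raw, encoding="gbk"):
    """Step 2: turn response bytes into text (GBK is an assumption)."""
    return raw.decode(encoding, errors="replace")


def safe_filename(title):
    """Replace characters that are not allowed in Windows file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip() + ".jpg"


def save_image(folder, title, content):
    """Step 4: write the image bytes in binary mode and return the path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / safe_filename(title)
    path.write_bytes(content)
    return path


print(list_page_url(2))
print(decode_page("高清壁纸".encode("gbk")))
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(tmp, "sky: blue?", b"\xff\xd8\xff placeholder bytes")
    print(saved.name, saved.stat().st_size, "bytes")
```

In a real run, `HEADERS` and the URL would be passed to `requests.get(url=..., headers=HEADERS)`, `response.content` would supply the bytes for `save_image`, and `response.text` would supply the text to parse.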
Grandpa: Is that it? Where's the code? What do you mean, no code?
Don't panic. It's coming, it's coming.
4. Code display
I won't take the code apart piece by piece. With the comments and the steps from section 3, I trust you are smart enough to follow it. If not, I'll put a video explanation at the end.
    import os
    import time

    import requests  # pip install requests
    import parsel  # pip install parsel

    time_1 = time.time()

    # Make sure the output folder exists before saving anything.
    os.makedirs('img', exist_ok=True)

    # Request headers: disguise the Python code as a browser when sending requests.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36'
    }

    for page in range(2, 12):
        print(f'==================== Crawling page {page} ====================')
        url = f'http://www.netbian.com/1920x1080/index_{page}.htm'
        response = requests.get(url=url, headers=headers)
        # Alternative: html_data = response.content.decode('gbk')
        # Let requests detect the encoding so response.text is not garbled.
        response.encoding = response.apparent_encoding
        # Page source / page text data:
        # print(response.text)

        # Parse the data with CSS selectors.
        selector = parsel.Selector(response.text)
        lis = selector.css('.list li')
        for li in lis:
            # Details page example: http://www.netbian.com/desk/23397.htm
            title = li.css('b::text').get()
            if title:
                href = 'http://www.netbian.com' + li.css('a::attr(href)').get()
                response_1 = requests.get(url=href, headers=headers)
                selector_1 = parsel.Selector(response_1.text)
                img_url = selector_1.css('.pic img::attr(src)').get()
                # Images are binary data, so take .content and write in 'wb' mode.
                img_content = requests.get(url=img_url, headers=headers).content
                with open(os.path.join('img', title + '.jpg'), mode='wb') as f:
                    f.write(img_content)
                    print('Saving:', title)

    time_2 = time.time()
    use_time = int(time_2) - int(time_1)
    print(f'Total time: {use_time} seconds')
Try running it yourself, and remember to like, bookmark, and share!