Preface
The text and images in this article come from the internet and are intended only for learning and exchange, with no commercial purpose. If you have any concerns, please contact us and we will handle them.
Basic Environment Configuration
- Python 3.6
- PyCharm
- requests
- parsel
The required modules can be installed with pip.
Identify the target website
![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/fb6d5035179c4cc3bc785d7e22c54e54)
![](https://p3-tt-ipv6.byteimg.com/large/pgc-image/ac644abe95ed48ab8bfa4486bd83156c)
The usual first step: press F12 to open the developer tools and analyze the page.
![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/8523ce43c3764115b95ecf550141680d)
The raw response looks garbled at first. This is an encoding problem, and it goes away once the code sets the correct encoding on the response. Static pages like this one are simple to scrape, since the data you want can be pulled straight from the returned HTML.
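To see why the raw response can look garbled, here is a small offline sketch. It assumes, for illustration only, that the page bytes are GBK-encoded (the article does not state the site's actual encoding) and that they are decoded as ISO-8859-1, which is the default requests falls back to when a text response declares no charset in its headers. The sample string is made up.

```python
# Sample text standing in for a page title (hypothetical data).
original = "矢量图库"

# Assumption: the server sends the page as GBK-encoded bytes.
raw_bytes = original.encode("gbk")

# Decoding with the wrong codec does not raise an error; it yields mojibake.
garbled = raw_bytes.decode("iso-8859-1")

# Decoding with the matching codec restores the text.
fixed = raw_bytes.decode("gbk")

print(repr(garbled))
print(fixed)
```

Setting `response.encoding` before reading `response.text` plays the same role as picking the matching codec here.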
Request the page
    import re

    import requests

    # Page number to request; an assumed value, since the snippet leaves `page` undefined
    page = 1
    url = 'http://www.sccnn.com/shiliangtuku/default({}).html'.format(page)
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)
    # Let requests guess the encoding from the body so the text is not garbled
    response.encoding = response.apparent_encoding
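The URL template takes a page number, so fetching several listing pages only means filling in different numbers. A minimal sketch of that idea follows. The helper name `build_page_urls` and the page range 1 to 3 are hypothetical, and the article does not confirm which page numbers exist on the site.

```python
# URL template taken from the request step; {} is the page number.
BASE_URL = 'http://www.sccnn.com/shiliangtuku/default({}).html'


def build_page_urls(first_page, last_page):
    """Return the listing URLs for first_page..last_page, inclusive."""
    return [BASE_URL.format(page) for page in range(first_page, last_page + 1)]


# Hypothetical range, only to show the generated URLs.
for url in build_page_urls(1, 3):
    print(url)
```

Each generated URL can then be passed to `requests.get` together with the same `headers` dictionary.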
Parse the page and extract the data
    import parsel

    # Collect the href of every link that opens in a new tab on the listing page
    r = re.findall('<a href="(.*?)" target="_blank">', response.text)
    urls = r[2:]  # drop the first two matches
    for i in urls:
        page_url = 'http://www.sccnn.com' + i
        response_2 = requests.get(url=page_url, headers=headers)
        response_2.encoding = response_2.apparent_encoding
        selector = parsel.Selector(response_2.text)
        # Title and image address on the detail page
        title = selector.css('#LeftBox h2::text').get()
        img_url = selector.css('#LeftBox .PhotoDiv img::attr(src)').get()
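The link-extraction step can be tried offline on a small piece of markup. The sketch below uses only the standard library. The HTML is invented for the example and the real listing page may be structured differently. It also uses `urllib.parse.urljoin` instead of plain string concatenation, which is a substitution made here, not something the article does.

```python
import re
from urllib.parse import urljoin

# Hypothetical listing markup; the real page structure may differ.
sample_html = '''
<a href="/" target="_blank">Home</a>
<a href="/shiliangtuku/" target="_blank">Vectors</a>
<a href="/html/a/1.html" target="_blank">Item 1</a>
<a href="/html/a/2.html" target="_blank">Item 2</a>
'''

# Non-greedy capture of each href value.
links = re.findall(r'<a href="(.*?)" target="_blank">', sample_html)

# Skip the first two matches, mirroring the r[2:] slice in the article.
detail_links = links[2:]

# Turn the relative paths into absolute URLs.
detail_urls = [urljoin('http://www.sccnn.com', link) for link in detail_links]
print(detail_urls)
```

`urljoin` handles relative and absolute paths alike, so it is a little more forgiving than `'http://www.sccnn.com' + i` if the site mixes link styles.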
Save the data
    def download(title, url):
        # Target path; change the folder to one that exists on your machine
        path = 'D:\\python\\demo\\material website\\img\\' + title + '.jpg'
        response = requests.get(url=url, headers=headers)
        # Write the raw image bytes to disk
        with open(path, mode='wb') as f:
            f.write(response.content)
        print('downloading {}'.format(title))
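Scraped titles can contain characters that Windows rejects in file names, and the target folder may not exist yet. The following sketch shows one way to guard against both. The helpers `safe_filename` and `save_image` are hypothetical names that are not part of the article's code, and the demo writes placeholder bytes to a temporary folder so it runs without network access.

```python
import re
import tempfile
from pathlib import Path


def safe_filename(title):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()


def save_image(content, title, folder):
    """Write raw image bytes to <folder>/<title>.jpg and return the path."""
    folder = Path(folder)
    # Create the folder (and any parents) if it is missing.
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / (safe_filename(title) + '.jpg')
    path.write_bytes(content)
    return path


# Demo with placeholder bytes in a temporary folder (no network needed).
demo_dir = tempfile.mkdtemp()
saved = save_image(b'fake image bytes', 'spring/flowers: set 1', demo_dir)
print(saved.name)
```

In the scraper itself, `content` would be `response.content` from the image request and `folder` would be whatever directory you choose for the downloads.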
Result
![](https://p9-tt-ipv6.byteimg.com/large/pgc-image/216fb67cc72e4439844f445c204035c6)
![](https://p26-tt.byteimg.com/large/pgc-image/da76eeb395e14d97b6d77b2de7773d1e)