Preface

The text and images in this article come from the internet and are intended for learning and discussion only, with no commercial purpose. If you have any concerns, please contact us and we will handle them.

Basic Environment Configuration

  • Python 3.6
  • PyCharm
  • requests
  • parsel

The required modules can be installed with pip.

Identify the target website

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/fb6d5035179c4cc3bc785d7e22c54e54)
![](https://p3-tt-ipv6.byteimg.com/large/pgc-image/ac644abe95ed48ab8bfa4486bd83156c)

As usual, press F12 to open the browser's developer tools and analyze the page.

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/8523ce43c3764115b95ecf550141680d)

The HTML returned by the page looks garbled at first, but that is only an encoding problem and can be fixed in code by setting the correct encoding. Static pages like this one are simple to scrape: the data you want is right there in the returned HTML.
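Garbled output of this kind usually means the bytes were decoded with the wrong codec. The sketch below illustrates the effect with a made-up string rather than the real page; GBK is assumed purely for illustration, since the article does not say which encoding the site uses.

```python
# Hypothetical sample text; the site's real encoding is not stated in the
# article, so GBK is only an assumption for this demonstration.
original = "矢量图库"
raw_bytes = original.encode("gbk")

# Decoding with the wrong codec does not fail, it silently yields mojibake.
garbled = raw_bytes.decode("iso-8859-1")

# Decoding with the codec the bytes were written in restores the text.
restored = raw_bytes.decode("gbk")

print(repr(garbled))
print(restored)
```

With requests, setting `response.encoding = response.apparent_encoding` asks the library to guess the codec from the body bytes, so the codec does not have to be hard-coded.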

Request the page

    import re

    import requests

    page = 1  # listing page number (assumed; the original snippet leaves `page` undefined)
    url = 'http://www.sccnn.com/shiliangtuku/default({}).html'.format(page)
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)
    # Let requests guess the encoding from the body so the text decodes correctly
    response.encoding = response.apparent_encoding
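The `{}` placeholder in the URL above is filled in by `page`, which suggests that the listing is paginated. A minimal sketch of generating the URLs for a range of pages follows; the helper name `page_urls` and the range 1 to 3 are hypothetical choices, not taken from the article.

```python
BASE_URL = 'http://www.sccnn.com/shiliangtuku/default({}).html'


def page_urls(first, last):
    """Return the listing-page URLs for pages first..last (inclusive)."""
    return [BASE_URL.format(page) for page in range(first, last + 1)]


# The page range here is an arbitrary example.
for url in page_urls(1, 3):
    print(url)
```

Each generated URL can then be requested in turn, in the same way as the single page above.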

Analyze web pages and data

    import parsel

    # Collect the detail-page links from the listing page
    r = re.findall('<a href="(.*?)" target="_blank">', response.text)
    urls = r[2:]  # skip the first two matches
    for i in urls:
        page_url = 'http://www.sccnn.com' + i
        response_2 = requests.get(url=page_url, headers=headers)
        response_2.encoding = response_2.apparent_encoding
        selector = parsel.Selector(response_2.text)
        # Title and image URL on the detail page
        title = selector.css('#LeftBox h2::text').get()
        img_url = selector.css('#LeftBox .PhotoDiv img::attr(src)').get()
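The link-extraction step can be tried offline on a small HTML fragment. The markup and paths below are hypothetical stand-ins, and the real page may look different. The sketch also uses `urljoin` from the standard library in place of string concatenation, which is a substitution on my part and not what the article's code does.

```python
import re
from urllib.parse import urljoin

# Hypothetical listing-page fragment; the real markup may differ.
sample_html = '''
<a href="/index.html" target="_blank">Home</a>
<a href="/about.html" target="_blank">About</a>
<a href="/sucai/1001.html" target="_blank">Item 1</a>
<a href="/sucai/1002.html" target="_blank">Item 2</a>
'''

# Same pattern as in the article: capture the href of links opened in a new tab.
links = re.findall(r'<a href="(.*?)" target="_blank">', sample_html)

# The article drops the first two matches; in this sample they stand in
# for links that are not detail pages.
detail_links = links[2:]

# urljoin resolves relative paths against the site root.
detail_urls = [urljoin('http://www.sccnn.com', link) for link in detail_links]
print(detail_urls)
```

Slicing by position is fragile if the page layout changes, so a more specific pattern or CSS selector may be worth considering once the real markup is known.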

Save the data

    def download(title, url):
        # Build the output path; the target folder must already exist
        path = 'D:\\python\\demo\\material website\\img\\' + title + '.jpg'
        response = requests.get(url=url, headers=headers)
        with open(path, mode='wb') as f:
            f.write(response.content)
        print('Downloading {}'.format(title))
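Titles scraped from a page can contain characters that are not allowed in Windows file names, and `.get()` returns `None` when a selector matches nothing. The sketch below shows one possible way to guard the save step. The helpers `safe_filename` and `save_image` are hypothetical names, and the demo writes placeholder bytes to a temporary directory instead of downloading anything.

```python
import re
import tempfile
from pathlib import Path


def safe_filename(title, default='untitled'):
    """Replace characters Windows forbids in file names; fall back if empty."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title or '').strip()
    return cleaned or default


def save_image(content, title, directory):
    """Write image bytes to <directory>/<title>.jpg and return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = directory / (safe_filename(title) + '.jpg')
    target.write_bytes(content)
    return target


# Demo with placeholder bytes and a temporary directory instead of a download.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b'fake image bytes', 'logo: v2/final', tmp)
    print(saved.name)
```

In the scraper, `content` would be `response.content` and `directory` the image folder used above.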

Result

![](https://p9-tt-ipv6.byteimg.com/large/pgc-image/216fb67cc72e4439844f445c204035c6)
![](https://p26-tt.byteimg.com/large/pgc-image/da76eeb395e14d97b6d77b2de7773d1e)
