This is the second day of my participation in the November Gwen Challenge. Check out the event details: The Last Gwen Challenge 2021

Experiment 2

2.2 Approach

2.2.1 settings.py

  • Remove restrictions
ROBOTSTXT_OBEY = False
  • Set the path to save the image
IMAGES_STORE = r'.\images'  # path where downloaded images are saved
  • Open the pipelines
ITEM_PIPELINES = {
    'weatherSpider.pipelines.WeatherspiderPipeline': 300,
}
  • Set the request header
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.16 Safari/537.36',
}

2.2.2 items.py

  • Set the fields to crawl
class WeatherspiderItem(scrapy.Item):
    number = scrapy.Field()    # sequence number of the image
    pic_url = scrapy.Field()   # URL of the image

2.2.3 wt_Spider.py

  • Send the request
    def start_requests(self):
        yield scrapy.Request(self.start_url, callback=self.parse)
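Because start_requests uses yield, it is a generator: the Scrapy engine pulls requests from it lazily instead of the spider building them all up front. A stdlib-only sketch of that pattern (FakeRequest and the start page URL are stand-ins, not Scrapy API):

```python
# Stand-in for scrapy.Request, just to illustrate the generator pattern
class FakeRequest:
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback

class FakeSpider:
    start_url = "http://www.weather.com.cn/"  # hypothetical start page

    def start_requests(self):
        # yield makes this a generator; nothing runs until someone iterates it
        yield FakeRequest(self.start_url, callback=self.parse)

    def parse(self, response):
        pass

requests = list(FakeSpider().start_requests())
print(requests[0].url)  # http://www.weather.com.cn/
```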
  • Get all the <a> tags on the page
    def parse(self, response):
        html = response.text
        urlList = re.findall('<a href="(.*?)" ', html, re.S)
        for url in urlList:
            try:
                yield scrapy.Request(url, callback=self.picParse)
            except Exception as e:
                print("err:", e)
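The regex in parse captures every href value between the quotes (the trailing space in the pattern means it only matches anchors with another attribute after href). A quick stdlib check on a made-up snippet:

```python
import re

# Sample HTML standing in for the real weather page (URLs are made up)
html = ('<a href="http://example.com/day1.html" class="x">1</a>'
        '<a href="http://example.com/day2.html" class="x">2</a>')

# Same pattern as in parse(): lazily capture everything between href=" and "
urlList = re.findall('<a href="(.*?)" ', html, re.S)
print(urlList)
```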
  • Request each URL found in the <a> tags, then extract all the images from the responses
    def picParse(self, response):
        # assumed pattern: extract the src attribute from every <img> tag
        imgList = re.findall(r'<img src="(.*?)"', response.text, re.S)
        for k in imgList:
            if self.total > 102:
                return
            try:
                item = WeatherspiderItem()
                item['pic_url'] = k
                item['number'] = self.total
                self.total += 1
                yield item
            except Exception as e:
                pass
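Two things in picParse are worth checking in isolation: the img pattern (an assumption here, since the exact original regex is not shown) and the cap on self.total. Assuming total starts at 0, the guard stops after item number 102 has been yielded, i.e. 103 images in all:

```python
import re

# Sample fragment; the <img src> pattern is an assumption, not the
# spider's verified original regex
text = '<img src="http://example.com/a.png"><img src="http://example.com/b.png">'
imgList = re.findall(r'<img src="(.*?)"', text, re.S)
print(imgList)

# Simulate the counter: "if total > 102: return" fires only once
# number 102 has already been yielded
total = 0
yielded = []
for k in range(200):  # pretend the pages held 200 image URLs
    if total > 102:
        break
    yielded.append(total)
    total += 1
print(len(yielded))  # 103
```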
  • As with storing to a database, all of the data processing should happen in pipelines.py.

2.2.4 pipelines.py

  • Import the settings information
from weatherSpider.settings import IMAGES_STORE as images_store  # read the configuration
from scrapy import Request
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
  • Write the save function
    def get_media_requests(self, item, info):
        image_url = item["pic_url"]
        yield Request(image_url)
  • It is better to rename the file when it is saved; in an ImagesPipeline subclass this is done by overriding the file_path method.
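The naming logic itself can live in a pure function that file_path then returns. A minimal sketch, assuming the file should be named by its sequence number while keeping the URL's extension (make_file_name is a hypothetical helper, not part of Scrapy):

```python
import os

def make_file_name(number, url):
    # Keep the original extension, name the file by its sequence number
    ext = os.path.splitext(url)[1] or '.jpg'  # fall back to .jpg if none
    return f'{number}{ext}'

# Inside the pipeline this would be returned from file_path(), e.g.:
#   def file_path(self, request, response=None, info=None, *, item=None):
#       return make_file_name(item['number'], item['pic_url'])
print(make_file_name(7, 'http://example.com/pic.png'))  # 7.png
```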