This is the second day of my participation in the November Gwen Challenge. See the event details here: The Last Gwen Challenge 2021.
Experiment 2
2.2 Approach
2.2.1 settings.py
- Remove the robots.txt restriction
ROBOTSTXT_OBEY = False
- Set the path to save the image
IMAGES_STORE = r'.\images'  # path where downloaded images are saved
- Enable the item pipeline
ITEM_PIPELINES = {
    'weatherSpider.pipelines.WeatherspiderPipeline': 300,
}
- Set the request header
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.16 Safari/537.36',
}
2.2.2 items.py
- Define the fields to crawl
import scrapy

class WeatherspiderItem(scrapy.Item):
    number = scrapy.Field()
    pic_url = scrapy.Field()
2.2.3 wt_Spider.py
- Send the request
def start_requests(self):
    yield scrapy.Request(self.start_url, callback=self.parse)
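The snippets in this section reference self.start_url and self.total without showing the spider's class header. A minimal sketch of how the pieces might fit together is below; the spider name, target URL, and exact imports are assumptions for illustration, not taken from the original code.

import re
import scrapy
from weatherSpider.items import WeatherspiderItem

class WtSpider(scrapy.Spider):
    name = 'wt_Spider'
    start_url = 'http://www.weather.com.cn/'  # assumed target site; replace with the page actually being scraped
    total = 0  # counter used in picParse to stop after enough images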
- Get all the a-tag links on the page
def parse(self, response):
    html = response.text
    urlList = re.findall('<a href="(.*?)"', html, re.S)
    for url in urlList:
        self.url = url
        try:
            yield scrapy.Request(self.url, callback=self.picParse)
        except Exception as e:
            print("err:", e)
            pass
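One caveat with the loop above: the hrefs captured by the regex may be relative paths, and scrapy.Request needs absolute URLs. An optional adjustment (not part of the original code) is to resolve them with response.urljoin:

for url in urlList:
    # resolve relative hrefs against the current page URL
    absolute = response.urljoin(url)
    try:
        yield scrapy.Request(absolute, callback=self.picParse)
    except Exception as e:
        print("err:", e)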
- Request each URL found under the a tags, then extract all the picture links from those pages
def picParse(self, response):
    # the image regex was lost in the original formatting; this pattern is a reasonable reconstruction
    imgList = re.findall(r'<img src="(.*?)"', response.text, re.S)
    for k in imgList:
        if self.total > 102:
            return
        try:
            item = WeatherspiderItem()
            item['pic_url'] = k
            item['number'] = self.total
            self.total += 1
            yield item
        except Exception as e:
            pass
- Data processing, much like storing results in a database, is all handled in pipelines.py; here the built-in ImagesPipeline is used to download and save the images.
2.2.4 pipelines.py
- Import the settings information
from weatherSpider.settings import IMAGES_STORE as images_store  # read the image save path from the settings file
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.project import get_project_settings
from scrapy import Request

settings = get_project_settings()
- Write the save function
class WeatherspiderPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        image_url = item["pic_url"]
        yield Request(image_url)
- When saving, it is better to rename the file, as sketched below.
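A minimal sketch of the rename, added inside the WeatherspiderPipeline class above and assuming the standard ImagesPipeline.file_path override (the item argument requires Scrapy 2.4+); naming each file by the item's number field is an assumption for illustration, not the original author's code:

    def file_path(self, request, response=None, info=None, *, item=None):
        # the returned path is relative to IMAGES_STORE (imported above as images_store)
        # name each image by its sequence number, e.g. .\images\1.jpg
        return '%s.jpg' % item['number']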