Forbidden by robots.txt
Analysis:
After checking robots.txt, I found that the site uses the Robots protocol, which defines which pages or files on the site crawlers are allowed to fetch. You can visit www.baidu.com/robots.txt to view the permissions, which include, for example:
User-agent: Baiduspider
Disallow: /baidu
Scrapy obeys the Robots protocol by default, so pages disallowed by it are simply not crawled.
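As a side note, you can check such rules programmatically. A minimal sketch using Python's standard urllib.robotparser (not part of Scrapy itself, purely illustrative):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.baidu.com/robots.txt")
rp.read()  # fetch and parse the rules
# Per the rules quoted above, Baiduspider may not fetch /baidu
print(rp.can_fetch("Baiduspider", "https://www.baidu.com/baidu"))  # expected: False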
Solution:
Disable Scrapy's ROBOTSTXT_OBEY setting: locate the variable in settings.py and set it to False.
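In a Scrapy project this lives in settings.py:

# settings.py
ROBOTSTXT_OBEY = False  # stop filtering requests against robots.txt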
TypeError: Object of type 'Selector' is not JSON serializable
Problem: JSON serialization failed
Reason: forgot to call extract()
extract(): serializes the matched nodes as Unicode strings and returns them in a list
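A minimal sketch of the fix (the spider, site, and CSS selectors here are illustrative, not the original code):

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                # quote.css("span.text") alone returns a SelectorList,
                # which the JSON exporter cannot serialize
                "text": quote.css("span.text::text").extract_first(),
                "tags": quote.css("a.tag::text").extract(),
            }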
TypeError: write() argument must be str, not bytes
Code:
filename = open('test.json', 'w')  # text mode: writing bytes to this file raises the TypeError
Solution:
Change the mode to 'wb' so the file is opened in binary write mode.
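A minimal sketch of the corrected code (the bytes payload is just an example):

filename = open('test.json', 'wb')  # binary write mode accepts bytes
filename.write(b'{"title": "example"}')
filename.close()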
Install Scrapy
Environment:
Python: 3.6.3, macOS: 10.13.5
Installation:
pip3 install scrapy --user
Error:
Running scrapy -v fails with:
bash: scrapy: command not found
Solution:
- Check whether the dependencies are installed correctly: open Scrapy's GitHub repository and compare the output of pip list against the requirements in setup.py
- If the dependencies are all installed correctly, create a soft (symbolic) link to the scrapy executable, as shown below:
find / -name scrapy
ln -s /Users/macbook/Library/Python/3.6/bin/scrapy /usr/local/bin/scrapy
Now the scrapy command runs as expected.