Learning goals
- Know how to use Scrapyd
1. Introduction to Scrapyd
Scrapyd is an application for deploying and running Scrapy crawlers. It allows you to deploy crawler projects and control crawler runs through a JSON API. Scrapyd runs as a daemon that listens for requests to run crawlers and then starts a process to execute each one.
The JSON API is essentially a web API driven mostly by POST requests.
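For example, once the service is running (see section 3 below), a single request to the daemonstatus.json endpoint returns the daemon's state as JSON. A minimal sketch using the requests library:

```python
import requests

# Ask the scrapyd daemon for its status; the JSON answer looks like
# {"status": "ok", "pending": 0, "running": 0, "finished": 0, "node_name": "..."}
resp = requests.get('http://localhost:6800/daemonstatus.json')
print(resp.json())
```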
2. Scrapyd installation
```
pip install scrapyd
pip install scrapyd-client
```
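To quickly verify that both packages are installed, here is a minimal sketch using only the standard library (assumes Python 3.8+ for importlib.metadata):

```python
# Print the installed versions of scrapyd and scrapyd-client;
# raises PackageNotFoundError if either one is missing.
from importlib.metadata import version

print(version('scrapyd'))
print(version('scrapyd-client'))
```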
3. Start the Scrapyd service
- Command to start scrapyd under the project path: `sudo scrapyd` or `scrapyd`
- Once it is running, you can view the monitoring page of your local scrapyd by visiting port 6800 in your browser: http://localhost:6800
- Click Jobs to view the task monitoring page
4. Scrapy project deployment
4.1 Configuring projects to be deployed
Edit the scrapy.cfg file of the project to be deployed (whichever crawler needs to be deployed to scrapyd, configure this file in that project):
```
# deploy_name can be customized;
# project is the name used when the crawler project was created
[deploy:deploy_name]
url = http://localhost:6800/
project = project_name
```
4.2 Deploy the project to Scrapyd
Also execute under the scrapy project path:
```
scrapyd-deploy deploy_name -p project_name
```
where `deploy_name` is the deployment name specified in the configuration file and `project_name` is the crawler project name.
Once the deployment succeeds, you can see the deployed project in the scrapyd web interface.
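You can also confirm the deployment programmatically by asking scrapyd for its project list. A minimal sketch (`project_name` is a placeholder for your own project name):

```python
import requests

# listprojects.json returns {"status": "ok", "projects": [...]};
# a successful deployment adds your project name to that list.
resp = requests.get('http://localhost:6800/listprojects.json')
print('project_name' in resp.json()['projects'])
```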
4.3 Manage scrapy projects
- Start a crawler:
curl http://localhost:6800/schedule.json -d project=project_name -d spider=spider_name
- Stop a crawler:
curl http://localhost:6800/cancel.json -d project=project_name -d job=jobid
Note: curl is a command-line tool that needs to be installed separately if you don't already have it.
4.4 Use the Requests module to control scrapy projects
```python
import requests

# start a crawler
url = 'http://localhost:6800/schedule.json'
data = {'project': 'project_name', 'spider': 'spider_name'}
resp = requests.post(url, data=data)

# stop a crawler
url = 'http://localhost:6800/cancel.json'
data = {'project': 'project_name', 'job': 'jobid'}
resp = requests.post(url, data=data)
```
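In practice the two calls are often chained: schedule.json returns the jobid of the newly started job in its JSON response, and that jobid is exactly what cancel.json expects. A minimal end-to-end sketch, assuming a deployed project named myspider with a spider named tencent (the example names used later in this tutorial):

```python
import requests

BASE_URL = 'http://localhost:6800'

# Start the spider; the response carries the id of the new job,
# e.g. {"status": "ok", "jobid": "6487ec79947edab326d6db28a2d86511"}
resp = requests.post(BASE_URL + '/schedule.json',
                     data={'project': 'myspider', 'spider': 'tencent'})
jobid = resp.json()['jobid']

# Later, stop that same job by passing the jobid back to cancel.json
resp = requests.post(BASE_URL + '/cancel.json',
                     data={'project': 'myspider', 'job': jobid})
print(resp.json())
```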
5. Learn about other Scrapyd web APIs
- curl http://localhost:6800/listprojects.json (list projects)
- curl http://localhost:6800/listspiders.json?project=myspider (list spiders)
- curl http://localhost:6800/listjobs.json?project=myspider (list jobs; see the polling sketch after this list)
- curl http://localhost:6800/cancel.json -d project=myspider -d job=tencent (terminate a crawler; this call is sometimes delayed or fails to stop the crawler, in which case you can kill -9 the scrapy process)
- For other Scrapyd web APIs, search baidu.com to learn more
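As a programmatic example of the listjobs.json endpoint above, here is a sketch that polls it until all jobs of a project have finished (myspider is a placeholder project name, and the 5-second interval is arbitrary):

```python
import time
import requests

# listjobs.json groups a project's jobs into "pending",
# "running" and "finished" lists.
while True:
    resp = requests.get('http://localhost:6800/listjobs.json',
                        params={'project': 'myspider'})
    jobs = resp.json()
    print('pending:', len(jobs['pending']),
          'running:', len(jobs['running']),
          'finished:', len(jobs['finished']))
    if not jobs['pending'] and not jobs['running']:
        break  # everything has finished
    time.sleep(5)
```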
Summary
- Execute in the scrapy project path
sudo scrapyd or scrapyd
to start the scrapyd service, or start it as a background process:
nohup scrapyd > scrapyd.log 2>&1 &
- Deploy the scrapy crawler project
scrapyd-deploy -p myspider
- Start a crawler in the crawler project
curl http://localhost:6800/schedule.json -d project=myspider -d spider=tencent