1. Enabling pipelines
ITEM_PIPELINES = {
    # 'jingxi.pipelines.JingxiPipeline': 200,
    'jingxi.pipelines.BaiduPipeline': 300,
    'jingxi.pipelines.TencentPipeline': 100,
}
Once several pipelines are enabled, every yielded item flows through all of them in turn. The order is determined by the assigned number: the smaller the number, the earlier the pipeline runs. Here TencentPipeline (100) processes each item before BaiduPipeline (300).
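As a quick illustration, here is a minimal sketch of two trivial pipelines (the print statements are illustrative only; the priority numbers come from the settings above) showing how an item passes from one process_item to the next:

# pipelines.py -- minimal sketch
class TencentPipeline:
    def process_item(self, item, spider):
        print('TencentPipeline (priority 100) sees the item first')
        return item  # returning the item hands it to the next pipeline

class BaiduPipeline:
    def process_item(self, item, spider):
        print('BaiduPipeline (priority 300) sees the item second')
        return item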
2. How to distinguish different pipelines
class BaiduPipeline:
    def open_spider(self, spider):
        if spider.name != 'baidu':
            return
        print('BaiduPipeline opened')

    def process_item(self, item, spider):
        if spider.name != 'baidu':
            return item  # not this pipeline's spider, pass the item along
        print('&&&&&&&&&', spider.name, item)
        return item

    def close_spider(self, spider):
        if spider.name != 'baidu':
            return
        print('BaiduPipeline closed')


class TencentPipeline:
    def process_item(self, item, spider):
        if spider.name != 'tencent':
            return item
        print(spider.name, item)
        return item

    def close_spider(self, spider):
        if spider.name != 'tencent':
            return
        print('run')
In simple terms, a pipeline tells crawlers apart through the spider argument (typically spider.name). When an item flows into a pipeline that does not belong to its spider, process_item simply returns it so it keeps moving down the chain; in its target pipeline the item is actually processed, and the flow can also be cut short there by dropping the item, as sketched below.
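A minimal sketch of cutting off the flow: raising DropItem (Scrapy's standard exception for this) stops the item from reaching any later pipeline. The duplicate-check logic and the 'url' key here are hypothetical:

from scrapy.exceptions import DropItem

class DedupPipeline:
    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        key = item.get('url')  # hypothetical dedup key
        if key in self.seen:
            # DropItem ends the flow here; no later pipeline sees the item
            raise DropItem(f'duplicate item: {key}')
        self.seen.add(key)
        return item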
3. Introducing settings in a pipeline
import pymongo

class MongoPipeline(object):
    collection_name = 'scrapy_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # crawler.settings gives the pipeline access to the project settings
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items'),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # insert_one replaces the deprecated insert() in modern pymongo
        self.db[self.collection_name].insert_one(dict(item))
        return item
Simple and direct: the from_crawler class method receives the running Crawler object, so the pipeline can read values such as MONGO_URI and MONGO_DATABASE from the project settings before the instance is even created.
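For completeness, a sketch of the matching settings.py entries (the URI value is a placeholder and the pipeline path assumes the jingxi project used earlier; adjust both to your project):

# settings.py -- placeholder values
MONGO_URI = 'mongodb://localhost:27017'
MONGO_DATABASE = 'items'

ITEM_PIPELINES = {
    'jingxi.pipelines.MongoPipeline': 400,
}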