Why process_item in a Scrapy item pipeline is never called

  1. Check that ITEM_PIPELINES in settings.py registers your item pipeline, for example:
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'worm.pipelines.WormPipeline': 100,
}
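
Each key in ITEM_PIPELINES is a dotted path to a class, so a typo in the path means the intended pipeline class is never loaded. The following is a minimal sketch of checking that such a path resolves, using only the standard library. The helper name `resolve_dotted_path` is hypothetical and not part of Scrapy, and the standard-library path below is only a stand-in for 'worm.pipelines.WormPipeline'.

```python
import importlib


def resolve_dotted_path(path):
    """Import 'package.module.ClassName' and return the named attribute."""
    module_path, _, attr_name = path.rpartition('.')
    if not module_path:
        raise ValueError(f'not a dotted path: {path!r}')
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Stand-in path from the standard library (an assumption for illustration);
# inside a project you would pass the key used in ITEM_PIPELINES instead.
cls = resolve_dotted_path('collections.OrderedDict')
print(cls.__name__)  # OrderedDict
```

If the import raises ImportError or AttributeError for your own path, the module or class name in settings.py is the first thing to check.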
  2. If your Item subclass defines its own constructor, it must explicitly call the parent class constructor:
# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy
from scrapy import Field

class TestSpiderItem(scrapy.Item):

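    def __init__(self, *args, **kwargs):
        # A subclass that defines its own constructor must call the
        # parent constructor; otherwise the pipeline's process_item
        # method is never executed for this item.
        # Forwarding *args/**kwargs keeps TestSpiderItem(name=...) working.
        super().__init__(*args, **kwargs)
        print('<INFO> TestSpiderItem is being instantiated.')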

    name = Field()
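
To see why the missing parent call matters, here is a minimal sketch with a simplified stand-in for a dict-like item base class. It is not Scrapy's source; it only assumes that the base constructor creates the internal storage that field assignment relies on. The names `BaseItem`, `BrokenItem`, `FixedItem`, and `try_set` are hypothetical.

```python
class BaseItem:
    """Simplified stand-in for a dict-like item base class (not Scrapy's code)."""

    def __init__(self):
        # The storage dict only exists once the base constructor has run.
        self._values = {}

    def __setitem__(self, key, value):
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


class BrokenItem(BaseItem):
    def __init__(self):
        # Forgets super().__init__(), so self._values is never created.
        self.created = True


class FixedItem(BaseItem):
    def __init__(self):
        super().__init__()
        self.created = True


def try_set(item):
    """Return True if a field can be assigned, False if storage is missing."""
    try:
        item['name'] = 'example'
        return True
    except AttributeError:
        return False


print(try_set(BrokenItem()))  # False
print(try_set(FixedItem()))   # True
```

In the broken case the item fails as soon as the spider fills in a field, so nothing valid ever reaches the pipeline.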
  3. Check that process_item(self, item, spider) returns an item or dict object:
class WormPipeline:
    # This method is called for every item pipeline component.
    # process_item() must either: return a dict with data,
    # return an Item (or any descendant class) object,
    # return a Twisted Deferred or raise DropItem exception.
    # Dropped items are no longer processed by further pipeline components.
    def process_item(self, item, spider):
        with open('F:\\text1.txt', 'a') as f:
            f.write(item['author'] + '\n')
        return item
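
The effect of a missing return can be sketched with a simplified simulation of a pipeline chain. This is not Scrapy's pipeline manager; `run_chain` and the three pipeline classes are hypothetical names, and the only assumption is that each component receives whatever the previous one returned.

```python
class ForgetfulPipeline:
    """Hypothetical pipeline that forgets to return the item."""

    def process_item(self, item, spider):
        item['saved'] = True
        # No `return item` here, so Python returns None implicitly.


class ReturningPipeline:
    """Same work as above, but hands the item on."""

    def process_item(self, item, spider):
        item['saved'] = True
        return item


class UppercasePipeline:
    """Later component that needs a real item to work on."""

    def process_item(self, item, spider):
        item['author'] = item['author'].upper()
        return item


def run_chain(pipelines, item, spider=None):
    """Pass the item through each pipeline in order (simplified simulation)."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


good = run_chain([ReturningPipeline(), UppercasePipeline()], {'author': 'alice'})
print(good)  # {'author': 'ALICE', 'saved': True}

try:
    run_chain([ForgetfulPipeline(), UppercasePipeline()], {'author': 'alice'})
except TypeError as exc:
    # The second component received None instead of the item.
    print('second pipeline failed:', exc)
```

When several pipelines are registered, the one that appears not to run may simply be receiving None from an earlier component.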