When Scrapy exports scraped data to a CSV file, the columns do not follow the order in which the fields are defined in items.py:
from scrapy import Field, Item

class JsuserItem(Item):
    author = Field()
    url = Field()
    title = Field()
    reads = Field()
    comments = Field()
    likes = Field()
    rewards = Field()
How can the CSV columns be written in a specified order?
1) Add a file csv_item_exporter.py to the spiders package:
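# scrapy.conf and scrapy.contrib.exporter are gone in newer Scrapy versions;
# get_project_settings() and scrapy.exporters are their current equivalents.
from scrapy.exporters import CsvItemExporter
from scrapy.utils.project import get_project_settings

settings = get_project_settings()

class MyProjectCsvItemExporter(CsvItemExporter):
    def __init__(self, *args, **kwargs):
        # Delimiter from settings.py (CSV_DELIMITER), defaulting to a comma
        kwargs['delimiter'] = settings.get('CSV_DELIMITER', ',')
        # Column order from settings.py (FIELDS_TO_EXPORT), if one is given
        fields_to_export = settings.getlist('FIELDS_TO_EXPORT')
        if fields_to_export:
            kwargs['fields_to_export'] = fields_to_export
        super().__init__(*args, **kwargs)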
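CsvItemExporter writes each row with Python's standard csv.writer, and the keyword arguments it does not consume itself (such as delimiter) are passed on to that writer. The stdlib-only sketch below imitates that behaviour outside Scrapy to show why an explicit field list fixes the column order: each row is built by looking fields up by name, in list order, so the key order of the scraped item no longer matters. The export_csv function and the sample item are hypothetical illustrations, not part of Scrapy.

```python
import csv
import io

# Column order taken from FIELDS_TO_EXPORT in settings.py
FIELDS_TO_EXPORT = ['author', 'title', 'url', 'reads', 'comments', 'likes', 'rewards']


def export_csv(items, fields, delimiter=','):
    """Hypothetical helper: write dict items as CSV text, columns ordered by `fields`."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter)
    writer.writerow(fields)  # header line, in the requested order
    for item in items:
        # Look each field up by name, so the dict's own key order is irrelevant;
        # missing fields become empty cells
        writer.writerow([item.get(name, '') for name in fields])
    return buf.getvalue()


# Hypothetical scraped item whose keys are deliberately out of order
item = {'likes': '5', 'url': 'https://example.com/p/1', 'author': 'alice',
        'rewards': '0', 'title': 'Hello', 'comments': '2', 'reads': '100'}

print(export_csv([item], FIELDS_TO_EXPORT, delimiter='\t'))
```

The header and the data row both come out as author, title, url, reads, comments, likes, rewards, separated by tabs.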
2) In settings.py:
# jsuser is the project name
FEED_EXPORTERS = {
    'csv': 'jsuser.spiders.csv_item_exporter.MyProjectCsvItemExporter',
}
FIELDS_TO_EXPORT = [
    'author',
    'title',
    'url',
    'reads',
    'comments',
    'likes',
    'rewards',
]
After re-running the crawl, the columns are written in the specified order.
The CSV delimiter can also be set in settings.py (here, a tab):
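On Scrapy 1.0 and later, the built-in FEED_EXPORT_FIELDS setting also controls which fields are exported and in what order, so if only the column order matters it can likely replace the custom exporter. A minimal settings.py sketch reusing the same field list:

```python
# settings.py: built-in Scrapy setting that fixes the exported fields and their order
FEED_EXPORT_FIELDS = [
    'author',
    'title',
    'url',
    'reads',
    'comments',
    'likes',
    'rewards',
]
```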
CSV_DELIMITER = "\t"
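Assuming Scrapy 2.4 or later, where each entry of the FEEDS setting accepts fields and item_export_kwargs options, both the column order and the delimiter can be configured per feed without a custom exporter class; item_export_kwargs is handed to the exporter, and CsvItemExporter forwards delimiter to csv.writer. The output file name jsuser.csv below is hypothetical.

```python
# settings.py (assumes Scrapy 2.4+): one CSV feed with fixed column order and a tab delimiter
FEEDS = {
    'jsuser.csv': {  # hypothetical output file name
        'format': 'csv',
        # Column order for this feed
        'fields': ['author', 'title', 'url', 'reads', 'comments', 'likes', 'rewards'],
        # Extra keyword arguments passed to the exporter (delimiter ends up in csv.writer)
        'item_export_kwargs': {'delimiter': '\t'},
    },
}
```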