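This afternoon I was using Scrapy to scrape the books tagged "python" on Douban and save them to a JSON file, and the Chinese text in the JSON file came out looking garbled.
By default, Chinese characters are written as Unicode escape sequences, for example: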
\u5317\u4eac\u5927\u5b66
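These escapes are not corrupted data. They are the ASCII-safe form that Python's json module emits by default (which appears to be what Scrapy's JSON export falls back to when no encoding is set), and any JSON parser turns them back into the original characters. A minimal sketch using only the standard library, with the sample string taken from the escape sequence above:

```python
import json

text = "\u5317\u4eac\u5927\u5b66"  # the four characters 北京大学

# Default behaviour: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(text)
# ensure_ascii=False keeps the characters as they are
literal = json.dumps(text, ensure_ascii=False)

print(escaped)
print(literal)

# Both forms decode back to the same string
assert json.loads(escaped) == json.loads(literal) == text
```

So the escaped file is still valid JSON; it is just hard for a human to read.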
To fix this, set the following in the project's settings file, settings.py:
FEED_EXPORT_ENCODING = 'utf-8'
That solves the problem.
Second solution
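Alternatively, pass -s FEED_EXPORT_ENCODING=utf-8 on the command line (the quotes are unnecessary, and Windows cmd would pass single quotes through as part of the value):
scrapy crawl douban -o item.json -s FEED_EXPORT_ENCODING=utf-8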
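To confirm that the export now holds literal characters rather than escapes, the file can be inspected with a few lines of Python. This is only a rough sketch: the helper name check_export is made up for illustration, and the regular expression is a heuristic that looks for a backslash, a "u", and four hex digits in the raw text.

```python
import json
import os
import re


def check_export(path):
    """Return (items, has_escapes) for a JSON file such as item.json."""
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    # A literal backslash-u followed by four hex digits suggests escaped text
    has_escapes = re.search(r"\\u[0-9a-fA-F]{4}", raw) is not None
    return json.loads(raw), has_escapes


# Only runs when an export named item.json exists in the current directory
if __name__ == "__main__" and os.path.exists("item.json"):
    items, has_escapes = check_export("item.json")
    print(len(items), "items; escaped text found:", has_escapes)
```

Either way the parsed items are identical; only the on-disk representation differs.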
Third solution
1. In the settings file, settings.py, set:
ITEM_PIPELINES = {'xxx.pipelines.JsonWithEncodingPipeline': 300}  # the number sets the run order; 300 is an arbitrary choice
2. Add the JsonWithEncodingPipeline class (in xxx/pipelines.py, to match the path above) as follows:
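# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
import json
import codecs
import os


class JsonWithEncodingPipeline(object):
    def __init__(self):
        # Open the output file as UTF-8 and start a JSON array
        self.file = codecs.open('scraped_data_utf8.json', 'w', encoding='utf-8')
        self.file.write('[')
        self.has_items = False

    def process_item(self, item, spider):
        # ensure_ascii=False writes Chinese characters as-is instead of \uXXXX
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line + ',')
        self.has_items = True
        return item

    def close_spider(self, spider):
        # Remove the trailing comma left after the last item (if any),
        # then close the array; without the check an empty run would delete '['
        if self.has_items:
            self.file.seek(-1, os.SEEK_END)
            self.file.truncate()
        self.file.write(']')
        self.file.close()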
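As a rough alternative sketch, the trailing-comma cleanup can be avoided altogether by writing the separator before every item except the first, so the file never needs seek or truncate. The class name JsonArrayPipeline and its path argument are hypothetical, introduced here only so the example can run outside Scrapy; process_item and close_spider keep the same signatures as the pipeline above.

```python
import io
import json
import os
import tempfile


class JsonArrayPipeline(object):
    """Hypothetical variant: writes all items as one UTF-8 JSON array."""

    def __init__(self, path='scraped_data_utf8.json'):
        self.file = io.open(path, 'w', encoding='utf-8')
        self.file.write(u'[')
        self.first = True

    def process_item(self, item, spider):
        # Put the separator in front of every item except the first one
        if not self.first:
            self.file.write(u',\n')
        self.first = False
        self.file.write(json.dumps(dict(item), ensure_ascii=False))
        return item

    def close_spider(self, spider):
        # Nothing to trim: just close the array, even if no item was written
        self.file.write(u']')
        self.file.close()


if __name__ == '__main__':
    # Stand-alone demo with plain dicts in place of Scrapy items
    demo_path = os.path.join(tempfile.mkdtemp(), 'demo.json')
    pipeline = JsonArrayPipeline(demo_path)
    for title in (u'北京大学', u'Python'):
        pipeline.process_item({'title': title}, spider=None)
    pipeline.close_spider(spider=None)
    with io.open(demo_path, encoding='utf-8') as f:
        print(json.load(f))
```

Because the default path matches the one used above, the class can still be created without arguments, which is how the original pipeline is constructed.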