Pitfalls for Scrapy beginners

A Scrapy project that crawls Tencent's job listings

items configuration

import scrapy


class TencentItem(scrapy.Item):

    positionName = scrapy.Field()

spider configuration

# -*- coding: utf-8 -*-
import scrapy
from TenCent.items import TencentItem

class TencentSpider(scrapy.Spider):
    name = 'tencent'
    allowed_domains = ['tencent.com']
    #start_urls = ['http://tencent.com/']
    baseURL = "https://hr.tencent.com/position.php?&start="
    offset = 0
    start_urls = [baseURL + str(offset)]

    def parse(self, response):

        node_list = response.xpath("//tr[@class='even'] | //tr[@class='odd']")

        for node in node_list:

            item = TencentItem()

            item['positionName'] = node.xpath("./td[1]/a/text()").extract()[0]

            yield item

        if self.offset < 30:
            self.offset += 10
            url = self.baseURL + str(self.offset)
            yield scrapy.Request(url,callback=self.parse)
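
The pagination in parse depends on mutable state (self.offset), so it can help to see which URLs it ends up requesting. The following is a minimal sketch in plain Python, not Scrapy code: page_urls, step, and last_offset are hypothetical names introduced here, and only the base URL and the numbers 10 and 30 come from the spider above.

```python
BASE_URL = "https://hr.tencent.com/position.php?&start="


def page_urls(base_url, step=10, last_offset=30):
    """Yield the listing URLs in the order the spider above requests them."""
    offset = 0
    yield base_url + str(offset)      # the single start_urls entry
    while offset < last_offset:       # mirrors `if self.offset < 30`
        offset += step                # mirrors `self.offset += 10`
        yield base_url + str(offset)


urls = list(page_urls(BASE_URL))
for url in urls:
    print(url)
```

Under these assumptions the spider visits four listing pages, with start values 0, 10, 20, and 30.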

settings configuration

BOT_NAME = 'TenCent'

SPIDER_MODULES = ['TenCent.spiders']
NEWSPIDER_MODULE = 'TenCent.spiders'


# Obey robots.txt rules
ROBOTSTXT_OBEY = True


ITEM_PIPELINES = {
   'TenCent.pipelines.TencentPipeline': 300,
}

pipelines configuration

import json


class TencentPipeline(object):

    def _init_(self):
        self.file = codecs.open("tencent.json","w")

    def process_item(self, item, spider):
        content = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(content.encode("utf-8")) + ",\n"
        return item


    def close_spider(self):
        self.file.close()
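
A side note on the self.file.write(...) line in process_item above: even with an open file, it would most likely fail under Python 3. The sketch below illustrates why, using io.StringIO as a stand-in for a file opened in text mode (an assumption made so the example needs no file on disk); the item contents are hypothetical.

```python
import io
import json

item = {"positionName": "example position"}  # hypothetical item
content = json.dumps(item, ensure_ascii=False)

text_file = io.StringIO()  # stand-in for a file opened in text mode

# 1) A text-mode file accepts str, not bytes.
try:
    text_file.write(content.encode("utf-8"))
    bytes_error = None
except TypeError as exc:
    bytes_error = exc

# 2) write() returns the number of characters written, so appending ",\n"
#    to its result tries to add a str to an int.
try:
    text_file.write(content) + ",\n"
    concat_error = None
except TypeError as exc:
    concat_error = exc

# Building the full line first and writing it once avoids both problems.
fixed_file = io.StringIO()
fixed_file.write(content + ",\n")
print(repr(fixed_file.getvalue()))
```

The pipeline above also calls codecs.open without importing codecs, which would raise a NameError as soon as that line actually ran.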

The painful errors

2018-05-10 09:11:12 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: TenCent)
2018-05-10 09:11:12 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.1.0, Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 17.5.0 (OpenSSL 1.0.2n  7 Dec 2017), cryptography 2.1.4, Platform Windows-7-6.1.7601-SP1
2018-05-10 09:11:12 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'TenCent', 'NEWSPIDER_MODULE': 'TenCent.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['TenCent.spiders']}
2018-05-10 09:11:12 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2018-05-10 09:11:13 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-05-10 09:11:13 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-05-10 09:11:13 [scrapy.middleware] INFO: Enabled item pipelines:
['TenCent.pipelines.TencentPipeline']
2018-05-10 09:11:13 [scrapy.core.engine] INFO: Spider opened
2018-05-10 09:11:13 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-05-10 09:11:13 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-05-10 09:11:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hr.tencent.com/robots.txt> (referer: None)
2018-05-10 09:11:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hr.tencent.com/position.php?&start=0> (referer: None)
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说游戏资深运营经理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类资深项目管理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类内容IP版权管理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'SNG04-广告业务后台工程师(上海)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'SNG04-UGC质量后台开发工程师(上海)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '25667-运营商渠道销售(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'MIG08-后台开发高级工程师(广州)(腾讯WiFi管家)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'MIG16-车联网大数据及算法产品经理'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类游戏商业化运营(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类美术编辑(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hr.tencent.com/position.php?&start=10> (referer: https://hr.tencent.com/position.php?&start=0)
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类内容编辑(深圳/北京)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类内容编辑(深圳/北京)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类资深内容责编(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'SNG08-高级品牌视觉设计师(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'TEG09-推荐系统后台开发工程师(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'WXG06-321 微信境外支付高级区域经理(日本)微信境外支付高级区域经理(台北)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类游戏策划(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类平台高级产品经理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '21087-互动小说类游戏版本管理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'WXG06-321 微信境外支付高级区域经理(香港)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hr.tencent.com/position.php?&start=20> (referer: https://hr.tencent.com/position.php?&start=10)
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'WXG06-321 微信境外支付高级区域经理(日本)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'WXG06-321 微信境外支付高级区域经理(日本)微信境外支付高级区域经理(泰国)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '16810-动漫商务经理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '16810-动漫付费业务编辑(北京)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'WXG06-321 微信境外支付高级区域经理(日本)微信境外支付高级区域经理(荷兰))'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '15612-手游关卡策划(北京)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '15575-《圣斗士星矢》资深系统策划(成都)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'MIG06-智能硬件产品经理(人机交互方向)(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': '20772-企鹅影视天机工作室制片人'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'MIG03-移动端测试开发工程师(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hr.tencent.com/position.php?&start=30> (referer: https://hr.tencent.com/position.php?&start=20)
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': '25923-互动娱乐游戏数据库管理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': '27087-腾讯开放平台部投资经理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': '15605-特效设计(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': '20589-海外PM(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': '15575-王者荣耀游戏社区产品经理(成都)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': '23673-数码编辑(北京)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'S2-CDG财务管理(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': 'SNG16-腾讯音乐用户研究工程师(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': '27087-腾讯开放平台大数据投资系统前端开发(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.scraper] ERROR: Error processing {'positionName': '27087-投资孵化中心大数据分析师(深圳)'}
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\PY\TenCent\TenCent\pipelines.py", line 17, in process_item
    self.file.write(content.encode("utf-8")) + ",\n"
AttributeError: 'TencentPipeline' object has no attribute 'file'
2018-05-10 09:11:15 [scrapy.core.engine] INFO: Closing spider (finished)
2018-05-10 09:11:15 [scrapy.core.engine] ERROR: Scraper close failure
Traceback (most recent call last):
  File "F:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
TypeError: close_spider() takes 1 positional argument but 2 were given
2018-05-10 09:11:15 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1467,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 16172,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 5,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 5, 10, 1, 11, 15, 103335),
 'log_count/DEBUG': 6,
 'log_count/ERROR': 41,
 'log_count/INFO': 7,
 'request_depth_max': 3,
 'response_received_count': 5,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2018, 5, 10, 1, 11, 13, 535246)}
2018-05-10 09:11:15 [scrapy.core.engine] INFO: Spider closed (finished)

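In the end I changed w+ to wb and it worked, though I did not really understand why at the time. Judging from the tracebacks above, the file mode is probably not what fixed it: codecs.open switches to binary mode by itself whenever an encoding is given. The misspelled _init_ (single underscores) was never called, so self.file did not exist, and close_spider was missing its spider parameter. The revised pipeline fixes both.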
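
Both errors in the log can be reproduced without Scrapy. The sketch below assumes, based on the tracebacks, that the cause lies in the method names and signatures: a method named _init_ with single underscores is an ordinary method that Python never calls on construction, and Scrapy passes the spider to close_spider. BrokenPipeline is a hypothetical stand-in class, and object() stands in for the spider.

```python
class BrokenPipeline:
    def _init_(self):              # single underscores: not a constructor
        self.file = "opened"

    def close_spider(self):        # missing the `spider` parameter
        pass


pipeline = BrokenPipeline()        # _init_ is never called here
spider = object()                  # stand-in for a Scrapy spider

try:
    pipeline.file                  # the attribute was never set
    attr_error = None
except AttributeError as exc:
    attr_error = exc

try:
    pipeline.close_spider(spider)  # called the way Scrapy calls it
    type_error = None
except TypeError as exc:
    type_error = exc

print(attr_error)
print(type_error)
```

Renaming the constructor to __init__ (or moving the setup into open_spider) and declaring close_spider(self, spider) removes both exceptions.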

pipelines changed to

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import json
import codecs

class TencentPipeline(object):

    # def _init_(self):
    #     self.file = codecs.open("tencent.json","w+",encoding='utf-8')
    #
    # def process_item(self, item, spider):
    #     lines = json.dumps(dict(item), ensure_ascii=False) + '\n'
    #     self.file.write(lines)
    #     self.file.flush()
    #     return item
    #
    #
    # def spider_closed(self, spider):
    #     self.file.close()
    def open_spider(self, spider):
        self.file = codecs.open('items.json', 'wb',encoding='utf-8')

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line)
        return item
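
To check a pipeline like this outside a crawl, its three methods can be called by hand. The following is a sketch under assumptions: it uses the built-in open() with encoding="utf-8" instead of codecs.open, takes the output path as a hypothetical constructor parameter, and passes None where Scrapy would pass the spider. JsonLinesPipeline is a name introduced here, not part of the project.

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    def __init__(self, path="items.json"):   # `path` is a hypothetical parameter
        self.path = path

    def open_spider(self, spider):
        # Text mode with an explicit encoding, so str lines can be written directly.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # One JSON object per line; ensure_ascii=False keeps Chinese text readable.
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line)
        return item


out_path = os.path.join(tempfile.mkdtemp(), "items.json")
pipeline = JsonLinesPipeline(out_path)
pipeline.open_spider(None)
pipeline.process_item({"positionName": "测试职位"}, None)
pipeline.close_spider(None)

with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

Scrapy would create the pipeline without arguments, so in a real crawl the default path would apply.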