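for each in response.json['top_level_key'][...intermediate keys, depending on how deeply the JSON is nested...]['data_level_key']
For example, given JSON in this format: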
{
    "code": 1,
    "msg": "操作成功",
    "data": {
        "pageNo": 1,
        "hasNext": true,
        "list": [
            {
                "docid": "DRQQ35F90511ELD5",
                "boardid": "dy_wemedia_bbs",
                "postid": null,
                "topicid": null,
                "recommendtids": null,
                "userid": null,
                "nickname": null,
                "userinfo": null,
                "title": "海湾被鲜血染成血红色:100多只海豚和鲸鱼惨遭法罗群岛渔民斩杀"
            }
        ]
    }
}
Code:
for each in response.json['data']['list']:
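The same lookup can be tried outside pyspider with the standard json module. This is only a minimal sketch, assuming the response body has the shape of the example above (shortened here). The payload string and the extract_items helper are hypothetical names used for illustration, and .get() is used because a field such as imgsrc may be missing from an item.

```python
import json

# Hypothetical payload, shortened from the example response above.
payload = '''
{"code": 1,
 "msg": "ok",
 "data": {"pageNo": 1,
          "hasNext": true,
          "list": [{"docid": "DRQQ35F90511ELD5",
                    "boardid": "dy_wemedia_bbs",
                    "postid": null}]}}
'''

def extract_items(body):
    """Walk top-level key -> data-level key and return the list of items."""
    parsed = json.loads(body)
    # .get() with defaults avoids a KeyError when a level is missing.
    return parsed.get('data', {}).get('list', [])

items = extract_items(payload)
for each in items:
    # Each item is a plain dict; JSON null becomes None in Python.
    print(each['docid'], each.get('imgsrc'))
```

In pyspider, response.json already holds the parsed body, so only the key lookups carry over to the handler.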
Passing parameters in pyspider
I was not originally using save to pass parameters. The basic pattern looks like this:
def on_start(self):
    self.crawl('http://www.example.org/',
               callback=self.callback, save={'a': 123})

def callback(self, response):
    return response.save['a']
Take the values scraped in the previous step, hand them to the next request, and read them back in the callback:
def index_page(self, response):
    for each in response.json['data']['list']:
        docid = each['docid']
        title = each['title']
        imgsrc = each['imgsrc']
        # Attach imgsrc to the task so the callback can read it from response.save
        self.crawl('http://www.***.com/***', callback=self.detail_page,
                   save={'imgsrc': imgsrc})

@config(priority=2)
def detail_page(self, response):
    # Value handed over from index_page
    imgsrc = response.save['imgsrc']
    content = response.doc('#content').html()
    return {
        "content": content,
        "title": response.doc('h2').text(),
        "imgsrc": imgsrc
    }
This way, the values from the previous step are available in the callback.
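The hand-off can be illustrated without pyspider installed. The following is a toy model only, not pyspider's implementation: FakeResponse, ToyHandler, and the example.org URLs are hypothetical, and the callback is invoked immediately rather than after a real fetch. It shows just the one idea that whatever is passed as save on crawl comes back as response.save in the callback.

```python
class FakeResponse:
    """Hypothetical stand-in for the response object a callback receives."""
    def __init__(self, url, save):
        self.url = url
        self.save = save


class ToyHandler:
    """Toy model of the save hand-off; not pyspider's real scheduler."""
    def __init__(self):
        self.results = []

    def crawl(self, url, callback, save=None):
        # A real framework would fetch the page first; here the callback
        # is called right away with the saved data attached.
        result = callback(FakeResponse(url, save))
        if result is not None:
            self.results.append(result)

    def index_page(self, items):
        for each in items:
            # Hypothetical detail URL built from docid.
            self.crawl('http://www.example.org/' + each['docid'],
                       callback=self.detail_page,
                       save={'imgsrc': each['imgsrc']})

    def detail_page(self, response):
        # Read back the value attached in index_page.
        return {'url': response.url, 'imgsrc': response.save['imgsrc']}


handler = ToyHandler()
handler.index_page([
    {'docid': 'a1', 'imgsrc': 'http://www.example.org/a1.png'},
    {'docid': 'b2', 'imgsrc': 'http://www.example.org/b2.png'},
])
print(handler.results)
```

If save is left out of the crawl call, response.save is None in this model and the lookup in detail_page raises a TypeError, which is why the value has to be attached where the request is created.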