Using what we covered earlier, we can write a simple spider and then improve it step by step:
# -*- coding: utf-8 -*-
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # Each quote on the page lives in a node with class="quote"
        quotes = response.xpath('//*[@class="quote"]')
        for quote in quotes:
            # extract_first() returns a single string (or None);
            # extract() returns a list of all matches
            text = quote.xpath('.//*[@class="text"]/text()').extract_first()
            author = quote.xpath('.//*[@itemprop="author"]/text()').extract_first()
            tags = quote.xpath('.//*[@itemprop="keywords"]/@content').extract()
            print('\n')
            print(text)
            print(author)
            print(tags)
            print('\n')
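To see what those XPath expressions actually match, here is a minimal stand-alone sketch using only the standard library. The sample HTML below is a made-up, well-formed stand-in for one quote block on quotes.toscrape.com (the real page is larger and is parsed by Scrapy's own selectors, not ElementTree):

```python
import xml.etree.ElementTree as ET

# Tiny, well-formed sample mimicking one quote block (illustrative only)
html = """
<div>
  <div class="quote">
    <span class="text">"Hello"</span>
    <small itemprop="author">Jane Doe</small>
    <meta itemprop="keywords" content="life,example" />
  </div>
</div>
"""

root = ET.fromstring(html)
# Same attribute predicates as the spider's XPath expressions
for quote in root.findall(".//*[@class='quote']"):
    text = quote.find(".//*[@class='text']").text
    author = quote.find(".//*[@itemprop='author']").text
    tags = quote.find(".//*[@itemprop='keywords']").get('content')
    print(text, author, tags)
```

ElementTree only supports a limited XPath subset, but attribute predicates like `[@class='quote']` behave the same way as in the spider above.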
Run the following command from the project's root directory:

scrapy crawl quotes