From the command line, switch to the root directory of our project and run the spider with the following command:
scrapy crawl quotes
This command runs the spider named quotes that we just created. The spider sends requests to the quotes.toscrape.com website, and you should see output in your terminal similar to the following:
... (some output omitted here)
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Spider opened
2016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-16 21:24:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-1.html
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-2.html
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Closing spider (finished)
Now look in the current directory and you will find two new files: quotes-1.html and quotes-2.html. Each file contains the HTML content of its respective URL, and we will parse these files later.
Note: we will cover how to parse HTML pages in the content that follows.
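For reference, the "Saved file quotes-1.html" and "Saved file quotes-2.html" lines in the log come from the spider's parse callback. Below is a minimal sketch of a quotes spider that would produce this output, modeled on the standard Scrapy tutorial; the exact spider shown here (class name QuotesSpider, placement in the project's spiders/ directory) is an assumption, not something defined in this section:

import scrapy


class QuotesSpider(scrapy.Spider):
    # "quotes" is the name matched by the `scrapy crawl quotes` command
    name = "quotes"

    def start_requests(self):
        # The two pages requested in the log above
        urls = [
            "http://quotes.toscrape.com/page/1/",
            "http://quotes.toscrape.com/page/2/",
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Use the page number from the URL to build the file name,
        # e.g. quotes-1.html, quotes-2.html
        page = response.url.split("/")[-2]
        filename = f"quotes-{page}.html"
        with open(filename, "wb") as f:
            f.write(response.body)
        # Produces the "Saved file ..." DEBUG lines seen in the log
        self.log(f"Saved file {filename}")

Here parse simply writes response.body to disk; later sections replace this with actual HTML parsing and data extraction.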