Example page:
http://www.dianping.com/shop/93729095/review_all/p1
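While scraping shop reviews from Dianping (大众点评), I ran into the 【展开/收起评论】 (expand/collapse review) button. Because of it, the full review text can't be captured directly with a plain Text selector (type: SelectorText), as shown in the figure:
I decided to use an Element click selector (SelectorElementClick). For a multi-page scrape you would normally have to nest pagination on top of it as well. Fortunately, the reviews are paginated with a regular URL pattern (.../review_all/p1, p2, p3, and so on), so the pages can be listed directly in the start URL and no extra nesting is needed.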
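Web Scraper's start URLs accept a numeric range in square brackets, such as `.../review_all/p[1-3]`, and expand it into one start URL per page. The Python sketch below mimics that expansion to show which review pages get visited. `expand_range_url` is a hypothetical helper: it handles only the first `[start-end]` or `[start-end:step]` range and ignores the zero-padded form (`[001-100]`) that Web Scraper also supports.

```python
import re

# Matches a numeric range such as "[1-3]" or "[0-100:10]" (the ":step" part is optional).
RANGE_RE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_range_url(pattern: str) -> list[str]:
    """Expand the first [start-end] or [start-end:step] range in a URL pattern.

    Hypothetical helper for illustration; zero padding is not handled.
    """
    m = RANGE_RE.search(pattern)
    if m is None:
        return [pattern]  # no range: the URL is used as-is
    start, end = int(m.group(1)), int(m.group(2))
    step = int(m.group(3) or 1)
    prefix, suffix = pattern[: m.start()], pattern[m.end():]
    # The range is inclusive on both ends, as in "p[1-3]" -> p1, p2, p3.
    return [f"{prefix}{i}{suffix}" for i in range(start, end + 1, step)]


if __name__ == "__main__":
    for url in expand_range_url("http://www.dianping.com/shop/93729095/review_all/p[1-3]"):
        print(url)
```

Raising the upper bound (for example `p[1-10]`) is all it takes to cover more review pages, which is why no pagination selector is needed here.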
Import
{"_id":"zhankai2","startUrl":["http://www.dianping.com/shop/93729095/review_all/p[1-3]"],"selectors":[{"id":"111","type":"SelectorElementClick","selector":"div.main-review","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"div.more-words a.fold","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"333","type":"SelectorText","selector":"div.review-words","parentSelectors":["111"],"multiple":false,"regex":"","delay":0}]}
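The two selectors form a small tree: `111` (Element click) selects each `div.main-review` block and clicks its `div.more-words a.fold` link once, and `333` (Text) reads `div.review-words` inside each of those blocks. A minimal Python sketch that prints this hierarchy from the `selectors` array of the sitemap above; `selector_tree` is a hypothetical helper, not part of Web Scraper.

```python
import json
from collections import defaultdict

# The "selectors" array of the first sitemap above, copied verbatim.
SELECTORS_JSON = """[
  {"id": "111", "type": "SelectorElementClick", "selector": "div.main-review",
   "parentSelectors": ["_root"], "multiple": true, "delay": "3000",
   "clickElementSelector": "div.more-words a.fold", "clickType": "clickOnce",
   "discardInitialElements": false, "clickElementUniquenessType": "uniqueText"},
  {"id": "333", "type": "SelectorText", "selector": "div.review-words",
   "parentSelectors": ["111"], "multiple": false, "regex": "", "delay": 0}
]"""


def selector_tree(selectors: list[dict]) -> list[str]:
    """Return the selector hierarchy as indented lines, starting from "_root"."""
    children = defaultdict(list)
    for sel in selectors:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel)

    lines: list[str] = []

    def walk(parent_id: str, depth: int, path: tuple[str, ...]) -> None:
        for sel in children[parent_id]:
            # Skip selectors already on the current path, e.g. a pagination
            # selector that lists itself as a parent, to avoid infinite recursion.
            if sel["id"] in path:
                continue
            lines.append(f'{"  " * depth}{sel["id"]} ({sel["type"]}: {sel["selector"]})')
            walk(sel["id"], depth + 1, path + (sel["id"],))

    walk("_root", 0, ())
    return lines


if __name__ == "__main__":
    print("\n".join(selector_tree(json.loads(SELECTORS_JSON))))
```

Because `333` hangs under `111`, each clicked review block yields one text record, rather than all review texts on the page being collected independently of the clicks.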
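Of course, the one inconvenience is that you first have to click "更多评价" (more reviews) on the shop's detail page to reach the review page. Somewhat awkwardly, it is only after clicking through that Web Scraper really plays to its strengths...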
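However, in later tests I found that for some shops the two kinds of reviews (those with an expand button and those without one) could not all be selected, and the results contained duplicate entries, so I reworked the sitemap.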
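A different way to deal with duplicates, not the author's approach (which is the revised sitemap that follows), is to drop them after exporting. A minimal Python sketch, assuming the results were exported as CSV, that Web Scraper names its bookkeeping columns `web-scraper-order` and `web-scraper-start-url`, and that each data column is named after its selector id (`333` here); adjust the column names to match your actual export. `dedupe_rows` is a hypothetical helper and the sample rows are placeholders.

```python
import csv
import io


def dedupe_rows(csv_text: str, key_columns: list[str]) -> list[dict[str, str]]:
    """Keep only the first row for each distinct combination of key_columns."""
    seen: set[tuple[str, ...]] = set()
    unique: list[dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip whitespace so the same review text with stray spaces still counts as a duplicate.
        key = tuple(row[col].strip() for col in key_columns)
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


if __name__ == "__main__":
    # Placeholder export: column names are assumed, review texts are dummies.
    sample = (
        "web-scraper-order,web-scraper-start-url,333\n"
        "1-1,http://www.dianping.com/shop/93729095/review_all/p1,review A\n"
        "1-2,http://www.dianping.com/shop/93729095/review_all/p1,review A\n"
        "1-3,http://www.dianping.com/shop/93729095/review_all/p1,review B\n"
    )
    for row in dedupe_rows(sample, ["web-scraper-start-url", "333"]):
        print(row["web-scraper-order"], row["333"])
```

Keying on the page URL plus the review text means two genuinely identical reviews on the same page would also be collapsed into one; fixing the selectors, as below, avoids producing the duplicates in the first place.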
Import2
{"_id":"blue","startUrl":["http://www.dianping.com/shop/96127598/review_all/p[1-3]"],"selectors":[{"id":"111","type":"SelectorElementClick","selector":"div.reviews-items > ul > li","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"div.more-words a.fold","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"222","type":"SelectorElement","selector":"div.review-words:nth-of-type(n+3)","parentSelectors":["_root"],"multiple":true,"delay":"3000"},{"id":"333","type":"SelectorText","selector":"parent","parentSelectors":["222"],"multiple":false,"regex":"","delay":0}]}
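In `div.review-words:nth-of-type(n+3)`, the argument uses the CSS `an+b` notation: with n = 0, 1, 2, and so on, `n+3` matches positions 3, 4, 5, and onward. `:nth-of-type` counts only siblings with the same element name (`div` here), whatever their class, and `.review-words` then additionally requires the class. So an element matches only if it is a `div.review-words` that is the third or later `div` among its siblings. A small Python sketch of that rule; `nth_matches` and `select_nth_of_type` are hypothetical helpers, and the sibling lists in the usage are invented, not Dianping's actual markup.

```python
def nth_matches(index: int, a: int, b: int) -> bool:
    """True if a 1-based position matches the CSS expression an+b for some integer n >= 0."""
    if a == 0:
        return index == b
    diff = index - b
    # index = a*n + b  <=>  (index - b) is a multiple of a and n = (index - b) / a is >= 0
    return diff % a == 0 and diff // a >= 0


def select_nth_of_type(
    siblings: list[tuple[str, set[str]]], tag: str, cls: str, a: int, b: int
) -> list[int]:
    """Return indexes of siblings matched by tag.cls:nth-of-type(an+b).

    siblings holds (tag name, class set) pairs in document order.
    """
    matched = []
    position = 0  # 1-based position among siblings with the same tag name
    for i, (sib_tag, classes) in enumerate(siblings):
        if sib_tag != tag:
            continue  # other element types are not counted by :nth-of-type
        position += 1
        if cls in classes and nth_matches(position, a, b):
            matched.append(i)
    return matched


if __name__ == "__main__":
    # Hypothetical layouts: the review text is the 2nd div vs. the 3rd div.
    second = [("div", {"header"}), ("span", {"x"}), ("div", {"review-words"})]
    third = [("div", {"header"}), ("div", {"rank"}), ("div", {"review-words"})]
    print(select_nth_of_type(second, "div", "review-words", 1, 3))
    print(select_nth_of_type(third, "div", "review-words", 1, 3))
```

The first layout yields an empty list because the `span` is not counted and `review-words` is only the second `div`; the second layout yields `[2]`.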