Web Scraper Crawler, Part 1

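I. Setting up Web Scraper
Install Web Scraper from the Chrome Web Store; the installation steps are not covered here.
After installation, press F12 on any browser page to open the developer tools, then click the Web Scraper tab to start working.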

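II. Basic scraping operations
1. Scraping the same content across multiple similar pages
Put a numeric range in square brackets, [x-y], in the start URL to loop over a series of pages. For example, page=[1-9] visits pages 1 through 9.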
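2. Outputting a table row by row
Set "multiple" to true for the selector of the first column, and set "multiple" to false for the selectors of the other columns.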
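3. Scraping elements from child pages
Create a link selector, then use it as the parent of the selectors that should read from the linked page.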
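
The page range from item 1 can be pictured as a simple expansion of the start URL. The Python sketch below only illustrates the idea; it is not the extension's own code, and the function name expand_start_url and the example.com URL are assumptions made for this example.

```python
import re

# Matches a numeric range placeholder such as [1-9].
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_start_url(url):
    """Expand the first [x-y] placeholder in url into one URL per number."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        # No range placeholder: the URL is used as-is.
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


# Hypothetical listing URL, used only for illustration.
urls = expand_start_url("http://example.com/list?page=[1-3]")
for u in urls:
    print(u)
```

Each expanded URL is then scraped with the same set of selectors, which is why one sitemap can cover every page of a paginated list.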

A simple example:
{
  "selectors": [
    {"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"link","selector":"td.table-com-name a","delay":""},
    {"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"name","selector":"td.table-com-name a","regex":"","delay":""},
    {"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"date","selector":"td.table-time","regex":"","delay":""},
    {"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"jieduan","selector":"td.table-stage a","regex":"","delay":""},
    {"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"lingyu","selector":"td.table-type a","regex":"","delay":""}
  ],
  "startUrl": "http://www.cyzone.cn/index.php?c=index&a=init&tpl=dbsearch&wq=%E5%86%9C%E6%9D%91&modelid=18&page=[1-9]",
  "_id": "nongcun2"
}
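
In the example above every text selector has "_root" as its parent, so all fields come from the listing page itself. To read a field from each linked child page, as item 3 describes, a selector would list the link selector's id in its "parentSelectors". The Python sketch below builds such a sitemap and prints it as JSON for import; the selector id "detail", the CSS selector "div.intro", the example.com URL, and the sitemap id are all hypothetical values chosen for illustration.

```python
import json


def text_selector(selector_id, css, parent):
    """Build a single-value text selector attached to the given parent id."""
    return {
        "parentSelectors": [parent],
        "type": "SelectorText",
        "multiple": False,
        "id": selector_id,
        "selector": css,
        "regex": "",
        "delay": "",
    }


sitemap = {
    "selectors": [
        {
            # Link selector on the listing page; one link per table row.
            "parentSelectors": ["_root"],
            "type": "SelectorLink",
            "multiple": True,
            "id": "link",
            "selector": "td.table-com-name a",
            "delay": "",
        },
        # Hypothetical child selector: evaluated on the page behind each link.
        text_selector("detail", "div.intro", parent="link"),
    ],
    # Hypothetical start URL with a page range.
    "startUrl": "http://example.com/list?page=[1-3]",
    "_id": "example_child_pages",
}

sitemap_json = json.dumps(sitemap, ensure_ascii=False)
print(sitemap_json)
```

The printed JSON has the same shape as the example sitemap, with the one difference that "detail" hangs off "link" rather than "_root".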
