1. The robots protocol
A website uses it to tell crawlers which of its pages may be crawled and which may not.
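By convention the rules live in a file named robots.txt at the root of the host, as the JD example below shows. A crawler can find that file from any page URL on the same site. The helper below is a minimal sketch; the name `robots_url` is hypothetical, and it assumes the robots.txt file is served from the same scheme and host (including port) as the page.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Hypothetical helper: build the robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with any port); replace the path with /robots.txt
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.jd.com/some/page.html?x=1"))  # https://www.jd.com/robots.txt
```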
1.1 Example: https://www.jd.com/robots.txt
Contents of the file:
User-agent: *
Disallow: /?*
Disallow: /pop/*.html
Disallow: /pinpai/*.html?*
User-agent: EtaoSpider
Disallow: /
User-agent: HuihuiSpider
Disallow: /
User-agent: GwdangSpider
Disallow: /
User-agent: WochachaSpider
Disallow: /
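Python's standard library ships a robots.txt parser, `urllib.robotparser.RobotFileParser`, which can check these rules before a page is fetched. The sketch below embeds the JD rules as text so it runs offline; `MyCrawler` is a hypothetical user-agent name. It shows that a crawler named in its own group (EtaoSpider) is blocked everywhere by `Disallow: /`, while an unnamed crawler falls back to the `User-agent: *` group. One caveat, as far as I know: the standard-library parser matches paths by plain prefix and does not treat `*` inside a path as a wildcard, so the `/pop/*.html`-style rules are not evaluated the way wildcard-aware crawlers would evaluate them.

```python
from urllib.robotparser import RobotFileParser

# The JD rules, embedded as text so the sketch needs no network access.
JD_ROBOTS = """\
User-agent: *
Disallow: /?*
Disallow: /pop/*.html
Disallow: /pinpai/*.html?*
User-agent: EtaoSpider
Disallow: /
User-agent: HuihuiSpider
Disallow: /
User-agent: GwdangSpider
Disallow: /
User-agent: WochachaSpider
Disallow: /
"""

rp = RobotFileParser()
# parse() takes an iterable of lines and marks the rules as loaded,
# so can_fetch() can be called without read().
rp.parse(JD_ROBOTS.splitlines())

# EtaoSpider matches its own group, whose "Disallow: /" blocks every path.
print(rp.can_fetch("EtaoSpider", "https://www.jd.com/"))  # False
# "MyCrawler" (hypothetical) is not named, so the "User-agent: *" group applies;
# none of its rules cover the home page.
print(rp.can_fetch("MyCrawler", "https://www.jd.com/"))   # True
```

Against the live site, the same object can instead be loaded with `rp.set_url("https://www.jd.com/robots.txt")` followed by `rp.read()`, which downloads and parses the file.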
Basic syntax