Chapter 6: The Scrapy Framework (Part 11) 2020-03-13

11. Scrapy Framework - Hands-On - Simulating a GitHub Login

Settings in settings.py

ROBOTSTXT_OBEY = False  # do not check robots.txt before requesting pages
COOKIES_ENABLED = True  # keep the session cookies that the login sets
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}

Create a start.py file

from scrapy import cmdline

# Equivalent to running "scrapy crawl github" from the project directory.
cmdline.execute("scrapy crawl github".split(" "))
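cmdline.execute takes an argv-style list of strings, which is why start.py splits the command on spaces. The short sketch below shows what that split produces. It also shows shlex.split from the standard library as one possible alternative when an argument contains spaces. The output file name used here is a made-up example, not part of the original project.

```python
import shlex

command = "scrapy crawl github"

# str.split(" ") breaks the string at every single space.
argv = command.split(" ")
print(argv)

# shlex.split follows shell quoting rules, so a quoted argument stays in one piece.
# The "-o" output file name below is a hypothetical example.
argv_with_option = shlex.split('scrapy crawl github -o "my output.json"')
print(argv_with_option)
```

For the simple command in start.py both approaches give the same list, so the plain split is enough there.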

Example code for github.py

import scrapy


class GithubSpider(scrapy.Spider):
    name = 'github'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Option 1 (commented out): build the login form by hand.
        # authenticity_token = response.xpath("//input[@name='authenticity_token']/@value").get()
        # formdata = {
        #     "commit": "Sign in",
        #     "utf8": "✓",
        #     "authenticity_token": authenticity_token,
        #     "login": "<omitted>",
        #     "password": "<omitted>",
        #     "webauthn-support": "supported"
        # }
        # yield scrapy.FormRequest("https://github.com/session", formdata=formdata, callback=self.after_login)

        # Option 2: let from_response read the form on the page, including its hidden fields.
        yield scrapy.FormRequest.from_response(response, formdata={
            "login": "<omitted>",
            "password": "<omitted>"
        }, callback=self.after_login)

    def after_login(self, response):
        # After logging in, request a page that only a signed-in user can see.
        yield scrapy.Request("https://github.com/settings/profile", callback=self.visit_profile)

    def visit_profile(self, response):
        # Save the profile page so the login result can be checked by hand.
        with open("github_profile.html", 'w', encoding='utf-8') as fp:
            fp.write(response.text)
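The from_response call in the spider only passes login and password. That is enough because Scrapy pre-fills the request with the fields already present in the page's form, such as the hidden authenticity_token, and then applies the values given in formdata on top. The sketch below imitates that idea using only the standard library. It is a simplified illustration, not Scrapy's actual implementation. The names FormFieldCollector, build_login_post, and SAMPLE_HTML, and the sample form markup itself, are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFieldCollector(HTMLParser):
    """Collects the action URL and named <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a value attribute are treated as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_post(page_url, html, overrides):
    """Return (target_url, encoded_body) for submitting the first form in html."""
    collector = FormFieldCollector()
    collector.feed(html)
    data = dict(collector.fields)  # start from what the page already contains
    data.update(overrides)         # caller-supplied values win
    return urljoin(page_url, collector.action), urlencode(data)


# Hypothetical, heavily simplified login form used only for this demonstration.
SAMPLE_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
"""

url, body = build_login_post(
    "https://github.com/login",
    SAMPLE_HTML,
    {"login": "user", "password": "secret"},
)
print(url)
print(body)
```

The hidden token is carried over without being named in the overrides. That is the convenience from_response offers compared with building the whole form dictionary by hand.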

Previous article: Chapter 6: The Scrapy Framework (Part 10), 2020-03-12. Link:

https://www.jianshu.com/p/b4bc58e806f9

Next article: Chapter 6: The Scrapy Framework (Part 12), 2020-03-14. Link:

https://www.jianshu.com/p/fc0b7b7fc5c8

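The material above was collected from the internet and is shared for learning and discussion only. If it infringes on your rights, please send me a private message and I will remove it. Thank you.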

Last edited: 2020.03.14 21:36:18
