11. Scrapy Framework – Hands-On – Simulating a GitHub Login
settings.py configuration
# Do not follow robots.txt rules for this exercise
ROBOTSTXT_OBEY = False
# Keep cookies so the login session carries over to later requests
COOKIES_ENABLED = True
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}
Create a start.py file
from scrapy import cmdline

# Equivalent to running "scrapy crawl github" on the command line
cmdline.execute("scrapy crawl github".split())
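As a small illustration of why the space between "scrapy" and "crawl" matters, the sketch below shows the argument list that would be passed to cmdline.execute. It relies only on plain string splitting; the variable names are illustrative and not part of the original project.

```python
# Illustrative only: inspect the argv list built from the command string.
command = "scrapy crawl github"
argv = command.split()
print(argv)  # ['scrapy', 'crawl', 'github']

# Without the space, the program name and the subcommand fuse into one token,
# so there is no separate "crawl" subcommand left for Scrapy to dispatch on.
broken_argv = "scrapycrawl github".split()
print(broken_argv)  # ['scrapycrawl', 'github']
```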
github.py example code
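import scrapy


class GithubSpider(scrapy.Spider):
    name = 'github'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Manual alternative: extract the CSRF token and build the form data by hand.
        # authenticity_token = response.xpath("//input[@name='authenticity_token']/@value").get()
        # formdata = {
        #     "commit": "Sign in",
        #     "utf8": "✓",
        #     "authenticity_token": authenticity_token,
        #     "login": "<omitted>",
        #     "password": "<omitted>",
        #     "webauthn-support": "supported"
        # }
        # yield scrapy.FormRequest("https://github.com/session", formdata=formdata, callback=self.after_login)

        # from_response() pre-fills the fields found in the page's form (hidden ones included),
        # so only the credentials need to be supplied. Replace "<omitted>" with your own values.
        yield scrapy.FormRequest.from_response(response, formdata={
            "login": "<omitted>",
            "password": "<omitted>"
        }, callback=self.after_login)

    def after_login(self, response):
        # Once logged in, request a page that is only available to an authenticated user.
        yield scrapy.Request("https://github.com/settings/profile", callback=self.visit_profile)

    def visit_profile(self, response):
        # Save the profile page so the login can be verified by inspecting the file.
        with open("github_profile.html", 'w', encoding='utf-8') as fp:
            fp.write(response.text)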
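To make the idea behind FormRequest.from_response more concrete, here is a minimal standard-library sketch of the general technique: read the values already present in the page's form (including the hidden authenticity_token) and then overlay the credentials. This is not Scrapy's actual implementation. The sample HTML, the InputCollector class, the build_formdata helper, and the "alice"/"secret" credentials are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical, simplified login form; the real GitHub page has more fields.
SAMPLE_LOGIN_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="example-token-123">
  <input type="text" name="login" value="">
  <input type="password" name="password" value="">
  <input type="submit" name="commit" value="Sign in">
</form>
"""


class InputCollector(HTMLParser):
    """Collect the name/value pairs of every named <input> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            self.fields[name] = attr_map.get("value") or ""


def build_formdata(html, overrides):
    """Start from the values already in the form, then apply the caller's overrides."""
    collector = InputCollector()
    collector.feed(html)
    formdata = dict(collector.fields)
    formdata.update(overrides)
    return formdata


formdata = build_formdata(SAMPLE_LOGIN_HTML, {"login": "alice", "password": "secret"})
print(formdata["authenticity_token"])  # example-token-123
```

Because the hidden token is picked up from the page itself, the caller only has to provide the login and password, which is the same convenience the spider relies on.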
Previous article: Chapter 6, Scrapy Framework (10), 2020-03-12. Link:
https://www.jianshu.com/p/b4bc58e806f9
Next article: Chapter 6, Scrapy Framework (12), 2020-03-14. Link:
https://www.jianshu.com/p/fc0b7b7fc5c8
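The material above was collected from the internet and is intended for study and discussion only. If it infringes on your rights, please send me a private message and I will remove it. Thank you.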