Python Day 2 Study Notes, Condensed: Basic Web Scraping

Fetch the web page's source code, then extract the information we need.

Tools: the requests library and XPath syntax
Install libraries with pip install (for example, pip install requests)
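
The notes name XPath for the extraction step but do not show it. Below is a minimal sketch, assuming a small well-formed snippet and the limited XPath subset supported by the standard-library xml.etree.ElementTree. Real pages are rarely well-formed XML and usually need a more forgiving third-party HTML parser; that choice is an assumption, not something these notes specify. The sample markup and the class name "title" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from a real site).
SAMPLE = """
<html>
  <body>
    <div class="title">First post</div>
    <div class="title">Second post</div>
    <a href="https://example.com/next">next</a>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# .//div[@class='title'] selects every <div> with class="title" at any depth.
titles = [div.text for div in root.findall(".//div[@class='title']")]

# find() returns the first matching <a>; .get() reads one of its attributes.
link = root.find(".//a").get("href")

print(titles)
print(link)
```

The same two ideas (select elements by path and attribute, then read text or an attribute) carry over to fuller XPath engines.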

import requests

response = requests.get('https://www.baidu.com/')
print(response.status_code)  # HTTP status code of the response
response.encoding = 'utf-8'  # set the encoding used to decode response.text
print(response.encoding)  # the encoding now in effect for the response

Status code 200 means the request succeeded (OK); 404 means the page was not found (Not Found).
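
To look up what a status code means without memorizing it, the standard library's http.HTTPStatus enum can help. This is only a small sketch; the helper name describe is hypothetical and not part of requests.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the code with its standard reason phrase, e.g. '200 OK'."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in (200, 404):
    print(describe(code))
```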
Get the response body as a string (str)

html_date=response.text
print(html_date)
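
In requests, response.text is the raw body bytes decoded with response.encoding, which is why the encoding is set before reading the text. The effect of a wrong encoding can be sketched without any network access, assuming a plain bytes value stands in for the response body:

```python
# Stand-in for the raw body bytes of a UTF-8 page.
raw = "百度一下".encode("utf-8")

good = raw.decode("utf-8")    # decoded with the correct encoding
bad = raw.decode("latin-1")   # decoded with the wrong encoding: garbled text

print(good)
print(bad == good)
```

The wrong decoding does not raise an error here; it silently yields one garbled character per byte, which is how unreadable text ends up in saved pages.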

Write the fetched page to a local HTML file
Arguments to open(): file path, read/write mode, encoding

with open('index.html', 'w', encoding='utf-8') as f:
    f.write(html_date)
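
A quick way to check that the file was written with the intended encoding is to read it back with the same encoding. The sketch below assumes a short literal string in place of the fetched page and writes to a temporary directory so it leaves nothing behind; both are illustrative choices.

```python
import os
import tempfile

html = "<html><body>你好</body></html>"  # stand-in for the fetched page text

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    # Write as text, encoding the string to UTF-8 bytes on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    # Read back with the same encoding; the text should be unchanged.
    with open(path, "r", encoding="utf-8") as f:
        read_back = f.read()

print(read_back == html)
```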

Downloading an image

URL = "http://58pic.ooopic.com/58pic/12/81/77/67V58PICtyp.jpg"  # image address
response2 = requests.get(URL)

Get the response body as bytes

img_date = response2.content

Arguments to open(): file path and mode 'wb' (write binary); binary mode takes no encoding argument

# Check the status first so a failed request does not leave an empty file behind
if response2.status_code == 200:
    with open('海洋.jpg', 'wb') as f:
        f.write(img_date)
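
The order of the status check and the open() call matters: opening a file in 'wb' mode creates it immediately, so the check is best done before opening. A minimal sketch of that idea as a reusable function follows; the name save_if_ok, the sample bytes, and the file names are hypothetical, and the status code and data are passed in directly so the example runs without a network request.

```python
import os
import tempfile


def save_if_ok(status_code: int, data: bytes, path: str) -> bool:
    """Write data to path only when status_code is 200; return whether it wrote."""
    if status_code != 200:
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "ok.jpg")
    bad_path = os.path.join(tmp, "missing.jpg")
    wrote_ok = save_if_ok(200, b"\xff\xd8\xff", ok_path)  # placeholder bytes
    wrote_bad = save_if_ok(404, b"", bad_path)
    bad_exists = os.path.exists(bad_path)  # no file is created on failure

print(wrote_ok, wrote_bad, bad_exists)
```

With a real download, the call would look like save_if_ok(response2.status_code, response2.content, '海洋.jpg').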