Assignment:
Scrape the jokes (duanzi, 段子) from Qiushibaike (糗事百科)
URL: https://www.qiushibaike.com/text/page/1
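The joke listing is split into numbered pages. Judging from the URL above and the loop in main() below, page N appears to live at `https://www.qiushibaike.com/text/page/N/`. The trailing slash is copied from the sample code, and whether the site requires it is an assumption. Here is a minimal sketch of generating those page URLs. `BASE_URL` and `page_urls` are hypothetical names, not part of the original code:

```python
# Hypothetical helper: builds listing-page URLs following the pattern used in main()
BASE_URL = 'https://www.qiushibaike.com/text/page/%s/'


def page_urls(first, last):
    """Return the URLs of listing pages first..last, inclusive."""
    return [BASE_URL % n for n in range(first, last + 1)]


if __name__ == '__main__':
    for url in page_urls(1, 3):
        print(url)
```

Note that `range(1, 10)` in main() covers pages 1 to 9, which corresponds to `page_urls(1, 9)` here.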
Sample code:
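import re
import requests


def parse_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
    }
    # Pass headers by keyword: the second positional argument of requests.get is params
    response = requests.get(url, headers=headers)
    text = response.text
    # re.S is an alias for re.DOTALL: it lets '.' also match newlines,
    # which is needed because each joke spans several lines of HTML.
    # The HTML tags in this pattern were lost from the original text; they are
    # reconstructed here assuming each joke sits in <div class="content"><span>...</span>
    contents = re.findall(r'<div class="content">.*?<span>(.*?)</span>', text, re.DOTALL)
    duanzi = []  # collected jokes (was [0], which put a stray 0 at the front)
    for content in contents:
        # Replace leftover tags such as <br/> inside the joke text with a space
        x = re.sub(r'<.*?>', ' ', content)
        duanzi.append(x.strip())
        print(x.strip())
        print('=' * 50)
    return duanzi


def main():
    for x in range(1, 10):
        url = 'https://www.qiushibaike.com/text/page/%s/' % x
        parse_page(url)
        # Stop after the first page while testing; remove this break to crawl pages 1-9
        break


if __name__ == '__main__':
    main()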
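The parsing step can be checked offline, without hitting the site. The sketch below runs the same `findall` + `sub` logic against a small hypothetical HTML snippet that imitates the markup the pattern assumes (each joke in `<div class="content"><span>...</span>`, with lines split by `<br/>`). It also shows why `re.DOTALL` matters. `SAMPLE_HTML`, `PATTERN` and `extract_jokes` are illustrative names, not from the original code.

```python
import re

# Hypothetical HTML imitating the assumed page markup (not real site output)
SAMPLE_HTML = """
<div class="content">
<span>
First joke,<br/>second line.
</span>
</div>
<div class="content">
<span>
Second joke.
</span>
</div>
"""

PATTERN = r'<div class="content">.*?<span>(.*?)</span>'


def extract_jokes(html):
    """Return the cleaned text of every joke block found in html."""
    contents = re.findall(PATTERN, html, re.DOTALL)
    # Replace inner tags (e.g. <br/>) with a space, then trim surrounding whitespace
    return [re.sub(r'<.*?>', ' ', c).strip() for c in contents]


if __name__ == '__main__':
    # Without DOTALL, '.' cannot cross the newline between <div> and <span>,
    # so nothing matches
    print(re.findall(PATTERN, SAMPLE_HTML))
    # With DOTALL, both jokes are extracted and cleaned
    print(extract_jokes(SAMPLE_HTML))
```

The non-greedy `.*?` keeps each match from running past the first `</span>` into the next joke. Without it, a single match could swallow several jokes.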
Previous article: Chapter 3, Data Parsing (20), 2019-12-31. URL:
https://www.jianshu.com/p/1a00d43b2650
Next article: Chapter 4, Data Storage (1): JSON, CSV, Excel, MySQL, 2020-01-02. URL:
https://www.jianshu.com/p/8736249cba86
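The material above was collected from the internet and is for learning and exchange only. If it infringes your rights, please send me a private message and I will remove it. Thank you.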