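Web scraping is a lot of fun. It's a bit like hacking: you are up against the server, whose maintainers write code to keep scrapers out, while we find all sorts of loopholes to get around those obstacles. Great fun! Today I'm sharing a short script for scraping an HTML table on a web page in one specific situation: the table has multiple pages, but the URL stays the same when you change pages.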

Successfully scraped an HTML table from a web page with Python: same link, changing page numbers. Nice!

First, here is the page being scraped: https://d.qianzhan.com/yuanqu/
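Even though the link does not change as you click through the pages, each page is requested with its page number added as a query parameter (the script passes it as params={'page': i}). The sketch below uses only the standard library's urllib.parse, not requests, to show roughly what URL ends up being requested for a given page. It assumes, as the script does, that the province goes in the parameter p and the page number in page; the helper name page_url is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url: str, page: int) -> str:
    """Return base_url with a 'page' query parameter added (hypothetical helper).

    Roughly mirrors what requests does with params={'page': page}:
    the existing query parameters are kept and the new one is appended.
    """
    scheme, netloc, path, query, fragment = urlsplit(base_url)
    params = parse_qsl(query)            # e.g. [('p', '广东省')]
    params.append(('page', str(page)))   # the page number the server reads
    # urlencode percent-encodes non-ASCII text as UTF-8
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


if __name__ == '__main__':
    base = 'https://d.qianzhan.com/yuanqu/?p={}'.format('广东省')
    for i in range(1, 4):
        print(page_url(base, i))
```

The path part of the URL is identical for every page; only the page value in the query string changes, which is why the browser's address bar can look the same while the table content differs.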

Next, let's walk through the code step by step:

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Import the required libraries
import requests
import pandas as pd
import csv
</pre>

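Core code

The three libraries above are the main ones this scraper needs. requests sends requests to the website: if a request succeeds, the response carries status code 200; if it fails, it carries an error code such as 400 (Bad Request) or 404 (Not Found). pandas can read HTML tables straight into DataFrames with pd.read_html, and csv is used to save the data to a local CSV file.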
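pd.read_html returns a list with one DataFrame per HTML table it finds on the page, which is why the script indexes the result with [1] to take the second table. To make that "list of tables" idea concrete without pandas or a network call, here is a small sketch built on the standard library's html.parser.HTMLParser instead. This is a swapped-in technique for illustration, not what pandas does internally; the class name TableExtractor, the function read_tables and the sample HTML are all made up.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every table as a list of rows of cell text (hypothetical helper).

    Minimal sketch: ignores colspan/rowspan and does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per table: a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # only keep text that sits inside a cell
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def read_tables(html):
    """Return all tables in html, in document order."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


sample = """
<table><tr><td>layout</td></tr></table>
<table>
  <tr><th>序号</th><th>园区名称</th></tr>
  <tr><td>1</td><td>Park A</td></tr>
</table>
"""

tables = read_tables(sample)
print(len(tables))  # number of tables found on the "page"
print(tables[1])    # the second table, like pd.read_html(...)[1]
```

If a page layout uses an extra table before the data (for navigation or filters, say), the data table is not at index 0, so it is worth printing len(...) once and checking which index holds the rows you want.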

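<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from io import StringIO  # pandas 2.1+ expects HTML text wrapped in a file-like object

# The table scraped here is the one for Guangdong Province (广东省)
areas = ['广东省']

# Build the URL; the province name goes into the query parameter p
urls = ['https://d.qianzhan.com/yuanqu/?p={}'.format(area) for area in areas]

# Reuse one session (and its connection) for every request
s = requests.Session()

for area, url in zip(areas, urls):
    # Open the CSV file and write the title row first
    with open('前瞻.csv', 'a', newline='', encoding='utf-8-sig') as csvfile:
        w = csv.writer(csvfile)

        # Write the titles (column names): No., park name, province, city,
        # land area, number of companies, details
        w.writerow(['序号', '园区名称', '省份', '城市', '占地面积', '企业数', '详情'])

        # Scrape every page of the table (pages 1 to 32)
        for i in range(1, 33):  # jump from page to page
            # In the page source, the page number is carried by the parameter "page"
            data = {'page': i}

            # Send the request to the site and get the response
            d = s.get(url, params=data)
            d.raise_for_status()  # stop on a 4xx/5xx error instead of parsing an error page

            # Parse the HTML tables on the page; the data table is the second one
            tb = pd.read_html(StringIO(d.text))[1]

            # Append the rows to the same open CSV file (no header, no index),
            # so they always land after the title row
            tb.to_csv(csvfile, header=False, index=False)

    # After each province is finished, print a message so the user can see the progress
    print(area, "has been scraped! Check the local file for the results.")

# Once everything has been scraped, print 'All done!'
print('All done!')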
</pre>
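The loop in the script above hard-codes range(1, 33), so it has to be edited whenever the number of pages changes. One common alternative, sketched here as an assumption rather than something tested against this site, is to keep requesting pages until one comes back empty. To keep the sketch runnable offline, the fetching step is passed in as a function; scrape_all_pages, fake_fetch and the max_pages safety limit are hypothetical names.

```python
from typing import Callable, List

Row = List[str]


def scrape_all_pages(fetch_page: Callable[[int], List[Row]],
                     max_pages: int = 1000) -> List[Row]:
    """Call fetch_page(1), fetch_page(2), ... until a page has no rows.

    fetch_page is hypothetical: in the real script it would send the request
    with params={'page': i} and parse the table; here it is injected.
    max_pages guards against a site that never returns an empty page.
    """
    rows: List[Row] = []
    for page in range(1, max_pages + 1):
        page_rows = fetch_page(page)
        if not page_rows:  # empty page: we have gone past the last one
            break
        rows.extend(page_rows)
    return rows


def fake_fetch(page: int) -> List[Row]:
    """Offline stand-in for the site: 3 pages with 2 rows each, then nothing."""
    if page > 3:
        return []
    return [[str(page), 'row a'], [str(page), 'row b']]


if __name__ == '__main__':
    all_rows = scrape_all_pages(fake_fetch)
    print(len(all_rows), 'rows scraped')
```

Some sites answer an out-of-range page number by repeating the last page instead of returning an empty one; in that case, stopping when a page's rows equal the previous page's rows is a safer condition.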

Results

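In the end, the contents of every page of the table were scraped and saved to the CSV file. If you have any questions, feel free to leave a comment below!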
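To check the result without opening the file by hand, a short sketch like the following reads the CSV back with the standard csv module and reports how many data rows were saved, which rows have a different number of columns from the title row, and how many extra copies of the title row there are. Extra copies can appear because the script opens the file in append mode ('a'), so running it twice writes the title row and all the data again. The function names summarize_rows and check_csv are hypothetical; the file name 前瞻.csv and the utf-8-sig encoding come from the script.

```python
import csv


def summarize_rows(rows):
    """Summarise parsed CSV rows (hypothetical helper).

    Returns (header, data_row_count, bad_rows, repeated_headers):
    bad_rows lists 1-based row numbers (row 1 is the header) whose column
    count differs from the header; repeated_headers counts extra header copies.
    """
    if not rows:
        return None, 0, [], 0
    header, body = rows[0], rows[1:]
    repeated_headers = sum(1 for row in body if row == header)
    data = [row for row in body if row != header]
    bad_rows = [n for n, row in enumerate(body, start=2)
                if row != header and len(row) != len(header)]
    return header, len(data), bad_rows, repeated_headers


def check_csv(path, encoding='utf-8-sig'):
    """Read a CSV file and summarise it; utf-8-sig strips the BOM if present."""
    with open(path, newline='', encoding=encoding) as f:
        return summarize_rows(list(csv.reader(f)))
```

Usage: after the scraper has run, call check_csv('前瞻.csv') and look at the returned data row count, malformed rows and repeated headers.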