0.1 Data Collection

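*Using the listed company Jiuqi Software (久其软件, stock code 002279) as an example, and based on the web page data at https://money.finance.sina.com.cn/corp/go.php/vFD_ProfitStatement/stockid/002279/ctrl/2022/displaytype/4.phtml, write a Python program that scrapes the data, saves the result as a JSON file, preprocesses the collected data, and meets the following requirements: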
Sure. Below is an example Python program that scrapes the page and leaves a place for preprocessing:

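import json

import requests
from bs4 import BeautifulSoup

# Send the request and fetch the page content
url = "https://money.finance.sina.com.cn/corp/go.php/vFD_ProfitStatement/stockid/002279/ctrl/2022/displaytype/4.phtml"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
html_content = response.content

# Parse the page content with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Locate the table that holds the data
table = soup.find('table', class_='tab01')
if table is None:
    raise SystemExit("No table with class 'tab01' found; check the page structure.")

# List that collects the scraped rows
data = []

# Walk through every row of the table
for row in table.find_all('tr'):
    # Get the cells of the current row
    cells = row.find_all('td')
    # Keep only rows that have exactly five cells
    if len(cells) == 5:
        row_data = {
            '年份': cells[0].text.strip(),      # year
            '营业收入': cells[1].text.strip(),  # operating revenue
            '营业利润': cells[2].text.strip(),  # operating profit
            '净利润': cells[3].text.strip(),    # net profit
            '每股收益': cells[4].text.strip(),  # earnings per share
        }
        # Append the row to the result list
        data.append(row_data)

# Save the scraped result as a JSON file
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

# Preprocess the collected data:
# add data cleaning, type conversion, etc. here as needed
print("Data scraping and preprocessing complete!")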

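Make sure the required Python libraries, requests and beautifulsoup4, are installed. Running the code above creates a JSON file named data.json in the current directory, containing the data scraped from the page. The table class tab01 and the five-cell row layout are assumptions about the page structure; if data.json comes out empty, inspect the page and adjust them. You can then process and analyze the collected data further as needed.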
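The preprocessing step in the program above is only a placeholder comment. As a minimal sketch of what it might look like, the snippet below drops exact duplicate rows and converts the numeric strings to floats. The helper names `to_number` and `preprocess`, the `NUMERIC_FIELDS` tuple, and the sample records are all hypothetical and made up for illustration; the real page may use different placeholders or number formats.

```python
# Hypothetical records shaped like the rows written to data.json.
# The values are invented for illustration, not real financial figures.
sample = [
    {"年份": "2022-12-31", "营业收入": "1,234.50", "营业利润": "--",
     "净利润": "-56.70", "每股收益": "0.01"},
    {"年份": "2022-12-31", "营业收入": "1,234.50", "营业利润": "--",
     "净利润": "-56.70", "每股收益": "0.01"},  # exact duplicate of the first row
    {"年份": "2022-09-30", "营业收入": "987.00", "营业利润": "12.30",
     "净利润": "10.00", "每股收益": ""},
]

# Fields assumed to hold numeric values
NUMERIC_FIELDS = ("营业收入", "营业利润", "净利润", "每股收益")


def to_number(text):
    """Convert a scraped numeric string to float; return None if it cannot be parsed."""
    cleaned = text.replace(",", "").strip()
    if cleaned in ("", "-", "--"):  # assumed placeholders for missing values
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def preprocess(records):
    """Drop exact duplicate rows and convert the numeric fields to float."""
    seen = set()
    result = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # skip a row that was already kept
        seen.add(key)
        cleaned = dict(record)
        for field in NUMERIC_FIELDS:
            if field in cleaned:
                cleaned[field] = to_number(cleaned[field])
        result.append(cleaned)
    return result


cleaned_sample = preprocess(sample)
print(len(cleaned_sample), "records after cleaning")
```

To apply this to the scraped data, one option is to call `preprocess(data)` before `json.dump`, so that data.json stores numbers rather than strings. Missing values become `None`, which JSON serializes as `null`.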
