[Reference link] https://juejin.cn/post/7000401392007380999#heading-14

Linear Regression

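Linear regression is a statistical method for modeling and predicting a linear relationship between variables. It is one of the simplest and most widely used techniques in regression analysis. The goal is to fit the data with a linear equation that captures the relationship between the independent variables (features) and the dependent variable (target). Training a linear regression model means choosing the coefficients that minimize the error between the predicted and true values, usually by the method of least squares.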
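To make the least-squares idea concrete, here is a minimal sketch using NumPy on synthetic data. The data, the variable names, and the "true" weights (3, 2, -1) are made up for illustration and have nothing to do with the dataset used later.

```python
import numpy as np

# Synthetic data (hypothetical): y = 3 + 2*x1 - 1*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the intercept is estimated with the other weights
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: the weights w that minimize ||X_design @ w - y||^2
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

intercept, coef = w[0], w[1:]
print("intercept:", intercept)
print("coefficients:", coef)
```

The recovered intercept and coefficients should land close to the values used to generate the data, which is all "fitting" means here.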

Hands-on code:

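This example uses the CCPP (Combined Cycle Power Plant) dataset. As loaded below, with all sheets of the workbook concatenated, it contains 47,840 rows. Each row has four features: ambient temperature (AT), ambient pressure (AP), relative humidity (RH), and exhaust vacuum (V). They are used to predict the plant's net hourly electrical energy output (PE).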

import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

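# sheet_name=None loads every sheet and returns a dict of {sheet name: DataFrame}
all_sheets = pd.read_excel('./dataset/Folds5x2_pp.xlsx', sheet_name=None)
type(all_sheets['Sheet1'])

# Stack all sheets into a single DataFrame with a fresh 0..n-1 index
combined_df = pd.concat(all_sheets.values(), ignore_index=True)
combined_df.shape
combined_df.head()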
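When several sheets are stacked like this, it can be worth checking whether the same rows appear more than once, since repeated rows can end up in both the training and the test split. The original post does not say whether the sheets overlap, so the sketch below only shows how one might check, using a made-up two-sheet dictionary in place of the real workbook.

```python
import pandas as pd

# Toy stand-in for pd.read_excel(..., sheet_name=None), which returns a dict
# mapping sheet names to DataFrames. The values here are invented.
base = pd.DataFrame({"AT": [10.0, 20.0, 30.0], "PE": [480.0, 460.0, 440.0]})
toy_sheets = {
    "Sheet1": base,
    "Sheet2": base.sample(frac=1, random_state=0),  # same rows, shuffled
}

toy_df = pd.concat(toy_sheets.values(), ignore_index=True)

# Count rows that exactly repeat an earlier row
n_duplicates = int(toy_df.duplicated().sum())
print(toy_df.shape, n_duplicates)

# drop_duplicates keeps one copy of each distinct row
deduped = toy_df.drop_duplicates(ignore_index=True)
print(deduped.shape)
```

Running the same two calls (`duplicated().sum()` and `drop_duplicates`) on `combined_df` would show whether this applies to the real data.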

# Take the first four columns as the features
X = combined_df.iloc[:,:-1]
X.shape
X.head()

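# Two ways to select the target column; the second assignment is the one used below
y = combined_df.iloc[:, -1]  # 1-D Series, shape (47840,)
y = combined_df[["PE"]]      # 2-D DataFrame, shape (47840, 1)
y.shape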

# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
# 75% / 25% split (the train_test_split default)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2023)
X_train.shape # (35880, 4)
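The 3:1 ratio above comes from the default `test_size=0.25`. As a small sketch on made-up arrays (not the CCPP data), the split sizes can be checked and changed like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples with 4 features each
X_demo = np.arange(400).reshape(100, 4)
y_demo = np.arange(100)

# With no test_size given, 25% of the samples go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=2023)
print(X_tr.shape, X_te.shape)

# An explicit 80/20 split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2023
)
print(X_tr2.shape, X_te2.shape)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in each split on every run.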

# Instantiate the regressor and fit it to the training data
liner = LinearRegression()
liner.fit(X_train,y_train)
print(liner.intercept_) # intercept term
print(liner.coef_) # one coefficient per feature
# Output:
# [453.90206833]
# [[-1.97312956 -0.23561418  0.06277889 -0.15813581]]
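The intercept and coefficients fully describe the fitted model: a prediction is the intercept plus the dot product of the features with the coefficients. The sketch below checks this on synthetic data (the data and the name `model` are assumptions for illustration). It uses a 1-D target, so `coef_` has shape `(4,)`; with the 2-D `y` used in this post the shapes are `(1, 4)` and `(1,)`, as the printed output shows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): 50 samples, 4 features, known linear relationship
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 4))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
y_demo = 5.0 + X_demo @ true_w + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X_demo, y_demo)

# Reproduce predict() by hand: intercept + X @ coefficients
manual = model.intercept_ + X_demo @ model.coef_
print(np.allclose(manual, model.predict(X_demo)))
```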

y_pred = liner.predict(X_test) # predict on the test set
y_pred.shape 

# Evaluate the fit with mean squared error (MSE)
from sklearn import metrics

print("MSE:",metrics.mean_squared_error(y_test,y_pred))
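MSE is simply the average of the squared differences between true and predicted values. A small sketch with four made-up numbers shows the calculation by hand next to the scikit-learn call, plus the root (RMSE), which is in the same units as the target:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 8.0, 9.5])

# Errors are 0.5, 0, -1, -0.5; their squares average to 1.5 / 4
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

# RMSE is easier to interpret because it is in the units of y
rmse = np.sqrt(mse_sklearn)
print(mse_manual, mse_sklearn, rmse)
```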
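# Try 5-fold cross-validation with a fresh, unfitted regressor
from sklearn.model_selection import cross_val_predict

liner = LinearRegression()
y_predicted = cross_val_predict(liner,X,y,cv=5)
y_predicted
# Note: cross-validation is an evaluation scheme, not a way to lower the loss.
# Each prediction comes from a model that did not see that sample during training,
# so this MSE is an out-of-fold estimate over the whole dataset.
print("MSE:",metrics.mean_squared_error(y,y_predicted))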
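`cross_val_predict` returns one out-of-fold prediction per sample. If a per-fold score is more useful, `cross_val_score` is an alternative. The sketch below runs it on synthetic data (not the CCPP data) with a shuffled `KFold`; shuffling is my choice here, not something the original post does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (hypothetical): noise variance is 0.25, so MSE should be near that
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(200, 4))
true_w = np.array([2.0, -1.0, 0.5, 0.0])
y_demo = 1.0 + X_demo @ true_w + rng.normal(scale=0.5, size=200)

# Shuffle before splitting so each fold is a random sample of the rows
cv = KFold(n_splits=5, shuffle=True, random_state=2023)

# scikit-learn scorers follow "higher is better", so MSE is returned negated
neg_mse = cross_val_score(
    LinearRegression(), X_demo, y_demo, cv=cv, scoring="neg_mean_squared_error"
)
fold_mse = -neg_mse
print("per-fold MSE:", fold_mse)
print("mean:", fold_mse.mean(), "std:", fold_mse.std())
```

The spread across folds gives a rough sense of how much the error estimate depends on which rows were held out.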

Visualizing the results

fig, ax = plt.subplots()

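# One point per sample: measured value on x, cross-validated prediction on y
ax.scatter(y, y_predicted)

# Dashed diagonal marks perfect predictions (predicted == measured)
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=3)

ax.set_xlabel('Measured')  # true values
ax.set_ylabel('Predicted') # predicted values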

plt.show()

Linear regression pitfalls (to be filled in later)
