Basic sklearn tutorial
https://www.jianshu.com/p/6ada34655862
Official sklearn website
https://scikit-learn.org/stable/
In Python, linear fitting can be done with the NumPy and sklearn packages (matplotlib is used to plot the result).
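To illustrate that both packages can do the fit, the sketch below fits a straight line to a few (area, price) rows taken from the dataset later in this post, once with NumPy's `np.polyfit` and once with sklearn's `LinearRegression`. Both minimize the squared error, so the slope and intercept should agree up to floating-point error. The choice of rows is arbitrary and only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# A few (area, price) rows taken from the dataset in this post
area = np.array([1000, 792, 1260, 1262, 1240, 1170])
price = np.array([168, 184, 197, 220, 228, 248])

# NumPy: least-squares fit of a degree-1 polynomial, returns [slope, intercept]
slope_np, intercept_np = np.polyfit(area, price, 1)

# sklearn: features must be a 2-D array of shape (n_samples, n_features)
model = LinearRegression().fit(area.reshape(-1, 1), price)
slope_sk, intercept_sk = model.coef_[0], model.intercept_

print(f"numpy:   price = {slope_np:.4f} * area + {intercept_np:.2f}")
print(f"sklearn: price = {slope_sk:.4f} * area + {intercept_sk:.2f}")
```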
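After installing Anaconda, install the numpy and scikit-learn packages (the full Anaconda distribution usually ships with them already): open cmd and run
pip install numpy
pip install scikit-learn
pip install matplotlib
Note that the package named sklearn on PyPI is a deprecated placeholder; the real package is scikit-learn, which is still imported as sklearn. matplotlib is needed for the plot.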
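To confirm the installation worked, a small check like the following can be run in Python. It is only a sanity check: it imports each package and prints its version. matplotlib is included because the plotting script uses it.

```python
# Confirm the packages can be imported (scikit-learn is imported as "sklearn")
import numpy
import sklearn
import matplotlib

for name, module in [("numpy", numpy), ("scikit-learn", sklearn), ("matplotlib", matplotlib)]:
    print(f"{name} {module.__version__}")
```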
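First, prepare a dataset of house areas and prices and save it as prices.txt (the file name the script below reads).
Each line is one sample: the area on the left, the price on the right.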
1000,168
792,184
1260,197
1262,220
1240,228
1170,248
1230,305
1255,256
1194,240
1450,230
1481,202
1475,220
1482,232
1484,460
1512,320
1680,340
1620,240
1720,368
1800,280
4400,710
4212,552
3920,580
3212,585
3151,590
3100,560
2700,285
2612,292
2705,482
2570,462
2442,352
2387,440
2292,462
2308,325
2252,298
2202,352
2157,403
2140,308
4000,795
4200,765
3900,705
3544,420
2980,402
4355,762
3150,392
3025,320
3450,350
4402,820
3454,425
890,272
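Because each line is a plain `area,price` pair with no header, the file can also be read in a single call with NumPy's `np.loadtxt`, as an alternative to the manual loop in the script. The sketch writes the first few rows above to a temporary file only so that it runs on its own; in practice you would pass `'prices.txt'` directly.

```python
import os
import tempfile
import numpy as np

# First few rows of the dataset above, in the same "area,price" format
sample = "1000,168\n792,184\n1260,197\n1262,220\n"

# Write them to a temporary file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Read both columns as integers; ndmin=2 keeps a 2-D shape even for one row
data = np.loadtxt(path, delimiter=",", dtype=int, ndmin=2)
os.remove(path)

area = data[:, :1]   # shape (n, 1): 2-D feature matrix, as sklearn expects
price = data[:, 1]   # shape (n,): 1-D target vector
print(area.shape, price.shape)
```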
Once the packages are installed, open Spyder (included with Anaconda) and run the following script:
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

datasets_X = []
datasets_Y = []
# Load the data
with open('prices.txt', 'r') as f:
    lines = f.readlines()  # read all lines
for line in lines:
    line = line.strip()
    if not line:  # skip blank lines, e.g. a trailing newline
        continue
    items = line.split(',')
    datasets_X.append(int(items[0]))  # area
    datasets_Y.append(int(items[1]))  # price
n = len(datasets_X)  # number of samples
# Reshape area X into an n x 1 2-D array (sklearn expects 2-D features); price Y stays 1-D
datasets_X = np.array(datasets_X).reshape([n, 1])
datasets_Y = np.array(datasets_Y)
print(datasets_X, datasets_Y)
# Smallest and largest area, as scalars
minX = datasets_X.min()
maxX = datasets_X.max()
# Evenly spaced areas from minX to maxX, used later to draw the regression line
X = np.arange(minX, maxX + 1).reshape([-1, 1])
#print(minX, maxX, X)
# Create the linear regression model
linear = linear_model.LinearRegression()
# Fit the model to the data
linear.fit(datasets_X, datasets_Y)
#print('coefficient', linear.coef_)
#print('intercept', linear.intercept_)
# Plot the data points
plt.scatter(datasets_X, datasets_Y, color='red')
# Plot the regression line
plt.plot(X, linear.predict(X), color='blue')
# Axis labels
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()
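Once `linear` is fitted, the blue line is simply price = `coef_[0]` × area + `intercept_`. The sketch below, using a handful of rows from the dataset rather than the whole file, computes the same slope and intercept with the ordinary least-squares closed form and then predicts the price for an area of 2000; that value is an arbitrary illustrative input, not a point from the dataset.

```python
import numpy as np
from sklearn import linear_model

# A few (area, price) rows from the dataset in this post
x = np.array([1000, 792, 1260, 1262, 4400, 4212, 3920])
y = np.array([168, 184, 197, 220, 710, 552, 580])

linear = linear_model.LinearRegression()
linear.fit(x.reshape(-1, 1), y)

# Ordinary least squares closed form for a single feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
dx, dy = x - x.mean(), y - y.mean()
slope = (dx * dy).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()
print("sklearn:    ", linear.coef_[0], linear.intercept_)
print("closed form:", slope, intercept)

# Predict the price for an illustrative area of 2000 (input must be 2-D)
predicted = linear.predict(np.array([[2000]]))[0]
print("predicted price for area 2000:", predicted)
```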
Result (the red points are the data, the blue line is the fitted regression line):