Ordinary least squares linear regression: sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
Main parameters:
fit_intercept: boolean, default True. If True, an intercept term is fitted for the model; if False, no intercept is used.
normalize: boolean, default False. This parameter is ignored when fit_intercept is set to False. If True, each input feature is normalized before fitting by subtracting its mean and dividing by its l2-norm, i.e. (X - mean(X)) / ||X||. If normalize=False, the data can instead be standardized beforehand with sklearn.preprocessing.StandardScaler, as sketched below.
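A quick sketch of the StandardScaler route mentioned above (the toy arrays below are assumed purely for illustration and are not part of this example); note that StandardScaler divides by the standard deviation rather than by the l2-norm:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Toy data, assumed for illustration only
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.1, 6.0, 8.2])

scaler = StandardScaler()            # subtract mean, divide by standard deviation (per feature)
X_scaled = scaler.fit_transform(X)

LR = LinearRegression()              # normalize left at its default of False
LR.fit(X_scaled, y)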
Attributes:
coef_: regression coefficients (the slope)
intercept_: the intercept term (see the toy sketch below)
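A minimal sketch with assumed toy data showing what coef_ and intercept_ look like after fitting a noiseless line y = 3x + 2:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(5).reshape(-1, 1)   # [[0], [1], [2], [3], [4]]
y = 3 * X.ravel() + 2             # exact line with slope 3 and intercept 2

LR = LinearRegression().fit(X, y)
print(LR.coef_)        # approximately [3.]  (slope)
print(LR.intercept_)   # approximately 2.0   (intercept)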
Main methods:
①fit(X, y, sample_weight=None): fit the linear model on the training data
②predict(X): return predictions for the samples in X
③score(X, y, sample_weight=None): returns the coefficient of determination R², equal to 1 - (((y_true - y_pred) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()); a quick check of this equivalence follows below
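A minimal sketch (toy arrays assumed) verifying that score(X, y) returns the same R² value as the formula above and as r2_score:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 2.2, 2.9, 4.3])

LR = LinearRegression().fit(X, y)
y_pred = LR.predict(X)

# 1 - SS_res / SS_tot, exactly the formula given for score()
manual_r2 = 1 - ((y - y_pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(np.isclose(LR.score(X, y), manual_r2))            # True
print(np.isclose(LR.score(X, y), r2_score(y, y_pred)))  # True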
Using the diabetes dataset bundled with sklearn, we build the simplest univariate (single-feature) regression model:
In [1]: import numpy as np
   ...: import matplotlib.pyplot as plt
   ...: from sklearn import datasets, linear_model
   ...: from sklearn.metrics import mean_squared_error, r2_score
   ...: from sklearn.model_selection import train_test_split
   ...: # Load the diabetes dataset
   ...: diabetes = datasets.load_diabetes()
   ...: # Use only the third feature (BMI); equivalent to diabetes.data[:, 2].reshape(-1, 1)
   ...: X = diabetes.data[:, np.newaxis, 2]
   ...: y = diabetes.target
   ...: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
   ...: LR = linear_model.LinearRegression()
   ...: LR.fit(X_train, y_train)
   ...: print('intercept_:%.3f' % LR.intercept_)
   ...: print('coef_:%.3f' % LR.coef_[0])
   ...: # Equivalent to ((y_test - LR.predict(X_test)) ** 2).mean()
   ...: print('Mean squared error: %.3f' % mean_squared_error(y_test, LR.predict(X_test)))
   ...: # Equivalent to 1 - ((y_test - LR.predict(X_test)) ** 2).sum() / ((y_test - y_test.mean()) ** 2).sum()
   ...: print('Variance score: %.3f' % r2_score(y_test, LR.predict(X_test)))
   ...: print('score: %.3f' % LR.score(X_test, y_test))
   ...: plt.scatter(X_test, y_test, color='green')
   ...: plt.plot(X_test, LR.predict(X_test), color='red', linewidth=3)
   ...: plt.show()
   ...:
intercept_:152.003
coef_:998.578
Mean squared error: 4061.826
Variance score: 0.233
score: 0.233
The resulting scatter plot with the fitted regression line is shown below: