MML(skl)——C2

The goal of a regression problem is to predict the value of a continuous response (dependent) variable.

Steps: prepare training data, choose a model, apply a learning algorithm, and evaluate with appropriate metrics.

Theoretical Part

Linear Regression

Data

training data

Training instance | Diameter (in inches) | Price (in dollars)
1                 | 6                    | 7
2                 | 8                    | 9
3                 | 10                   | 13
4                 | 14                   | 17.5
5                 | 18                   | 18
(sample size n = 5; x = diameter, y = price)
Visualize via matplotlib
>>> import matplotlib.pyplot as plt
>>> X = [[6], [8], [10], [14], [18]]
>>> y = [[7], [9], [13], [17.5], [18]]
>>> plt.figure()
>>> plt.title('Pizza price plotted against diameter')
>>> plt.xlabel('Diameter in inches')
>>> plt.ylabel('Price in dollars')
>>> plt.plot(X, y, 'k.')
>>> plt.axis([0, 25, 0, 25])
>>> plt.grid(True)
>>> plt.show()
Model fitting
>>> from sklearn.linear_model import LinearRegression
>>> # Training data
>>> X = [[6], [8], [10], [14], [18]]
>>> y = [[7], [9], [13], [17.5], [18]]
>>> # Create and fit the model
>>> model = LinearRegression()
>>> model.fit(X, y)
>>> print('A 12" pizza should cost: $%.2f' % model.predict([[12]])[0])
A 12" pizza should cost: $13.68

The sklearn.linear_model.LinearRegression class is an estimator.
Estimators predict a value based on the observed data. In scikit-learn, all estimators implement the fit() and predict() methods.
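
Once fitted, the estimator's learned parameters can also be inspected directly; a minimal check, continuing the snippet above:

>>> print('Intercept:', model.intercept_)  # alpha, about 1.97 for this data
>>> print('Slope:', model.coef_)           # beta, about 0.98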

Comparison
import numpy as np
xs = np.linspace(0, 20, 100)
plt.figure()
plt.plot(xs, model.predict(xs.reshape(-1, 1)), 'r')  # fitted regression line
plt.plot(X, y, 'bo', markersize=10)                  # training samples
plt.grid(True)
plt.title('predicted vs. sample')
plt.show()
LinearR_comparison.png
Evaluation of model fitness

some definitions:
cost function / loss function := defines and measures the error of a model
residuals (training errors) := the differences between the predicted values and the y values of the training data
prediction errors (test errors) := the differences between the predicted values and the y values of the test data
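
A minimal sketch computing both kinds of error for the pizza model (the test diameters and prices are the ones used in the Evaluation subsection below):

import numpy as np
from sklearn.linear_model import LinearRegression

X_train, y_train = [[6], [8], [10], [14], [18]], [7, 9, 13, 17.5, 18]
X_test, y_test = [[8], [9], [11], [16], [12]], [11, 8.5, 15, 18, 11]

model = LinearRegression().fit(X_train, y_train)
residuals = np.array(y_train) - model.predict(X_train)        # training errors
prediction_errors = np.array(y_test) - model.predict(X_test)  # test errors
print(residuals)
print(prediction_errors)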

some definitions for linear regression:
residual sum of squares (RSS) cost function
LSE: least squares estimators

when we have a cost function, we can find the values of our model's parameters
that minimize it.

Note: an unbiased estimator of the variance of a dataset uses N-1 rather than N as the denominator.
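
For simple linear regression, the least squares estimators have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the sample means. A minimal numpy sketch on the pizza training data (ddof=1 gives the N-1 denominator from the note above):

import numpy as np

x = np.array([6, 8, 10, 14, 18])
y = np.array([7, 9, 13, 17.5, 18])

beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope = cov(x, y) / var(x)
alpha = y.mean() - beta * x.mean()                      # intercept through the means
print(alpha, beta)            # roughly 1.97 and 0.98
print(alpha + beta * 12)      # price of a 12" pizza, about 13.68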

Evaluation

test data

Test instance | Diameter (in inches) | Observed price (in dollars) | Predicted price (in dollars)
1             | 8                    | 11                          | 9.7759
2             | 9                    | 8.5                         | 10.7522
3             | 11                   | 15                          | 12.7048
4             | 16                   | 18                          | 17.5863
5             | 12                   | 11                          | 13.6811
(sample size n = 5; x = diameter, y = observed price, y_predicted = model prediction)

Several measures can be used to assess our model's predictive capabilities. We will
evaluate our pizza-price predictor using r-squared
r^2 = 1: the model predicts with no error
r^2 = 0.5: half of the variance in the response variable can be predicted from the model
In the case of simple linear regression, r-squared is equal to the square of the Pearson product moment correlation coefficient, or Pearson's r.

R^2 := 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^n (y_i - f(x_i))^2}{\sum_{i=1}^n (y_i - \bar{y})^2}

>>> from sklearn.linear_model import LinearRegression
>>> X = [[6], [8], [10], [14], [18]]
>>> y = [[7], [9], [13], [17.5], [18]]
>>> X_test = [[8], [9], [11], [16], [12]]
>>> y_test = [[11], [8.5], [15], [18], [11]]
>>> model = LinearRegression()
>>> model.fit(X, y)
>>> print('R-squared: %.4f' % model.score(X_test, y_test))
R-squared: 0.6620
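
The same value can be reproduced directly from the R^2 definition above; a minimal sketch:

import numpy as np
from sklearn.linear_model import LinearRegression

X, y = [[6], [8], [10], [14], [18]], [7, 9, 13, 17.5, 18]
X_test, y_test = [[8], [9], [11], [16], [12]], [11, 8.5, 15, 18, 11]

model = LinearRegression().fit(X, y)
y_pred = model.predict(X_test)

ss_res = np.sum((np.array(y_test) - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((np.array(y_test) - np.mean(y_test)) ** 2)  # total sum of squares
print(1 - ss_res / ss_tot)    # about 0.6620, matching model.score(X_test, y_test)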

Multiple Linear Regression

for Y = X \beta
solution: \beta = (X^T X)^{-1} X^T Y

so a package for matrix inversion is introduced: np.linalg (linear algebra)

>>> from numpy.linalg import inv
>>> from numpy import dot, transpose
>>> # normal equation: beta = (X^T X)^{-1} X^T y
>>> beta = dot(inv(dot(transpose(X), X)), dot(transpose(X), y))

NumPy also provides a least squares solver: np.linalg.lstsq
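
A minimal np.linalg.lstsq sketch on the pizza training data, with an explicit column of ones for the intercept:

import numpy as np

X = np.array([[1, 6], [1, 8], [1, 10], [1, 14], [1, 18]])  # intercept column + diameter
y = np.array([7, 9, 13, 17.5, 18])

beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # [intercept, slope], roughly [1.97, 0.98]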

Polynomial regression

Quadratic regression

y = \alpha + \beta_1 x + \beta_2 x^2
e.g.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X_train = [[6], [8], [10], [14], [18]]
>>> y_train = [[7], [9], [13], [17.5], [18]]
>>> X_test = [[6], [8], [11], [16]]
>>> y_test = [[8], [12], [15], [18]]
>>> regressor = LinearRegression()
>>> regressor.fit(X_train, y_train)
>>> xx = np.linspace(0, 26, 100)
>>> yy = regressor.predict(xx.reshape(xx.shape[0], 1))
>>> plt.plot(xx, yy)
1.png

Note: here is the part that differs from simple linear regression.

>>> quadratic_featurizer = PolynomialFeatures(degree=2)
>>> X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
>>> X_test_quadratic = quadratic_featurizer.transform(X_test)
"""X_train_quadratic:
array([[  1.,   6.,  36.],
       [  1.,   8.,  64.],
       [  1.,  10., 100.],
       [  1.,  14., 196.],
       [  1.,  18., 324.]])"""
>>> regressor_quadratic = LinearRegression()
>>> regressor_quadratic.fit(X_train_quadratic, y_train)
>>> xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))

PolynomialFeatures(degree=N).fit_transform(x) maps x to (1, x, x^2, ..., x^N).
The MAIN POINT is to transform x into multiple feature columns and then still use LinearRegression().

>>> plt.plot(xx, regressor_quadratic.predict(xx_quadratic), c='r', linestyle='--')
2.png
>>> plt.title('Pizza price regressed on diameter')
>>> plt.xlabel('Diameter in inches')
>>> plt.ylabel('Price in dollars')
>>> plt.axis([0, 25, 0, 25])
>>> plt.grid(True)
>>> plt.scatter(X_train, y_train)
>>> plt.show()
>>> print(X_train)
>>> print(X_train_quadratic)
>>> print(X_test)
>>> print(X_test_quadratic)
>>> print('Simple linear regression r-squared', regressor.score(X_test, y_test))
>>> print('Quadratic regression r-squared', regressor_quadratic.score(X_test_quadratic, y_test))
3.png

R^2 increases to about 0.87.

When degree = 9, R^2 drops to about -0.09,
which indicates over-fitting.
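
A minimal sketch to reproduce the degree-9 experiment; the point is only that a polynomial flexible enough to pass through every training point generalizes poorly to the test set:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train, y_train = [[6], [8], [10], [14], [18]], [[7], [9], [13], [17.5], [18]]
X_test, y_test = [[6], [8], [11], [16]], [[8], [12], [15], [18]]

ninth = PolynomialFeatures(degree=9)
X_train_ninth = ninth.fit_transform(X_train)
X_test_ninth = ninth.transform(X_test)

regressor_ninth = LinearRegression().fit(X_train_ninth, y_train)
print('Degree-9 regression r-squared:', regressor_ninth.score(X_test_ninth, y_test))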

Regularization

Regularization is a collection of techniques that can be used to prevent over-fitting.
Regularization adds information to a problem, often in the form of a penalty against complexity.

Occam's razor : a hypothesis with the fewest assumptions is the best

Ridge regression (Tikhonov regularization) (L2 penalty):
RSS_{ridge} = \sum_{i=1}^n (y_i - x_i^T \beta)^2 + \lambda \sum_{j=1}^p \beta_j^2

\lambda is a hyperparameter: a parameter of the model that is not learned automatically and must be set manually.

Least Absolute Shrinkage and Selection Operator (LASSO) (L1 penalty):
RSS_{lasso} = \sum_{i=1}^n (y_i - x_i^T \beta)^2 + \lambda \sum_{j=1}^p |\beta_j|
NOTE: The LASSO produces sparse parameters: most of the coefficients become zero, so the model depends on a small subset of the features, whereas ridge regression keeps most coefficients nonzero.
When explanatory variables are correlated, the LASSO shrinks the coefficient of one of them toward zero, while ridge regression shrinks them more uniformly.

Elastic Net:
RSS_{elastic\ net} = \sum_{i=1}^n (y_i - x_i^T \beta)^2 + \lambda_2 \sum_{j=1}^p \beta_j^2 + \lambda_1 \sum_{j=1}^p |\beta_j|
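
All three penalized regressors are available in sklearn.linear_model and follow the same fit/predict interface as LinearRegression; a minimal sketch on the pizza data (the alpha values below are arbitrary illustrations, not tuned):

from sklearn.linear_model import Ridge, Lasso, ElasticNet

X, y = [[6], [8], [10], [14], [18]], [7, 9, 13, 17.5, 18]

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, model.coef_, model.intercept_)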

Down To Earth

dataset url: https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

Data Exploring

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
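# Note: names lists 13 labels for a 14-column file, so pandas treats the leftover
# first column (the wine class label) as the index; it is renamed 'quality' below.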
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',header=None,names=['Alcohol','Malic_acid ','Ash','Alcalinity_of_ash','Magnesium', 'Total_phenols','Flavanoids','Nonflavanoid_phenols','Proanthocyanins','Color_intensity','Hue','OD280/OD315_of_diluted wines','Proline'])
data.index.name='quality'

plt.figure(figsize=(15,15))

plt.subplot(2,2,1)
plt.title('alcohol vs quality ')
plt.xlabel('alcohol')
plt.ylabel('quality')
plt.scatter(data['Alcohol'], data.index)

plt.subplot(2,2,2)
plt.title('Ash vs quality ')
plt.xlabel('Ash')
plt.ylabel('quality')
plt.scatter(data['Ash'], data.index)

plt.subplot(2,2,3)
plt.title('Proline vs quality ')
plt.xlabel('Proline')
plt.ylabel('quality')
plt.scatter(data['Proline'], data.index)

plt.subplot(2,2,4)
plt.title('Hue vs quality ')
plt.xlabel('Hue')
plt.ylabel('quality')
plt.scatter(data['Hue'], data.index)

plt.show()
4.png

Model Fitting

Hold-out validation

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',header=None,names=['Alcohol','Malic_acid ','Ash','Alcalinity_of_ash','Magnesium', 'Total_phenols','Flavanoids','Nonflavanoid_phenols','Proanthocyanins','Color_intensity','Hue','OD280/OD315_of_diluted wines','Proline'])
data.index.name='quality'

X = data.loc[:,['Alcohol','Ash','Proline','Hue']]
y = data.index
X_train, X_test, y_train, y_test = train_test_split(X, y,random_state = 40)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_predictions = regressor.predict(X_test)
print('R-squared:', regressor.score(X_test, y_test))


R-squared: 0.630209361477557
  1. Load the data.
  2. Split the data set via model_selection.train_test_split.
    Note
    i. train_test_split(data, label, stratify=y, test_size=0.25, random_state=40); test_size defaults to 0.25. This is the hold-out method with random (or, when stratify is given, stratified) sampling.
    ii. With a stratified split, the R-squared increases to 0.6554701296431691 (see the sketch after this list).
  3. Train the model and evaluate it on the test set.
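
A minimal sketch of the stratified variant from note (ii), reusing X and y from the hold-out code above:

# stratify=y keeps the class proportions of y the same in the train and test subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=40)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
print('R-squared (stratified split):', regressor.score(X_test, y_test))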

Cross validation

model_selection.cross_val_score(estimator, data, target, cv=5); when cv is a number k, k-fold cross-validation is performed.
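
A minimal sketch on the wine data, reusing X and y from the hold-out example above:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# cv=5: fit on four folds, score (R-squared) on the held-out fold, five times
scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(scores, scores.mean())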

GD (Gradient Descent)

reasons: it reduces computational complexity, and the matrix X^T X may not be invertible
Gradient descent is an optimization algorithm that can be used to estimate the local minimum of a function. Fortunately, the residual sum of squares cost function is convex, so a local minimum is also the global minimum.

to minimize SS_{res} = \sum_{i=1}^n (y_i - f(x_i))^2
learning rate: too large, and the algorithm keeps overshooting and oscillates around the minimum; too small, and convergence takes too long

(Batch) gradient descent

uses all of the training instances to update the model parameters in each iteration
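
A minimal numpy sketch of batch gradient descent for simple linear regression on the pizza data; the learning rate and iteration count are arbitrary illustrative choices:

import numpy as np

x = np.array([6, 8, 10, 14, 18], dtype=float)
y = np.array([7, 9, 13, 17.5, 18], dtype=float)

alpha, beta = 0.0, 0.0   # intercept and slope, initialized at zero
lr = 0.001               # learning rate
for _ in range(200000):  # every update uses ALL training instances
    error = (alpha + beta * x) - y
    alpha -= lr * error.mean()        # gradient of (1/2)*MSE with respect to alpha
    beta -= lr * (error * x).mean()   # gradient of (1/2)*MSE with respect to beta
print(alpha, beta)       # approaches the least squares solution, about 1.97 and 0.98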

Stochastic Gradient Descent (SGD)

updates the parameters using only a single training instance in each iteration. The training instance is usually selected randomly.

import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
data = load_boston()
X_train, X_test, y_train, y_test = train_test_split(data.data,data.target)
X_scaler = StandardScaler()
y_scaler = StandardScaler()
X_train = X_scaler.fit_transform(X_train)
y_train = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()  # scale, then flatten back to 1-D
X_test = X_scaler.transform(X_test)
y_test = y_scaler.transform(y_test.reshape(-1, 1)).ravel()
regressor = SGDRegressor()  # squared error loss is the default
scores = cross_val_score(regressor, X_train, y_train, cv = 5)
print('Cross validation r-squared scores:', scores)
print('Average cross validation r-squared score:', np.mean(scores))
regressor.fit(X_train, y_train)
print('Test set r-squared score', regressor.score(X_test, y_test))

Cross validation r-squared scores: [0.59439483 0.613529   0.72415499 0.78472194 0.69196096]
Average cross validation r-squared score: 0.6817523439301019
