单变量线性回归(一)

模型表达(Model Representation)

我们通过一个例子作为开始,这个例子就是我们之前的预测房价的例子。我们要使用一个数据集,数据集包含某地的住房价格。假设你有一朋友正想出售自己面积为1250平方英尺的房子,你要告诉他这房子可以卖多少钱。这时你就需构建一个模型,从这个数据集来看这或许是条直线,从图中的原谅绿你可以对你朋友说这房子可以卖22万美元左右。

房屋价格数据集

这就是监督学习的一个例子,其这所以称之为监督学习是因为对于每个数据而言,我们预先给出了“正确答案”,即告诉我们根基我们所拥有的数据,我们可以得出房屋的实际价格。

对于这个例子更为具体而言。这是一个回归问题,回归一词指的是我们根据已有的数据预测出一个正确的输出值。更进一步来说,在监督学习中我们都会拥有一个数据集,这个数据集称为训练集。我们将在以后用如下标记描述回归问题:

  • m:number of training examples
  • x:"input" variable / features
  • y:"output" variable / "target" variable

除此之外,我们使用(x, y)来表示一个训练样本(训练样本:训练过程中使用的数据称为“训练数据”,其中每一个样本称为“训练样本”。),其中我们用(x(i), y(i))来表示某一个训练样本。

我们通常使用h(hypothesis)表示一个函数。在预测房价的例子中,x对应房屋面积,y对应房屋价格,根据监督学习我们可以构建一个函数表达式h:
  hθ(x) = θ0 + θ1x

我们将这种可以构建出线性函数的模型称为线性回归模型。在这里只有一个特征变量x,因此我们称这种单一变量模型为单变量线性回归。

补充笔记
Model Representation

To establish notation for future use, we'll use x(i) to denote the "input" variables (living area in this example), also called input features, and y(i) to denote the "output" or target variable that we are trying to predict (price). A pair (x(i), y(i)) is called a training example, and the dataset that we'll be using to learn -- a list of m training examples (x(i), y(i)); i = 1, ... , m -- is called a training set. Note that the superscript "(i)" in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = R (Real Number ).

To describle the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h: X -> Y so that h(x) is a "good" predictor for the corresponding value of y. For histirical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:

When the target variable that we're trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on a small number of discrete of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.

代价函数(Cost Function)

在这之前我们介绍了如下这个表达式:
  hθ(x) = θ0 + θ1x

表达式中的θ0和 θ1这些参数,我们将其(θi)称为模型参数。我们选择不同的参数值,将得到不同的假设函数。因此,在我们预测房价的例子中,我们通过选择参数值尽可能正确地预测y的值。若我们想要取得θ0和 θ1的值来使得h(x)与y之间的差最小化,我们要做的就是尽量减少假设函数的输出值与房子真实价格之间的差的平方,即使得代价函数最小,用数学表达式可表示为:

补充笔记
Cost Function

We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from x's and the actual output y's.

Cost Function - Intuition Ⅰ

If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to make a straight line (defined by hθ(x)) which passes through these scattered data points.

Our objective is to get the best possible line. The best possible line will be such so that the average squared vertical distances of the scattered points from the line will be the least. Ideally, the line should pass though all the points of our training data set. In such a case, the value of J(θ0, θ1) will be 0. The following example shows the ideal situation where we have a cost function of 0.

When θ1 = 1, we get a slope of 1 which goes which goes through every single data point in our model. Conversely, when θ1 = 0.5, we see the vertical distance from our fit to the data points increase.

This increase our cost function to 0.58. Plotting several other points yields to the following graph:

Thus as a goal, we should try to minimize the cost function. In this case, θ1 = 1 is our global minimum.

Cost Function - Intuition Ⅱ

A contour plot is a graph that contains many contour lines. A contour line of a two variable function has a constant value at all points of the same line. An example of such a graph is the one to the right below.

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ0, θ1) and as a result, they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ0 = 800 and θ1 = -0.15. Taking another h(x) and plotting its contour plot, one gets the following graphs:

When θ0 = 360 and θ1 = 0, the value of J(θ0, θ1) in the contour plot gets closer to the center thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

The graph above minimizes the cost function as much as possible and consequently, the result of θ1 and θ0 tend to be around 0.12 and 250 respectively. Plotting those values on our graph to the right seems to put our point in the center of the inner most 'circle'.

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 214,717评论 6 496
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,501评论 3 389
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 160,311评论 0 350
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,417评论 1 288
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,500评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,538评论 1 293
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,557评论 3 414
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,310评论 0 270
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,759评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,065评论 2 330
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,233评论 1 343
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,909评论 5 338
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,548评论 3 322
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,172评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,420评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,103评论 2 365
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,098评论 2 352

推荐阅读更多精彩内容