Coursera网课，斯坦福机器学习 machine-learning 作业1解析

编程实现单一变量的线性回归cost function 和 gradient descent

cost function

元数据含有两列，一列为x（变量），一列为y(值)：

文本数据，列1为x，列2为y

线性回归cost function公式如上，输入参数有三：

X --- dimension 为 m * 2的矩阵，m为数据样本总数即行数，第一列数据全为1，意即在θ0后乘以了一个X0=1.
此处变量只有一个。

X如图：

X
y --- dimension为m*1的矩阵，存储着对应每个x值的真实y值
y如图：

y

【X和y是从上面导入的元数据生成的】

theta --- dimension为2*1的矩阵，存储着猜想中的两个θ的值，初始值可设为[0 ; 0]。

Cost function的实现如下：

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

predic = X * theta;          %all the predicted y
dif = predic - y;            %dif is the difference between predicted y and real y
dif = dif.^2;                % pow2 of dif
s = sum(dif);                % sum of pow2 of dif
J = s/(2*m);                 % divided by 2m

% =========================================================================

end

根据以上公式，我们自然可以用循环的方法计算全部的h(x)-y再累加，但是用矩阵计算速度更快；

矩阵计算原理如下：
X [m * 2]
y [m * 1]
theta [2 * 1]

矩阵X * theta 得到的是形如m*1的矩阵（向量），其数据为根据此theta和每一个变量计算出来的每一个y的预测值；
再减去存有真实结果的y矩阵（向量），得到如下矩阵：
[ h(x1) - y1;
h(x2) - y2;
...
h(xm) - ym]
根据公式，我们需要将此矩阵（向量）取二次方，写作dif = dif.^2;
然后再计算此矩阵(向量)的累加值，写作s = sum(dif);
最后再根据公式写作J = s/(2*m);
此处的J就是由给出的theta得到的cost的值
X和y在后续不会有所改变，但我们会逐步改变theta的值，以求得最小的cost来确定最佳theta

Gradient descent的实现

Gradient descent公式，用于更新theta值

这里我们的theta（系数）值只有两个，是可以用index指定来计算以上公式中最后那个Xj的,
比如类似以下这位同学的解法，用到了X(:,2)来取X的第二列（即Xj），又用了theta(1)和theta(2)这样的写法来一个一个单独求解theta(系数). 但是假如我们有很多个系数需要求，这样的法子显然就不行了。

通过index获取Xj

所以，我们还是倾向于用矩阵计算来解题：

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    predic = X * theta;
    dif = predic - y; 
    dif = dif.* X;
    s = sum(dif);
    gradient = alpha*s/m;
    theta = theta - gradient';

    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end

end

第一步，X * theta得到一个形如m * 1的矩阵，存放所有的预测值；
第二步，使其减去真实值，得 m * 1的预测值和真实值差值矩阵，写作dif = predic - y;
第三步是关键，重新看一下线性公式和更新theta值所用的Gradient descent公式：

线性公式

Gradient descent公式，用于更新theta值

根据线性公式和gradient descent公式，我们对每一个系数θ的更新，需要求每一个（行）样本的差值*此θ对应的每一个（行）X的值；

对于θ1来说，差值h(x1)-y1需要乘以X1，差值h(x2)-y2需要乘以X2 ...
但对于θ0来说，差值需要乘以的其实是常数1（因为在线性公式里θ0背后没有乘以变量）

意即，这里所见的gradient descent公式，是要套用到所有θ上面的。
而既然我们已经把所有的θ值（在这里只有两个）存在了一个2 * 1的矩阵（向量）里面了，那我们就可以直接进行矩阵操作：
dif = dif.* X;
注意这里的乘法前面有一个点，这个点的存在使得这一步操作并非是m * 1的差值矩阵乘以 m * 2的X （两矩阵若想相乘必须保证前矩阵的列数等于后矩阵的行数，这俩一个是1一个是m显然不能相乘）；
这个点的用法请参考 matlab dot 方法
作用是，用前矩阵的每一个元素去乘后矩阵的每一个元素，所以要保证两个矩阵形状一致，
但是当其中一个矩阵实为向量时，只需要保证向量的行和第二个矩阵的行数一样就可，
达成的效果是，用向量的每一行数据去乘矩阵的每一行中的每一个数据。
例如：

矩阵点乘法

所以dif = dif.* X;操作得到的新矩阵是m * 2的矩阵，
其第一行为[dif1, dif1 * X1]，第二行为[dif2, dif2 * X2]...直到[difm, difm * Xm]
如果我们竖着求每一列的sum，那得到的不就每一个θ在Gradient descent公式里面的求和部分了嘛！（这里有两列，对应两个θ）
这里用sum()方法对矩阵求和，s = sum(dif)刚好会得到一个1*2的矩阵，第一个数据是第一列的和，第二个数据是第二列的和；
所以第四步，再根据公式补全求和后需要乘以的alpha和1/m，写作gradient = alpha*s/m;
第五步，直接用2 * 1 的theta矩阵减去这个2 * 1 的gradient矩阵，就等于对每一个θ进行了θ = θ - gradient操作了。

[Octave] 线性回归的cost function 和 gradient descent实现

[Octave] 线性回归的cost function 和 gradient descent实现

编程实现单一变量的线性回归cost function 和 gradient descent

Cost function的实现如下：

Gradient descent的实现