Model Representation: Linear Regression

m: size of the Training Set (number of training examples)

x's: inputs / features

y's: outputs / targets

$(x_i, y_i)$: the $i$-th training example

We need a learning algorithm that, given the Training Set, produces a hypothesis $h$ mapping an input $x$ to a predicted output $y$.

Hypothesis

$h_{\theta } (x)=\theta _{0} +\theta _{1}x$

Parameters

$\theta_{0}, \theta_{1}$

Cost function

$J(\theta_0, \theta_1)=\frac{1}{2m}\sum_{i=1}^m\left[h_{\theta}(x_i)-y_i\right]^2$

This is the squared error function, the most commonly used cost function for regression problems.
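A minimal sketch of the hypothesis and the squared error cost in plain Python may make the formula concrete. The function names `hypothesis` and `cost` and the toy data are illustrative assumptions, not part of the notes.

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Hypothetical toy data lying exactly on the line y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect_fit = cost(1.0, 2.0, xs, ys)  # parameters that match the data: cost is 0
zero_params = cost(0.0, 0.0, xs, ys)  # every prediction is 0: cost is 84 / 8 = 10.5
```

The closer the parameters are to the line that generated the data, the smaller $J$ becomes, which is what the minimization goal relies on.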

Goal

minimize cost function

Approach

Gradient Descent

repeat until convergence {

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0,\theta_1)$ (for $j = 0$ and $j = 1$)

}
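
For the hypothesis $h_{\theta}(x)=\theta_0+\theta_1 x$ and the squared error cost, applying the chain rule to each squared term gives the two partial derivatives used in the update:

$\frac{\partial}{\partial\theta_0} J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m\left[h_{\theta}(x_i)-y_i\right]$

$\frac{\partial}{\partial\theta_1} J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m\left[h_{\theta}(x_i)-y_i\right]x_i$

The factor $\frac{1}{2}$ in the cost cancels the 2 produced by differentiating the square.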

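$\theta_0$ and $\theta_1$ must be updated simultaneously: compute both new values from the old parameters before assigning either one.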
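A single simultaneous update could be sketched as follows. The function name `gradient_step` and the toy data are hypothetical. The gradient expressions assume the squared error cost with $h_{\theta}(x)=\theta_0+\theta_1 x$.

```python
def gradient_step(theta0, theta1, xs, ys, alpha):
    """One simultaneous gradient descent update for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    # Both partial derivatives are evaluated at the OLD (theta0, theta1)
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    # Compute both new values first, then return them together
    new_theta0 = theta0 - alpha * grad0
    new_theta1 = theta1 - alpha * grad1
    return new_theta0, new_theta1


# Hypothetical toy data lying on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_step(0.0, 0.0, xs, ys, alpha=0.1)
```

If `theta0` were overwritten before `grad1` was computed, the second gradient would be evaluated at a mixed point, which is no longer the update rule stated above.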

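$\alpha$ is the Learning Rate.

If $\alpha$ is too small, gradient descent is very slow; if it is too large, an update can overshoot the local minimum, and the iteration may fail to converge or even diverge.

Near a local minimum the slope approaches zero (provided the derivative is continuous there), so the step $\alpha \frac{\partial}{\partial\theta_j} J$ shrinks automatically as the slope flattens. With a fixed, suitably small $\alpha$, gradient descent therefore converges to the nearest local minimum without any need to decrease $\alpha$ over time.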
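Both effects can be seen on a one-parameter example. The sketch below assumes the toy cost $J(\theta)=\theta^2$, chosen only for illustration, whose derivative is $2\theta$. The helper name `descend` is hypothetical.

```python
def descend(theta, alpha, steps):
    """Run gradient descent on J(theta) = theta**2 and record each step size."""
    step_sizes = []
    for _ in range(steps):
        gradient = 2 * theta       # dJ/dtheta for the toy cost
        update = alpha * gradient
        theta -= update
        step_sizes.append(abs(update))
    return theta, step_sizes


# Fixed small alpha: each step is 0.8 times the previous one, theta approaches 0
theta_small, sizes_small = descend(theta=1.0, alpha=0.1, steps=50)

# Alpha too large: each update overshoots the minimum and |theta| grows
theta_large, sizes_large = descend(theta=1.0, alpha=1.1, steps=50)
```

With `alpha=0.1` the recorded step sizes decrease monotonically even though $\alpha$ never changes; with `alpha=1.1` the iterate flips sign on every update and moves away from the minimum.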

Multiple Feature Linear Regression

repeat until convergence {

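$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta)$ for every $j$, updated simultaneously, where $\frac{\partial}{\partial\theta_j} J(\theta)=\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_{j}$. Here the superscript $(i)$ indexes the training example, the subscript $j$ indexes the feature, and $x^{(i)}_0 = 1$ by convention.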

}
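
A minimal pure-Python sketch of this loop follows. It assumes the hypothesis is $h_{\theta}(x)=\sum_j \theta_j x_j$ with a leading feature $x_0 = 1$, and it runs a fixed number of iterations instead of testing for convergence. The names `predict` and `gradient_descent` and the data are hypothetical.

```python
def predict(theta, x):
    """h_theta(x) = sum_j theta[j] * x[j], where x[0] is assumed to be 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent for linear regression with several features."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [predict(theta, X[i]) - y[i] for i in range(m)]
        # Gradient for every j, computed from the current theta
        grads = [sum(errors[i] * X[i][j] for i in range(m)) / m for j in range(n)]
        # Simultaneous update of all parameters
        theta = [theta[j] - alpha * grads[j] for j in range(n)]
    return theta


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2; the first column is x0 = 1
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
```

On this noise-free data the result should approach `[1, 2, 3]`. A real implementation would more likely stop once the cost or the parameters change by less than a chosen tolerance.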

Feature Scaling

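In linear regression with multiple features, gradient descent converges slowly when the features have very different ranges, for example $-1\leq x_{1} \leq 1$ but $-100\leq x_{2} \leq 100$. The contours of the cost function become elongated, so the updates zigzag back and forth instead of heading straight for the minimum. Rescaling fixes this: replace $x_{2}$ with $u_{2} = \frac{x_{2}}{100}$, so that both features lie in roughly the same range.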
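
The rescaling can be sketched as dividing a feature by its largest absolute value, which reproduces $u_{2} = \frac{x_{2}}{100}$ for the range above. The helper name `scale_by_max_abs` and the sample values are assumptions made for illustration.

```python
def scale_by_max_abs(values):
    """Divide a feature by its largest absolute value so it falls in [-1, 1]."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        return list(values)  # an all-zero feature needs no scaling
    return [v / largest for v in values]


# Hypothetical raw values for x2, spanning -100..100
x2 = [-100.0, -50.0, 0.0, 25.0, 100.0]
u2 = scale_by_max_abs(x2)  # same as x2 / 100 for this data
```

Any parameters learned on `u2` apply to the scaled feature, so new inputs must be divided by the same constant before prediction.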