7. Overfitting


The Problem of Overfitting

  • Underfitting (high bias): the algorithm doesn't fit the training set well.
  • Overfitting (high variance): if we have too many features, the learned hypothesis may fit the training set very well, so the cost function may be very close to zero (or even exactly zero), but fail to generalize to new examples.

Addressing overfitting

Options:

  1. Reduce number of features.
    • Manually select which features to keep.
    • Model selection algorithm.
  2. Regularization
    • Keep all the features, but reduce magnitude/values of parameters \theta_j.
    • Works well when we have a lot of features, each of which contributes a bit to predicting y.

Regularization

Suppose we penalize \theta_3, \theta_4 (some of the parameters) and make them really small.

Small values for the parameters:

  • "Simpler" hypothesis
  • Less prone to overfitting

J(\theta)=\frac{1}{2m}[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2]
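
A minimal NumPy sketch of this regularized cost, assuming a design matrix `X` whose first column is all ones (so `theta[1:]` corresponds to \theta_1,\ldots,\theta_n); the names `X`, `y`, `theta`, `lam` are illustrative, not from the notes:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) for regularized linear regression.

    X     : (m, n+1) design matrix, first column all ones (x_0 = 1)
    y     : (m,) targets
    theta : (n+1,) parameters
    lam   : regularization strength lambda
    """
    m = len(y)
    errors = X @ theta - y                              # h_theta(x^(i)) - y^(i) for all i
    fit_term = (errors @ errors) / (2 * m)              # (1/2m) * sum of squared errors
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)   # lambda/(2m) * sum_j theta_j^2, j >= 1
    return fit_term + reg_term
```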

Regularized Linear Regression

\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \newline \theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\right] \quad (j=1,2,\ldots,n) \newline \theta_j:=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
We don't penalize \theta_0.
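
The update rules above might be sketched in NumPy as follows; again `X`, `y`, `theta`, `alpha`, `lam` are assumed names, and \theta_0 is deliberately left out of the shrinkage term:

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update; theta_0 is not penalized."""
    m = len(y)
    errors = X @ theta - y                   # shape (m,)
    grad = (X.T @ errors) / m                # (1/m) * sum_i (h - y) * x_j, shape (n+1,)
    new_theta = theta.copy()
    new_theta[0] = theta[0] - alpha * grad[0]                             # plain update for theta_0
    new_theta[1:] = theta[1:] * (1 - alpha * lam / m) - alpha * grad[1:]  # shrink, then step
    return new_theta
```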

Normal equation

\theta=(X^TX+\lambda\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix})^{-1}X^Ty
The matrix should be (n+1)×(n+1): an identity matrix with its top-left entry (the one corresponding to \theta_0) set to 0.
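
A sketch of this regularized normal equation in NumPy (using `np.linalg.solve` rather than an explicit inverse; the helper name `normal_equation_regularized` is made up for illustration):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """theta = (X^T X + lambda * L)^(-1) X^T y,
    where L is the (n+1)x(n+1) identity with its top-left entry zeroed."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0                                       # do not penalize theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```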

Non-invertibility issue

If m\le n, then X^TX is non-invertible (singular).
If \lambda>0, however, the full matrix inside the inverse above becomes invertible, so regularization also takes care of this issue.
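
A quick NumPy check of this claim, with arbitrary sizes m = 3 and n = 4 chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # x_0 = 1 column plus n features

gram = X.T @ X                                # (n+1) x (n+1)
print(np.linalg.matrix_rank(gram))            # 3 < n+1 = 5, so X^T X is singular

lam = 1.0
L = np.eye(n + 1)
L[0, 0] = 0
print(np.linalg.matrix_rank(gram + lam * L))  # 5: full rank, hence invertible
```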

Regularized Logistic Regression

Cost function:
J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
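
A sketch of this regularized logistic-regression cost in NumPy, under the same assumptions as before (first column of `X` is all ones, \theta_0 unpenalized; the function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lambda/2m) * sum_j theta_j^2 for j >= 1."""
    m = len(y)
    h = sigmoid(X @ theta)                                # h_theta(x^(i)) for all i
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)     # theta_0 is not penalized
    return cross_entropy + reg_term
```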
