7. Overfitting


The Problem of Overfitting

  • Underfitting (high bias): the algorithm doesn't fit the training set well.
  • Overfitting (high variance): if we have too many features, the learned hypothesis may fit the training set very well, so the cost function may be very close to zero (or even exactly zero), but fail to generalize to new examples.

Addressing overfitting

Options:

  1. Reduce number of features.
    • Manually select which features to keep.
    • Model selection algorithm.
  2. Regularization
    • Keep all the features, but reduce magnitude/values of parameters \theta_j.
    • Works well when we have a lot of features, each of which contributes a bit to predicting y.

Regularization

Suppose we penalize \theta_3, \theta_4 (some of the parameters) and make them really small.

Small values for the parameters:

  • "Simpler" hypothesis
  • Less prone to overfitting

J(\theta)=\frac{1}{2m}[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2]
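
A minimal NumPy sketch of this regularized cost, assuming a design matrix `X` whose first column is all ones (so `theta[1:]` corresponds to \theta_1,\ldots,\theta_n); the names `X`, `y`, `theta`, `lam` are illustrative, not from the notes:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) for regularized linear regression.

    X     : (m, n+1) design matrix, first column all ones (x_0 = 1)
    y     : (m,) targets
    theta : (n+1,) parameters
    lam   : regularization strength lambda
    """
    m = len(y)
    errors = X @ theta - y                              # h_theta(x^(i)) - y^(i) for all i
    fit_term = (errors @ errors) / (2 * m)              # (1/2m) * sum of squared errors
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)   # lambda/(2m) * sum_j theta_j^2, j >= 1
    return fit_term + reg_term
```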

Regularized Linear Regression

\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \newline \theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\right] \quad (j=1,2,\ldots,n) \newline \theta_j:=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
We don't penalize \theta_0.
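
The update rules above might be sketched in NumPy as follows; again `X`, `y`, `theta`, `alpha`, `lam` are assumed names, and \theta_0 is deliberately left out of the shrinkage term:

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update; theta_0 is not penalized."""
    m = len(y)
    errors = X @ theta - y                   # shape (m,)
    grad = (X.T @ errors) / m                # (1/m) * sum_i (h - y) * x_j, shape (n+1,)
    new_theta = theta.copy()
    new_theta[0] = theta[0] - alpha * grad[0]                             # plain update for theta_0
    new_theta[1:] = theta[1:] * (1 - alpha * lam / m) - alpha * grad[1:]  # shrink, then step
    return new_theta
```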

Normal equation

\theta=(X^TX+\lambda\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix})^{-1}X^Ty
The matrix should be (n+1)×(n+1): an identity matrix with its top-left entry (the one corresponding to \theta_0) set to 0.
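
A sketch of this regularized normal equation in NumPy (using `np.linalg.solve` rather than an explicit inverse; the helper name `normal_equation_regularized` is made up for illustration):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """theta = (X^T X + lambda * L)^(-1) X^T y,
    where L is the (n+1)x(n+1) identity with its top-left entry zeroed."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0                                       # do not penalize theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```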

Non-invertibility issue

If m\le n, then X^TX is non-invertible (singular).
If \lambda>0, however, the full matrix inside the inverse above becomes invertible, so regularization also takes care of this issue.
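
A quick NumPy check of this claim, with arbitrary sizes m = 3 and n = 4 chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # x_0 = 1 column plus n features

gram = X.T @ X                                # (n+1) x (n+1)
print(np.linalg.matrix_rank(gram))            # 3 < n+1 = 5, so X^T X is singular

lam = 1.0
L = np.eye(n + 1)
L[0, 0] = 0
print(np.linalg.matrix_rank(gram + lam * L))  # 5: full rank, hence invertible
```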

Regularized Logistic Regression

Cost function:
J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
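
A sketch of this regularized logistic-regression cost in NumPy, under the same assumptions as before (first column of `X` is all ones, \theta_0 unpenalized; the function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lambda/2m) * sum_j theta_j^2 for j >= 1."""
    m = len(y)
    h = sigmoid(X @ theta)                                # h_theta(x^(i)) for all i
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)     # theta_0 is not penalized
    return cross_entropy + reg_term
```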
