Overfitting
The Problem of Overfitting
- Underfitting (high bias): the algorithm doesn't fit the training set well.
- Overfitting (high variance): if we have too many features, the learned hypothesis may fit the training set very well (the cost function may be very close to zero, or even exactly zero) but fail to generalize to new examples.
Addressing Overfitting
Options:
- Reduce the number of features.
  - Manually select which features to keep.
  - Use a model selection algorithm.
- Regularization.
  - Keep all the features, but reduce the magnitude/values of the parameters θ_j.
  - Works well when we have a lot of features, each of which contributes a bit to predicting y.
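A minimal numpy sketch of the second option (the data, the degree-9 features, and the λ values are illustrative assumptions, not from the notes): fitting many polynomial features to a few examples with and without an L2 penalty shows the penalty keeping all features while shrinking the parameter magnitudes.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20
x = rng.uniform(-1.0, 1.0, m)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, m)   # nearly linear data with noise

# Degree-9 polynomial features: many features, few examples -> prone to overfit.
X = np.vander(x, 10, increasing=True)          # columns: 1, x, x^2, ..., x^9

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares; the intercept is not penalized."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0                              # don't penalize the intercept term
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

theta_unreg = ridge_fit(X, y, 0.0)             # no penalty
theta_reg = ridge_fit(X, y, 1.0)               # with penalty

# Regularization keeps all features but shrinks the parameter magnitudes.
print(np.linalg.norm(theta_unreg[1:]), np.linalg.norm(theta_reg[1:]))
```

The norm of the penalized parameters comes out smaller, which is exactly the "reduce magnitude of parameters" behavior the bullet describes.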
Regularization
Suppose we penalize some of the parameters (e.g., θ_3 and θ_4) and make them really small.
Small values for the parameters θ_0, θ_1, ..., θ_n give:
- A "simpler" hypothesis
- Less prone to overfitting
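This intuition corresponds to the standard regularized linear regression cost, where λ is the regularization parameter and the penalty sum starts at j = 1 so that θ_0 is not penalized:

```latex
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 + \lambda \sum_{j=1}^{n}\theta_j^2\right]
```

A large λ pushes the θ_j toward zero (risking underfitting); λ = 0 recovers the unregularized cost.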
Regularized Linear Regression
We don't penalize θ_0.
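Because θ_0 is excluded from the penalty, the standard gradient descent update treats j = 0 separately:

```latex
\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)x_0^{(i)}
```

```latex
\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)
```

Grouping terms, the second update is θ_j := θ_j(1 − αλ/m) minus the usual gradient term: each iteration shrinks θ_j by a small factor before the ordinary gradient step.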
Normal equation
θ = (XᵀX + λM)⁻¹ Xᵀy, where M is an (n+1)×(n+1) matrix with 0 in the top-left entry (corresponding to θ_0) and 1s on the rest of the diagonal.
Non-invertibility issue
If m ≤ n, then XᵀX would be non-invertible (singular).
If λ > 0, then the matrix XᵀX + λM would be invertible.
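A quick numpy check of both claims (the sizes m = 3, n = 5 and λ = 1 are arbitrary choices for illustration):

```python
import numpy as np

m, n = 3, 5                          # fewer examples than features: m <= n
rng = np.random.default_rng(1)
X = rng.normal(size=(m, n + 1))
X[:, 0] = 1.0                        # intercept column x_0 = 1

A = X.T @ X                          # (n+1) x (n+1), but rank at most m
print(np.linalg.matrix_rank(A))      # less than n + 1: singular

lam = 1.0
M = np.eye(n + 1)
M[0, 0] = 0.0                        # theta_0 is not regularized
B = A + lam * M
print(np.linalg.matrix_rank(B))      # n + 1: full rank, so invertible
```

XᵀX has rank at most m, so with m ≤ n it cannot have full rank n+1; adding λM fills in the deficient directions and restores invertibility.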
Regularized Logistic Regression
Cost function:
J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_{j=1}^{n} θ_j²