Advice for applying machine learning
Deciding what to try next
Debugging a learning algorithm:
- Get more training examples. (Sometimes this doesn't actually help.)
- Try smaller sets of features.
- Try getting additional features
- Try adding polynomial features
- Try decreasing or increasing λ (the regularization parameter)
Machine learning diagnostic: a test that shows what is and isn't working in a learning algorithm. Running one before trying fixes saves time.
Evaluating a hypothesis
Test whether your hypothesis is overfitting:
- Split your data into 2 parts: a training set (70%) and a test set (30%).
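A minimal sketch of the 70/30 split in Python. The function name `split_train_test`, the fixed seed, and the toy data are assumptions for illustration. Shuffling first matters when the data is stored in some order.

```python
import random

def split_train_test(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `examples` and split it into (train, test)."""
    shuffled = list(examples)
    # Shuffle first so that ordered data does not bias either part.
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))  # toy stand-in for real examples
train, test = split_train_test(data)
print(len(train), len(test))
```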
Training/testing procedure for logistic regression
- Learn parameters θ from the training data
- Compute the test set error (linear/logistic regression)
- Misclassification error (logistic regression)
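The three error measures above can be sketched as follows. This assumes the hypothesis has already been evaluated on the test set, so each function takes a list of predictions (or predicted probabilities) and a list of true values. The function names and the 0.5 threshold default are illustrative choices.

```python
import math

def squared_error(predictions, targets):
    """Linear regression test error: 1/(2m) * sum((h(x) - y)^2)."""
    m = len(targets)
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / (2 * m)

def logistic_cost(probabilities, labels):
    """Logistic regression test error: -1/m * sum(y*log(h) + (1-y)*log(1-h)).

    Assumes every probability is strictly between 0 and 1.
    """
    m = len(labels)
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probabilities, labels))
    return -total / m

def misclassification_error(probabilities, labels, threshold=0.5):
    """Fraction of examples where the thresholded prediction differs from the label."""
    wrong = sum(1 for p, y in zip(probabilities, labels)
                if (p >= threshold) != (y == 1))
    return wrong / len(labels)

print(squared_error([1.0, 2.0], [1.0, 4.0]))
print(misclassification_error([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1]))
```

The misclassification error is often easier to interpret than the logistic cost, since it is simply the share of wrong predictions.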
Model selection and training/validation/test sets
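When the test set is used to select which model to choose (for example, the polynomial degree), the choice itself is fit to the test set, so the model that performs best there may not generalize.
The test error J_test(θ) of the selected model is likely to be an optimistic estimate of the generalization error.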
Split the dataset into 3 pieces.
- training set (60%)
- cross validation set (20%)
- testing set (20%)
Use the cross validation set to select the model.
Use the testing set to estimate the generalization error.
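A rough sketch of the 60/20/20 split and the selection step. The candidate hypotheses here are hypothetical stand-ins that are assumed to have been fit on the training set already. The names `split_three_ways` and `select_model` and the toy data are also assumptions.

```python
import random

def split_three_ways(examples, seed=0):
    """Shuffle and split into 60% training, 20% cross validation, 20% test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    a, b = round(0.6 * n), round(0.8 * n)
    return shuffled[:a], shuffled[a:b], shuffled[b:]

def squared_error(h, examples):
    """1/(2m) * sum((h(x) - y)^2) over (x, y) pairs."""
    return sum((h(x) - y) ** 2 for x, y in examples) / (2 * len(examples))

def select_model(candidates, cv_set, test_set):
    """Pick the candidate with the lowest cross validation error.

    The test set is used only once, to report the error of the chosen model.
    """
    cv_errors = {name: squared_error(h, cv_set) for name, h in candidates.items()}
    best = min(cv_errors, key=cv_errors.get)
    return best, cv_errors[best], squared_error(candidates[best], test_set)

# Hypothetical hypotheses, assumed to be already fit on the training set.
candidates = {"d=1": lambda x: 2.0 * x, "d=2": lambda x: x * x}
cv_set = [(1, 1.0), (2, 4.0), (3, 9.0)]
test_set = [(4, 17.0)]
best, cv_error, test_error = select_model(candidates, cv_set, test_set)
print(best, cv_error, test_error)
```

Because the test set plays no part in the choice, its error is a fairer estimate of how the selected model generalizes.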
Diagnosing bias vs. variance
Underfitting vs. overfitting:
Bias (underfit): J_train(θ) will be high; J_cv(θ) ≈ J_train(θ)
Variance (overfit): J_train(θ) will be low; J_cv(θ) >> J_train(θ)
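One way to turn this rule of thumb into a check, as a sketch only. The `acceptable_error` target and the `gap_ratio` threshold are hypothetical knobs, not values from the course, and real cases can show high bias and high variance at once.

```python
def diagnose(j_train, j_cv, acceptable_error, gap_ratio=2.0):
    """Classify a model from its training and cross validation errors.

    acceptable_error and gap_ratio are illustrative thresholds (assumptions).
    """
    if j_train > acceptable_error:
        # Training error is already too high: the model underfits.
        return "high bias"
    if j_cv > acceptable_error and j_cv > gap_ratio * j_train:
        # Training error is low but cross validation error is much higher.
        return "high variance"
    return "ok"

print(diagnose(0.30, 0.32, acceptable_error=0.05))
print(diagnose(0.01, 0.25, acceptable_error=0.05))
```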
Regularization and bias/variance
To find a good λ:
Try λ = 0, 0.01, 0.02, 0.04, 0.08, ..., 10.24
Fit the parameters θ for each λ.
Use the cross validation set to compute J_cv(θ) for each θ, and pick the λ with the minimum J_cv(θ).
The figures in the previous two videos are very useful for understanding how the cross validation set helps select the best model and the best λ.
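The search over the regularization parameter can be sketched on a deliberately tiny model: one feature, no intercept, h(x) = θx, where the regularized least-squares fit has a closed form. The model, the toy data, and the function names are assumptions chosen to keep the example short. Note that the cross validation error is computed without the regularization term.

```python
def fit_theta(train, lam):
    """Closed-form minimizer of 1/(2m) * (sum((theta*x - y)^2) + lam * theta^2)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, y in train)
    return sxy / (sxx + lam)

def cv_error(theta, cv_set):
    """Cross validation error, without the regularization term."""
    return sum((theta * x - y) ** 2 for x, y in cv_set) / (2 * len(cv_set))

def pick_lambda(train, cv_set, lambdas):
    """Fit one theta per candidate value and keep the one with the lowest CV error."""
    scored = [(cv_error(fit_theta(train, lam), cv_set), lam) for lam in lambdas]
    best_error, best_lam = min(scored)
    return best_lam, best_error

# 0, then 0.01 doubled repeatedly up to 10.24 (12 candidates in total).
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]
train = [(1.0, 2.5), (2.0, 3.5)]                   # toy training data
cv_set = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]      # toy cross validation data
best_lam, best_error = pick_lambda(train, cv_set, lambdas)
print(best_lam, best_error)
```

After the choice is made, the test set error of the chosen model gives the estimate of generalization error.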
Learning curves
Plot the training error and cross validation error against m (the training set size).
If a learning algorithm is suffering from high bias, getting more training data will not help much.
If a learning algorithm is suffering from high variance, getting more training data is likely to help.
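A sketch of how the points of a learning curve can be computed, again on an assumed toy model h(x) = θx fit by least squares. For each m, the model is trained on the first m examples only. The training error is measured on those same m examples and the cross validation error on the full cross validation set.

```python
def fit_theta(train):
    """Least-squares fit of h(x) = theta * x (no intercept, no regularization)."""
    return sum(x * y for x, y in train) / sum(x * x for x, y in train)

def squared_error(theta, examples):
    """1/(2m) * sum((theta*x - y)^2) over (x, y) pairs."""
    return sum((theta * x - y) ** 2 for x, y in examples) / (2 * len(examples))

def learning_curve(train, cv_set):
    """Return a list of (m, training error, cross validation error) for m = 1..len(train)."""
    points = []
    for m in range(1, len(train) + 1):
        subset = train[:m]
        theta = fit_theta(subset)
        points.append((m, squared_error(theta, subset), squared_error(theta, cv_set)))
    return points

train = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]  # toy data
cv_set = [(1.5, 1.5), (2.5, 2.5)]
for m, j_train, j_cv in learning_curve(train, cv_set):
    print(m, round(j_train, 4), round(j_cv, 4))
```

If the two errors level off close together at a high value, that suggests high bias. A persistent large gap between them suggests high variance.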
Deciding what to try next (revisited)
bias: underfit
variance: overfit
- Get more training examples: fix high variance
- Try smaller sets of features: fix high variance
- Try getting additional features: fix high bias
- Try adding polynomial features: fix high bias
- Try decreasing λ: fix high bias
- Try increasing λ: fix high variance