http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17_2.html
0 Introduction of Machine Learning
AI
hand-crafted rules (IFs) -> machine learning (a function)
A set of functions (model)
Goodness of a function
Pick the "best" function (a good algorithm)
Where does the error come from?
"bias" and "variance"
Estimator
f^{*} is an estimator of \hat{f}.
Bias and variance of an estimator
Trade-off between an unbiased estimate and minimum variance
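A minimal simulation sketch (an assumed example, not from the lecture): estimating the mean and variance of a Gaussian from small samples shows an unbiased estimator (the sample mean) next to a biased one (the 1/N sample variance).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 0.0, 1.0, 5, 10000

means = np.empty(trials)
vars_biased = np.empty(trials)
for t in range(trials):
    x = rng.normal(mu, sigma, size=N)
    means[t] = x.mean()             # unbiased estimator of mu
    vars_biased[t] = x.var(ddof=0)  # biased estimator of sigma^2 (divides by N)

print("E[m] ~", means.mean())          # close to mu: the sample mean is unbiased
print("E[s^2] ~", vars_biased.mean())  # ~ (N-1)/N * sigma^2 = 0.8: biased low
print("Var[m] ~", means.var())         # ~ sigma^2 / N: shrinks with more data
```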
Model selection
1. Simple model [large bias, small variance] -> underfitting
- Use a more complex model
2. Complex model [small bias, large variance] -> overfitting
- More data
- Regularization
Cross Validation
Training set
- Training set (the part used for fitting)
- Validation set [K-fold cross validation, see the sketch below]
Testing set (public)
Testing set (private)
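A K-fold cross-validation sketch; the `fit` and `error` callables are assumed placeholders for whatever model is being selected (they are not defined in these notes).

```python
import numpy as np

# K-fold cross validation: average validation error over K splits.
# fit(X, y) -> model and error(model, X, y) -> float are hypothetical.
def k_fold_cv(X, y, fit, error, K=3):
    folds = np.array_split(np.arange(len(X)), K)
    errs = []
    for k in range(K):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train], y[train])
        errs.append(error(model, X[val], y[val]))
    return float(np.mean(errs))  # pick the model with the lowest average error
```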
0 Gradient Descent
Tip 1: Tuning your learning rates
- Adaptive learning rates: start large, then shrink (see the sketch below)
  - Adagrad
  - Vanilla gradient descent
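A side-by-side sketch of vanilla gradient descent and Adagrad; the toy quadratic loss, learning rates, and step counts are assumed for illustration.

```python
import numpy as np

def vanilla_gd(w, grad, eta=0.1, steps=100):
    # Fixed learning rate for every parameter and every step.
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

def adagrad(w, grad, eta=1.0, steps=100, eps=1e-8):
    # Divide the step by the root of the accumulated squared gradients,
    # so each parameter gets its own decaying learning rate.
    g2_sum = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        g2_sum += g ** 2
        w = w - eta / (np.sqrt(g2_sum) + eps) * g
    return w

# Toy loss f(w) = w0^2 + 10 * w1^2, gradient [2*w0, 20*w1].
grad = lambda w: np.array([2.0 * w[0], 20.0 * w[1]])
print(vanilla_gd(np.array([3.0, 3.0]), grad))  # eta=0.1 oscillates on the steep w1 direction
print(adagrad(np.array([3.0, 3.0]), grad))     # per-parameter scaling makes steady progress
```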
Tip 2: Stochastic Gradient Descent
Update on one example at a time to make training faster!
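A minimal SGD sketch; the linear-regression model and squared-error loss are assumed here just to make the update loop concrete.

```python
import numpy as np

def sgd_linear(X, y, eta=0.01, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            err = (w @ X[i] + b) - y[i]  # residual on a single example
            w -= eta * err * X[i]        # gradient of 0.5 * err^2 w.r.t. w
            b -= eta * err
    return w, b
```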
Tip 3: Feature Scaling
Make different features have the same scaling!
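One common way to do this is standardization to zero mean and unit variance; that choice is assumed here, the notes don't specify one.

```python
import numpy as np

def standardize(X):
    # Scale each feature (column) to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + 1e-12), mean, std  # keep stats to scale test data
```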
More limitations of gradient descent
- Stuck at local minima
- Stuck at saddle points
- Very slow on plateaus
1 Regression
Output: a scalar.
Example (the three steps):
1. Linear Model
   - parameters
   - domain knowledge
   - feature, weight, bias
2. Loss Function
   - threshold
3. Gradient Descent
   - gradient
   - learning rate
Generalization
Overfitting
Regularization: prefer smoother functions (sketch below)
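A sketch of L2-regularized (ridge) linear regression in closed form, one assumed way to realize the "regularization -> smooth" idea.

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    # Minimize ||Xw - y||^2 + lam * ||w||^2. Larger lam gives smaller
    # weights, i.e. a smoother, less input-sensitive function.
    # (The bias term is usually not penalized; omitted here for brevity.)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```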
2 Classification
2.1 Bayesian Classification
Naive Bayes classifier: assume all the dimensions are independent.
Assign x to the class with the larger posterior probability.
Generative Model
1. Prior
   P(C_{1}), P(C_{2})
2. Class-conditional probability density
   P(x|C_{1}), P(x|C_{2})
Feature (vector)
Assume the N_{1}/N_{2} points are sampled from Gaussian distributions with means \mu^{1}, \mu^{2} and covariance matrices \Sigma^{1}, \Sigma^{2}.
Maximum Likelihood
Often assume a shared covariance \Sigma^{1} = \Sigma^{2} = \Sigma; the boundary is then linear.
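A sketch of the maximum-likelihood fit for this shared-covariance Gaussian model, and the resulting linear (sigmoid) posterior; a minimal assumed implementation.

```python
import numpy as np

def fit_gaussian_generative(X1, X2):
    # ML estimates: per-class means, then a weighted shared covariance.
    N1, N2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1) / N1
    S2 = (X2 - mu2).T @ (X2 - mu2) / N2
    sigma = (N1 * S1 + N2 * S2) / (N1 + N2)
    return mu1, mu2, sigma, N1 / (N1 + N2)

def posterior_c1(x, mu1, mu2, sigma, prior1):
    # With a shared sigma, P(C1|x) reduces to sigmoid(w.x + b): a linear boundary.
    inv = np.linalg.inv(sigma)
    w = inv @ (mu1 - mu2)
    b = (-0.5 * mu1 @ inv @ mu1 + 0.5 * mu2 @ inv @ mu2
         + np.log(prior1 / (1 - prior1)))
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))
```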
2.2 Logistic Regression
Linear Regression
Logistic Regression
- Step 1: Function Set
- Step 2: Goodness of a Function
  1. Cross Entropy √
  2. Square Error ×
- Step 3: Find the Best Function
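A minimal sketch of the three steps with gradient descent on the cross-entropy loss; the vectorized implementation details are assumed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_fit(X, y, eta=0.1, steps=1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)            # Step 1: f(x) = sigma(w.x + b)
        err = p - y                       # gradient of cross-entropy w.r.t. z
        w -= eta * X.T @ err / len(X)     # Steps 2/3: descend on the loss
        b -= eta * err.mean()
    return w, b
```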
Discriminative vs. Generative
Generative: Section 2.1
Discriminative: Section 2.2
In general the discriminative model beats the generative one, since the generative model makes more assumptions and "imagines" structure that may not be in the data.
Multi-class Classification: softmax
Cross Entropy
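A softmax plus cross-entropy sketch; the array shapes are assumed (logits of shape (N, C), labels as integer class ids).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-probability of each example's true class.
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
```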
Limitation of Logistic Regression
It cannot separate linearly non-separable data.
1. Feature Transformation (see the XOR sketch below)
2. Cascading Logistic Regression Models
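A feature-transformation sketch on XOR-style data; the two anchor points used for the distance features are an assumed illustrative choice.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR labels: not linearly separable in x-space

def transform(X):
    # New features: distance to (0,0) and distance to (1,1).
    d1 = np.linalg.norm(X - np.array([0.0, 0.0]), axis=1)
    d2 = np.linalg.norm(X - np.array([1.0, 1.0]), axis=1)
    return np.stack([d1, d2], axis=1)

# Class 0 maps to (0, sqrt 2) and (sqrt 2, 0); class 1 maps to (1, 1) twice,
# so the line d1 + d2 = 1.7 now separates the classes.
print(transform(X))
```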