Notes on Hung-yi Lee's Machine Learning (2017), Part 1: Introduction and Task

http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17_2.html

0 Introduction of Machine Learning

AI: from hand-crafted rules (if-statements) to machine learning (learning a function).
1. A set of functions (the model): f_{1}, f_{2}, \cdots
2. Goodness of a function f
3. Pick the "best" function f^{*} (this requires a good algorithm)

Where does the error come from?

"bias" and "variance"

Estimator
f^{*} is an estimator of \hat f.
The bias and variance of the estimator

Trade-off between an unbiased estimate and a small variance.
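
A quick numerical check of this trade-off (my sketch, not from the lecture): estimate \sigma^{2} of a Gaussian from N = 10 points, many times over. Dividing by N gives a biased estimator with smaller variance; dividing by N-1 gives an unbiased one with larger variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 10, 100_000

# Many independent "training sets"; two estimators of the variance sigma^2.
samples = rng.normal(mu, sigma, size=(trials, n))
var_biased = samples.var(axis=1, ddof=0)    # divide by n: biased, lower variance
var_unbiased = samples.var(axis=1, ddof=1)  # divide by n-1: unbiased, higher variance

print("true sigma^2:", sigma**2)                           # 4.0
print("mean of biased estimates:  ", var_biased.mean())    # ~3.6, biased low
print("mean of unbiased estimates:", var_unbiased.mean())  # ~4.0
print("variance of biased estimates:  ", var_biased.var())   # smaller
print("variance of unbiased estimates:", var_unbiased.var()) # larger
```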

Model selection
1. Simple model [large bias, small variance]: underfitting

  • Remedy: redesign a more complex model

2. Complex model [small bias, large variance]: overfitting

  • More data
  • Regularization

Cross Validation
Training Set, split into:

  • training set
  • validation set [for model selection; K-fold cross validation averages over K splits]

Testing Set (public)
Testing Set (private)
Choose the model on the validation set: a score tuned against the public testing set overestimates performance on the private one.
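
A minimal K-fold cross-validation sketch (my code, not the course's; `fit` and `evaluate` are placeholder callables you supply):

```python
import numpy as np

def k_fold_cv(X, y, k, fit, evaluate, seed=0):
    """Average validation error of `fit` over k folds; lower is better."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])                 # train on k-1 folds
        errors.append(evaluate(model, X[val], y[val]))  # validate on held-out fold
    return float(np.mean(errors))
```

Pick the model with the lowest average validation error, then retrain it on the full training set before touching any testing set.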


0 Gradient Descent

Tip 1: Tuning your learning rate

  1. Adaptive learning rates: start large, then decay over time, e.g. \eta^{t} = \eta/\sqrt{t+1}.
  2. Adagrad: w^{t+1} = w^{t} - \frac{\eta}{\sqrt{\sum_{i=0}^{t}(g^{i})^{2}}} g^{t}
  3. Vanilla gradient descent, for comparison: w^{t+1} = w^{t} - \eta^{t} g^{t}
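
Both updates side by side on a toy one-parameter objective (a hypothetical example; `grad` is any gradient function):

```python
import numpy as np

def adagrad(grad, w0, eta=1.0, steps=200, eps=1e-8):
    """Adagrad: scale the step by the root of the sum of squared past gradients."""
    w, g2_sum = float(w0), 0.0
    for _ in range(steps):
        g = grad(w)
        g2_sum += g * g
        w -= eta / (np.sqrt(g2_sum) + eps) * g
        # vanilla gradient descent would simply do: w -= eta * g
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
print(adagrad(lambda w: 2.0 * (w - 3.0), w0=0.0))  # -> close to 3.0
```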

Tip 2: Stochastic Gradient Descent
Update on the loss of one example x^{n} at a time, instead of the sum over all examples: makes the training faster!
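
A stochastic-update sketch on a made-up linear model: each step uses the gradient of a single example's squared error rather than the full sum.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=1000)  # targets (the lecture's \hat{y})

w, b, eta = 0.0, 0.0, 0.01
for epoch in range(5):
    for i in rng.permutation(len(x)):   # one example per update
        err = (w * x[i] + b) - y[i]
        w -= eta * err * x[i]
        b -= eta * err
print(w, b)  # close to 3.0 and 1.0 after 5 quick passes
```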

Tip 3: Feature Scaling
Make different features have the same scale, so the error surface is closer to round and gradient descent heads straight for the minimum.
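
A standardization sketch (the usual recipe: subtract each feature's mean, divide by its standard deviation; the arrays here are made up):

```python
import numpy as np

def standardize(X):
    """Give every feature zero mean and unit variance (z-scores)."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    return (X - mean) / std, mean, std

X_train = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 300.0]])
X_scaled, mean, std = standardize(X_train)
# Reuse the TRAINING statistics on later data; never recompute them there.
X_new_scaled = (np.array([[2.0, 250.0]]) - mean) / std
```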

More limitations

  1. stuck at a local minimum
  2. stuck at a saddle point
  3. very slow on a plateau

1 Regression

output: a scalar.

Example (the lecture's running task: predicting a Pokémon's CP after evolution):
1. Linear Model: y = b + \sum_{i} w_{i} x_{i}
   parameters: feature, weight, bias (picking the features takes domain knowledge)
2. Loss Function
   L(f) = L(w, b), the sum of squared errors; threshold
3. Gradient Descent
   gradient; learning rate
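
The three steps end to end on made-up data (a sketch, not the lecture's Pokémon dataset; batch gradient descent this time, averaging the gradient over all examples):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)                       # features
y_hat = 1.5 + 2.0 * x + rng.normal(scale=0.05, size=100)  # targets \hat{y}

w, b, eta = 0.0, 0.0, 0.1
for _ in range(10_000):
    err = y_hat - (b + w * x)            # Step 1: model y = b + w*x
    # Step 2: loss L(w, b) = mean of err^2; Step 3: follow the negative gradient
    w -= eta * (-2.0 * (err * x).mean())
    b -= eta * (-2.0 * err.mean())
print(f"w = {w:.2f}, b = {b:.2f}")       # recovers roughly 2.0 and 1.5
```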

generalization
overfitting
regularization: prefer smooth functions
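
The regularized loss from the lecture: the \lambda-term penalizes large weights, so the learned function is smoother and less sensitive to input noise; the bias b is left out because it does not affect smoothness.

```latex
L(w, b) = \sum_{n} \Big( \hat{y}^{n} - \big( b + \sum_{i} w_{i} x_{i}^{n} \big) \Big)^{2} + \lambda \sum_{i} (w_{i})^{2}
```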


2 Classification

2.1 Bayesian Classification

Naive Bayes classifier: assume all the feature dimensions are independent.

P(C_{1}|x) = \frac{P(x|C_{1})P(C_{1})}{P(x|C_{1})P(C_{1}) + P(x|C_{2})P(C_{2})}
P(C_{2}|x) = \frac{P(x|C_{2})P(C_{2})}{P(x|C_{1})P(C_{1}) + P(x|C_{2})P(C_{2})}

x is assigned to the class with the larger posterior probability.
Generative Model: P(x) = P(x|C_{1})P(C_{1}) + P(x|C_{2})P(C_{2})
1. Prior: P(C_{1}), P(C_{2}), estimated by counting, e.g. P(C_{1}) = N_{1}/(N_{1}+N_{2})
2. Class-conditional probability ("probability from class"): P(x|C_{1}), P(x|C_{2})
Feature (vector)
Assume the N_{1} (and likewise the N_{2}) points of a class are sampled from a Gaussian distribution with
mean \mu,
covariance matrix \Sigma.
Maximum likelihood: the estimates \mu_{1}^{*}, \Sigma_{1}^{*} are the sample mean and the sample covariance of that class's points.


The two classes are often assumed to share the same covariance matrix \Sigma; the decision boundary is then linear.
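
The whole generative recipe under that shared-covariance assumption, for the binary case (a sketch; variable names are mine):

```python
import numpy as np
from numpy.linalg import inv

def fit_generative(X1, X2):
    """ML fit: per-class means, one shared covariance (count-weighted average)."""
    n1, n2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    s1 = (X1 - mu1).T @ (X1 - mu1) / n1
    s2 = (X2 - mu2).T @ (X2 - mu2) / n2
    sigma = (n1 * s1 + n2 * s2) / (n1 + n2)   # shared covariance
    # With a shared sigma, P(C1|x) = sigmoid(w.x + b): a linear boundary.
    w = inv(sigma) @ (mu1 - mu2)
    b = (-0.5 * mu1 @ inv(sigma) @ mu1
         + 0.5 * mu2 @ inv(sigma) @ mu2
         + np.log(n1 / n2))                   # log prior ratio P(C1)/P(C2)
    return w, b

def posterior_c1(x, w, b):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))  # pick C1 when this > 0.5
```

Since the posterior collapses to \sigma(w \cdot x + b), the generative model shares its function set with logistic regression, which learns w and b directly instead (next section).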

2.2 Logistic Regression

[Slides: step-by-step comparison of Linear Regression vs. Logistic Regression]

  • Step 1: Function Set: f_{w,b}(x) = \sigma(\sum_{i} w_{i} x_{i} + b)
  • Step 2: Goodness of a Function
    1. Cross entropy √
    2. Square error × (its gradient is near zero even far from the target, so learning stalls)
  • Step 3: Find the Best Function: gradient descent
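
The three steps in code, on made-up blobs (a sketch; note the cross-entropy gradient has the same form as linear regression's, one of the lecture's points):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),   # class 0
               rng.normal(+1.0, 1.0, (50, 2))])  # class 1
y = np.array([0.0] * 50 + [1.0] * 50)

w, b, eta = np.zeros(2), 0.0, 0.5
for _ in range(1000):
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # Step 1: f(x) = sigma(w.x + b)
    # Step 2: cross-entropy loss; its gradient in w is sum_n (f - y) x^n.
    w -= eta * (f - y) @ X / len(X)         # Step 3: gradient descent
    b -= eta * (f - y).mean()
```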

Discriminative vs. Generative
Generative: Section 2.1
Discriminative: Section 2.2
The discriminative model usually performs better, because the generative model makes extra assumptions (it "imagines" structure in the data).

Multi-class Classification: softmax output
Loss: cross entropy
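
A softmax sketch, reusing the lecture's example scores (3, 1, -3):

```python
import numpy as np

def softmax(z):
    """Exponentiate, then normalize: outputs are positive and sum to 1."""
    e = np.exp(z - z.max())     # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(p, onehot):
    return -np.sum(onehot * np.log(p + 1e-12))

p = softmax(np.array([3.0, 1.0, -3.0]))              # ~ [0.88, 0.12, 0.00]
print(cross_entropy(p, np.array([1.0, 0.0, 0.0])))   # small: fits class 1 well
```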

Limitation of Logistic Regression
The boundary is linear, so linearly inseparable data (e.g. XOR) cannot be classified.

Two ways out (see the sketch below):

1. Feature Transformation
2. Cascading Logistic Regression Models
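
A sketch of option 1 on XOR, with the transformation used in the lecture (distance to [0,0] and distance to [1,1]):

```python
import numpy as np

# XOR labels: not linearly separable in the original (x1, x2) space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])

def transform(x):
    """New features: distance to [0, 0] and distance to [1, 1]."""
    return np.array([np.linalg.norm(x - np.array([0.0, 0.0])),
                     np.linalg.norm(x - np.array([1.0, 1.0]))])

Z = np.array([transform(x) for x in X])
# Class 1 maps to (1, 1) twice; class 0 maps to (0, 1.41) and (1.41, 0).
# The line z1 + z2 = 1.7 now separates the classes, so a plain logistic
# regression on Z succeeds where one on X cannot.
print(Z.round(2))
```

Option 2 removes the hand-crafting: earlier logistic-regression units learn the transformation for a later unit, which is exactly a neural network.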


3 Structured Learning

The output is a structured object (a sequence, a graph, a tree) rather than a scalar or a class label.
