Notes on Hung-yi Lee's Machine Learning (2017), Part 1: Introduction and Task

http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17_2.html

0 Introduction of Machine Learning

AI: from hand-crafted rules (if-statements) to machine learning (learning a function).
1. A set of functions (the model): f_{1}, f_{2}, \cdots
2. Goodness of a function f
3. Pick the "best" function f^{*} (this requires a good algorithm)

Where does the error come from?

"bias" and "variance"

Estimator
f^{*} is an estimator of \hat f.
The bias and variance of the estimator

Trade-off between an unbiased estimate and a small variance.
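
A quick numerical check of this trade-off (my sketch, not from the lecture): estimate \sigma^{2} of a Gaussian from N = 10 points, many times over. Dividing by N gives a biased estimator with smaller variance; dividing by N-1 gives an unbiased one with larger variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 10, 100_000

# Many independent "training sets"; two estimators of the variance sigma^2.
samples = rng.normal(mu, sigma, size=(trials, n))
var_biased = samples.var(axis=1, ddof=0)    # divide by n: biased, lower variance
var_unbiased = samples.var(axis=1, ddof=1)  # divide by n-1: unbiased, higher variance

print("true sigma^2:", sigma**2)                           # 4.0
print("mean of biased estimates:  ", var_biased.mean())    # ~3.6, biased low
print("mean of unbiased estimates:", var_unbiased.mean())  # ~4.0
print("variance of biased estimates:  ", var_biased.var())   # smaller
print("variance of unbiased estimates:", var_unbiased.var()) # larger
```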

Model selection
1. Simple model [large bias, small variance]: underfitting

  • Remedy: redesign a more complex model

2. Complex model [small bias, large variance]: overfitting

  • More data
  • Regularization

Cross Validation
Training Set, split into:

  • training set
  • validation set [for model selection; K-fold cross validation averages over K splits]

Testing Set (public)
Testing Set (private)
Choose the model on the validation set: a score tuned against the public testing set overestimates performance on the private one.
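
A minimal K-fold cross-validation sketch (my code, not the course's; `fit` and `evaluate` are placeholder callables you supply):

```python
import numpy as np

def k_fold_cv(X, y, k, fit, evaluate, seed=0):
    """Average validation error of `fit` over k folds; lower is better."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])                 # train on k-1 folds
        errors.append(evaluate(model, X[val], y[val]))  # validate on held-out fold
    return float(np.mean(errors))
```

Pick the model with the lowest average validation error, then retrain it on the full training set before touching any testing set.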


0 Gradient Descent

Tip 1: Tuning your learning rate

  1. Adaptive learning rates: start large, then decay over time, e.g. \eta^{t} = \eta/\sqrt{t+1}.
  2. Adagrad: w^{t+1} = w^{t} - \frac{\eta}{\sqrt{\sum_{i=0}^{t}(g^{i})^{2}}} g^{t}
  3. Vanilla gradient descent, for comparison: w^{t+1} = w^{t} - \eta^{t} g^{t}
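
Both updates side by side on a toy one-parameter objective (a hypothetical example; `grad` is any gradient function):

```python
import numpy as np

def adagrad(grad, w0, eta=1.0, steps=200, eps=1e-8):
    """Adagrad: scale the step by the root of the sum of squared past gradients."""
    w, g2_sum = float(w0), 0.0
    for _ in range(steps):
        g = grad(w)
        g2_sum += g * g
        w -= eta / (np.sqrt(g2_sum) + eps) * g
        # vanilla gradient descent would simply do: w -= eta * g
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
print(adagrad(lambda w: 2.0 * (w - 3.0), w0=0.0))  # -> close to 3.0
```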

Tip 2: Stochastic Gradient Descent
Update on the loss of one example x^{n} at a time, instead of the sum over all examples: makes the training faster!
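
A stochastic-update sketch on a made-up linear model: each step uses the gradient of a single example's squared error rather than the full sum.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=1000)  # targets (the lecture's \hat{y})

w, b, eta = 0.0, 0.0, 0.01
for epoch in range(5):
    for i in rng.permutation(len(x)):   # one example per update
        err = (w * x[i] + b) - y[i]
        w -= eta * err * x[i]
        b -= eta * err
print(w, b)  # close to 3.0 and 1.0 after 5 quick passes
```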

Tip 3: Feature Scaling
Make different features have the same scale, so the error surface is closer to round and gradient descent heads straight for the minimum.
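
A standardization sketch (the usual recipe: subtract each feature's mean, divide by its standard deviation; the arrays here are made up):

```python
import numpy as np

def standardize(X):
    """Give every feature zero mean and unit variance (z-scores)."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    return (X - mean) / std, mean, std

X_train = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 300.0]])
X_scaled, mean, std = standardize(X_train)
# Reuse the TRAINING statistics on later data; never recompute them there.
X_new_scaled = (np.array([[2.0, 250.0]]) - mean) / std
```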

More limitations

  1. stuck at a local minimum
  2. stuck at a saddle point
  3. very slow on a plateau

1 Regression

output: a scalar.

Example (the lecture's running task: predicting a Pokémon's CP after evolution):
1. Linear Model: y = b + \sum_{i} w_{i} x_{i}
   parameters: feature, weight, bias (picking the features takes domain knowledge)
2. Loss Function
   L(f) = L(w, b), the sum of squared errors; threshold
3. Gradient Descent
   gradient; learning rate
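
The three steps end to end on made-up data (a sketch, not the lecture's Pokémon dataset; batch gradient descent this time, averaging the gradient over all examples):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)                       # features
y_hat = 1.5 + 2.0 * x + rng.normal(scale=0.05, size=100)  # targets \hat{y}

w, b, eta = 0.0, 0.0, 0.1
for _ in range(10_000):
    err = y_hat - (b + w * x)            # Step 1: model y = b + w*x
    # Step 2: loss L(w, b) = mean of err^2; Step 3: follow the negative gradient
    w -= eta * (-2.0 * (err * x).mean())
    b -= eta * (-2.0 * err.mean())
print(f"w = {w:.2f}, b = {b:.2f}")       # recovers roughly 2.0 and 1.5
```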

generalization
overfitting
regularization: prefer smooth functions
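
The regularized loss from the lecture: the \lambda-term penalizes large weights, so the learned function is smoother and less sensitive to input noise; the bias b is left out because it does not affect smoothness.

```latex
L(w, b) = \sum_{n} \Big( \hat{y}^{n} - \big( b + \sum_{i} w_{i} x_{i}^{n} \big) \Big)^{2} + \lambda \sum_{i} (w_{i})^{2}
```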


2 Classification

2.1 Bayesian Classification

Naive Bayes classifier: assume all the feature dimensions are independent.

P(C_{1}|x) = \frac{P(x|C_{1})P(C_{1})}{P(x|C_{1})P(C_{1}) + P(x|C_{2})P(C_{2})}
P(C_{2}|x) = \frac{P(x|C_{2})P(C_{2})}{P(x|C_{1})P(C_{1}) + P(x|C_{2})P(C_{2})}

x is assigned to the class with the larger posterior probability.
Generative Model: P(x) = P(x|C_{1})P(C_{1}) + P(x|C_{2})P(C_{2})
1. Prior: P(C_{1}), P(C_{2}), estimated by counting, e.g. P(C_{1}) = N_{1}/(N_{1}+N_{2})
2. Class-conditional probability ("probability from class"): P(x|C_{1}), P(x|C_{2})
Feature (vector)
Assume the N_{1} (and likewise the N_{2}) points of a class are sampled from a Gaussian distribution with
mean \mu,
covariance matrix \Sigma.
Maximum likelihood: the estimates \mu_{1}^{*}, \Sigma_{1}^{*} are the sample mean and the sample covariance of that class's points.


The two classes are often assumed to share the same covariance matrix \Sigma; the decision boundary is then linear.
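
The whole generative recipe under that shared-covariance assumption, for the binary case (a sketch; variable names are mine):

```python
import numpy as np
from numpy.linalg import inv

def fit_generative(X1, X2):
    """ML fit: per-class means, one shared covariance (count-weighted average)."""
    n1, n2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    s1 = (X1 - mu1).T @ (X1 - mu1) / n1
    s2 = (X2 - mu2).T @ (X2 - mu2) / n2
    sigma = (n1 * s1 + n2 * s2) / (n1 + n2)   # shared covariance
    # With a shared sigma, P(C1|x) = sigmoid(w.x + b): a linear boundary.
    w = inv(sigma) @ (mu1 - mu2)
    b = (-0.5 * mu1 @ inv(sigma) @ mu1
         + 0.5 * mu2 @ inv(sigma) @ mu2
         + np.log(n1 / n2))                   # log prior ratio P(C1)/P(C2)
    return w, b

def posterior_c1(x, w, b):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))  # pick C1 when this > 0.5
```

Since the posterior collapses to \sigma(w \cdot x + b), the generative model shares its function set with logistic regression, which learns w and b directly instead (next section).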

2.2 Logistic Regression

[Slides: step-by-step comparison of Linear Regression vs. Logistic Regression]

  • Step 1: Function Set: f_{w,b}(x) = \sigma(\sum_{i} w_{i} x_{i} + b)
  • Step 2: Goodness of a Function
    1. Cross entropy √
    2. Square error × (its gradient is near zero even far from the target, so learning stalls)
  • Step 3: Find the Best Function: gradient descent
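
The three steps in code, on made-up blobs (a sketch; note the cross-entropy gradient has the same form as linear regression's, one of the lecture's points):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),   # class 0
               rng.normal(+1.0, 1.0, (50, 2))])  # class 1
y = np.array([0.0] * 50 + [1.0] * 50)

w, b, eta = np.zeros(2), 0.0, 0.5
for _ in range(1000):
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # Step 1: f(x) = sigma(w.x + b)
    # Step 2: cross-entropy loss; its gradient in w is sum_n (f - y) x^n.
    w -= eta * (f - y) @ X / len(X)         # Step 3: gradient descent
    b -= eta * (f - y).mean()
```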

Discriminative vs. Generative
Generative: Section 2.1
Discriminative: Section 2.2
The discriminative model usually performs better, because the generative model makes extra assumptions (it "imagines" structure in the data).

Multi-class Classification: softmax output
Loss: cross entropy
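
A softmax sketch, reusing the lecture's example scores (3, 1, -3):

```python
import numpy as np

def softmax(z):
    """Exponentiate, then normalize: outputs are positive and sum to 1."""
    e = np.exp(z - z.max())     # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(p, onehot):
    return -np.sum(onehot * np.log(p + 1e-12))

p = softmax(np.array([3.0, 1.0, -3.0]))              # ~ [0.88, 0.12, 0.00]
print(cross_entropy(p, np.array([1.0, 0.0, 0.0])))   # small: fits class 1 well
```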

Limitation of Logistic Regression
The boundary is linear, so linearly inseparable data (e.g. XOR) cannot be classified.

Two ways out (see the sketch below):

1. Feature Transformation
2. Cascading Logistic Regression Models
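
A sketch of option 1 on XOR, with the transformation used in the lecture (distance to [0,0] and distance to [1,1]):

```python
import numpy as np

# XOR labels: not linearly separable in the original (x1, x2) space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])

def transform(x):
    """New features: distance to [0, 0] and distance to [1, 1]."""
    return np.array([np.linalg.norm(x - np.array([0.0, 0.0])),
                     np.linalg.norm(x - np.array([1.0, 1.0]))])

Z = np.array([transform(x) for x in X])
# Class 1 maps to (1, 1) twice; class 0 maps to (0, 1.41) and (1.41, 0).
# The line z1 + z2 = 1.7 now separates the classes, so a plain logistic
# regression on Z succeeds where one on X cannot.
print(Z.round(2))
```

Option 2 removes the hand-crafting: earlier logistic-regression units learn the transformation for a later unit, which is exactly a neural network.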


3 Structured Learning

The output is a structured object (a sequence, a graph, a tree) rather than a scalar or a class label.
