Machine Learning 06 - Linear Classification Continued (Soft Classification): Gaussian Discriminant Analysis

In this post we introduce another generative model: Gaussian Discriminant Analysis (GDA).

Given a dataset D = \{(x_1,y_1),(x_2,y_2),...,(x_n,y_n)\}, with x_i \in R^p and y_i \in \{0, 1\}.

We make the following assumptions about the data model:

y \sim Bernoulli(\phi), i.e. p(y) = \phi^y(1-\phi)^{1-y}

x|y=1 \sim N(\mu_1,\Sigma)

x|y=0 \sim N(\mu_2,\Sigma)

So p(x_i|y_i) = N(\mu_1,\Sigma)^{y_i}N(\mu_2,\Sigma)^{1-y_i}
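These assumptions describe a complete generative process, so we can sample synthetic data from them. A minimal NumPy sketch (the parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 0.4                        # P(y = 1), illustrative value
mu1 = np.array([2.0, 2.0])       # mean of x | y = 1
mu2 = np.array([-2.0, -2.0])     # mean of x | y = 0
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])   # shared covariance

n = 1000
y = rng.binomial(1, phi, size=n)                        # y_i ~ Bernoulli(phi)
samples1 = rng.multivariate_normal(mu1, Sigma, size=n)  # draws for class 1
samples0 = rng.multivariate_normal(mu2, Sigma, size=n)  # draws for class 0
x = np.where(y[:, None] == 1, samples1, samples0)       # pick per label
```

Each row of `x` is drawn from the Gaussian selected by its label, which is exactly the mixture the estimators below will try to recover.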

From the previous post, we already know that the goal now is to maximize the joint probability,

i.e. maximize \prod_{i=1}^n p(x_i,y_i) = \prod_{i=1}^n p(x_i|y_i)p(y_i)

The log-likelihood: L = log\prod_{i=1}^n p(x_i|y_i)p(y_i) = \sum_{i=1}^n log(p(x_i|y_i)p(y_i)) = \sum_{i=1}^n [log\ p(x_i|y_i) + log\ p(y_i)]

= \sum_{i=1}^n [log\ N(\mu_1,\Sigma)^{y_i}N(\mu_2,\Sigma)^{1-y_i} + log\ \phi^{y_i}(1-\phi)^{1-y_i}]

= \sum_{i=1}^n [log\ N(\mu_1,\Sigma)^{y_i} + log\ N(\mu_2,\Sigma)^{1-y_i} + log\ \phi^{y_i}(1-\phi)^{1-y_i}]  (*)

The goal is to find \mu_1, \mu_2, \Sigma, \phi that maximize the likelihood: \hat \mu_1, \hat \mu_2, \hat \Sigma, \hat \phi = argmax\ L

(a) First solve for \phi. The last term of (*) is \sum_{i=1}^n log\ \phi^{y_i}(1-\phi)^{1-y_i} = \sum_{i=1}^n [y_i log\ \phi + (1-y_i)log\ (1-\phi)]; call it A.

\frac{\partial A}{\partial \phi} = \sum_{i=1}^n [y_i\frac{1}{\phi} - (1-y_i)\frac{1}{1-\phi}] = 0

Multiplying through by \phi(1-\phi): \sum_{i=1}^n [y_i(1-\phi) - (1-y_i)\phi] = \sum_{i=1}^n (y_i - y_i\phi - \phi + y_i\phi) = 0

So \sum_{i=1}^n (y_i - \phi) = 0,

which gives \hat \phi = \frac{1}{n}\sum_{i=1}^n y_i = \frac{n_1}{n}, where n_1 is the number of samples in class 1.
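In words, \hat \phi is just the fraction of class-1 labels, which one line of NumPy confirms on a toy label vector (illustrative data):

```python
import numpy as np

# Toy labels: 4 positives out of 8 samples.
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# The MLE for phi is the fraction of class-1 samples, n1 / n.
phi_hat = y.mean()
print(phi_hat)  # 0.5
```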

(b) Next solve for \mu_1. The first term of (*) is \sum_{i=1}^n log\ N(\mu_1,\Sigma)^{y_i} = \sum_{i=1}^n y_i log\ \frac{1}{(2\pi)^{p/2}|\Sigma|^{\frac{1}{2}}}exp(-\frac{1}{2}(x_i - \mu_1)^T\Sigma^{-1}(x_i-\mu_1))

\hat \mu_1 = argmax(\sum_{i=1}^n -\frac{1}{2} y_i(x_i - \mu_1)^T\Sigma^{-1}(x_i-\mu_1))  = argmin(\sum_{i=1}^n y_i(x_i - \mu_1)^T\Sigma^{-1}(x_i-\mu_1))

\sum_{i=1}^n y_i(x_i - \mu_1)^T\Sigma^{-1}(x_i-\mu_1) = \sum_{i=1}^ny_i(x_i^T\Sigma ^{-1} - \mu_1^T\Sigma ^{-1})(x_i-\mu_1)

 = \sum_{i=1}^ny_i(x_i^T\Sigma ^{-1}x_i - x_i^T\Sigma ^{-1}\mu_1 - \mu_1^T\Sigma ^{-1}x_i + \mu_1^T\Sigma ^{-1}\mu_1)

Denote the expression above by \Delta.

\frac{\partial \Delta}{\partial \mu_1}  = \sum_{i=1}^n y_i(-2\Sigma ^{-1}x_i +2 \Sigma ^{-1}\mu_1) = 0

Solving gives \hat \mu_1 = \frac{\sum_{i=1}^n y_ix_i}{\sum_{i=1}^n y_i}; by the same argument, \hat \mu_2 = \frac{\sum_{i=1}^n (1-y_i)x_i}{\sum_{i=1}^n (1-y_i)}
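In other words, \hat \mu_1 is the average of the class-1 samples and \hat \mu_2 the average of the class-0 samples. A small NumPy check on made-up data:

```python
import numpy as np

x = np.array([[1., 1.], [2., 2.], [-1., -1.], [-2., -2.]])
y = np.array([1, 1, 0, 0])

# mu1_hat: y_i-weighted average, i.e. the mean of the class-1 samples.
mu1_hat = (y[:, None] * x).sum(axis=0) / y.sum()
# mu2_hat: same formula with weights (1 - y_i), the class-0 mean.
mu2_hat = ((1 - y)[:, None] * x).sum(axis=0) / (1 - y).sum()
print(mu1_hat, mu2_hat)  # [1.5 1.5] [-1.5 -1.5]
```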

(c) Finally, the hardest parameter, \Sigma: \hat \Sigma = argmax\sum_{i=1}^n [log\ N(\mu_1,\Sigma)^{y_i} + log\ N(\mu_2,\Sigma)^{1-y_i}]

We first expand log\ N(\mu,\Sigma) = log\ \frac{1}{(2\pi)^{p/2}|\Sigma|^{\frac{1}{2}}}exp(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu)) = log\ \frac{1}{(2\pi)^{p/2}} + log\ |\Sigma|^{-\frac{1}{2}} - \frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu)

= C - \frac{1}{2} log\ |\Sigma| -\frac{1}{2} (x - \mu)^T\Sigma^{-1}(x-\mu)

The last term is a 1×p vector times a p×p matrix times a p×1 vector, so the result is 1×1, i.e. a scalar; wrapping it in a trace therefore changes nothing.

\sum_{i=1}^n tr((x_i - \mu)^T\Sigma^{-1}(x_i-\mu)) = \sum_{i=1}^n tr((x_i-\mu)(x_i - \mu)^T\Sigma^{-1}) = tr(ns\Sigma^{-1}) = n\ tr(s\Sigma^{-1})

where s = \frac{1}{n}\sum_{i=1}^n (x_i-\mu)(x_i-\mu)^T is the sample covariance matrix of the class.
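The scalar-equals-its-own-trace step and the cyclic permutation tr(AB) = tr(BA) can be verified numerically (random matrices as stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
d = rng.standard_normal((p, 1))     # stands in for (x_i - mu), a p x 1 vector
A = rng.standard_normal((p, p))
Sigma_inv = A @ A.T + np.eye(p)     # symmetric positive-definite stand-in for Sigma^{-1}

lhs = (d.T @ Sigma_inv @ d).item()  # the scalar quadratic form
rhs = np.trace(d @ d.T @ Sigma_inv) # after the cyclic permutation inside tr(.)
print(np.isclose(lhs, rhs))  # True
```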

Now back to \hat \Sigma = argmax\sum_{i=1}^n [log\ N(\mu_1,\Sigma)^{y_i} + log\ N(\mu_2,\Sigma)^{1-y_i}]

Splitting the sum by class (n_1 samples with y_i = 1, n_2 = n - n_1 samples with y_i = 0, with class-conditional covariances s_1 and s_2), this objective becomes

= C - \frac{1}{2}n_1 log\ |\Sigma| - \frac{1}{2}n_1 tr(s_1\Sigma^{-1}) - \frac{1}{2}n_2 log\ |\Sigma| - \frac{1}{2}n_2 tr(s_2\Sigma^{-1})

= C - \frac{1}{2}n\ log\ |\Sigma| - \frac{1}{2}n_1 tr(s_1\Sigma^{-1}) - \frac{1}{2}n_2 tr(s_2\Sigma^{-1}), denoted \Delta

Using the matrix-derivative identities \frac{\partial\ log\ |\Sigma|}{\partial \Sigma} = \Sigma^{-1} and \frac{\partial\ tr(s\Sigma^{-1})}{\partial \Sigma} = -\Sigma^{-1}s\Sigma^{-1} (both valid for symmetric s and \Sigma), setting the derivative to zero:

\frac{\partial \Delta}{\partial \Sigma} = -\frac{1}{2}(n\Sigma^{-1} - n_1\Sigma^{-1}s_1\Sigma^{-1} - n_2\Sigma^{-1}s_2\Sigma^{-1}) = 0

Multiplying on the left and right by \Sigma gives n\Sigma - n_1s_1 - n_2s_2 = 0, so

\hat \Sigma = \frac{1}{n}(n_1s_1 + n_2s_2)
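So \hat \Sigma is the pooled class-conditional scatter. A sketch implementation (the helper name `pooled_covariance` is mine, not from the text):

```python
import numpy as np

def pooled_covariance(x, y):
    """Sigma_hat = (n1 * s1 + n2 * s2) / n, the MLE shared covariance."""
    n, p = x.shape
    scatter = np.zeros((p, p))
    for c in (0, 1):
        xc = x[y == c]                           # samples of class c
        mu_c = xc.mean(axis=0)                   # class-conditional mean
        scatter += (xc - mu_c).T @ (xc - mu_c)   # n_c * s_c
    return scatter / n

x = np.array([[0., 0.], [2., 0.], [0., 0.], [0., 2.]])
y = np.array([1, 1, 0, 0])
print(pooled_covariance(x, y))  # [[0.5 0. ] [0.  0.5]]
```

Note that each class is centered at its own mean before the scatters are summed, exactly as in the formula.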

To summarize: from the training samples, maximizing the joint probability yields \mu_1, \mu_2, \Sigma, \phi. When a new sample arrives, comparing p(y=1|x) and p(y=0|x) determines its class.
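The decision rule can be sketched as follows; since \Sigma is shared, the Gaussian normalizing constants cancel and only the quadratic forms and priors matter (`gda_predict` is an illustrative helper, and the parameter values are made up):

```python
import numpy as np

def gda_predict(x, phi, mu1, mu2, Sigma):
    """Predict by comparing the joint densities p(x|y) * p(y) for the two classes."""
    Sinv = np.linalg.inv(Sigma)
    def log_gauss(mu):
        d = x - mu
        return -0.5 * d @ Sinv @ d               # log N(x; mu, Sigma) up to a shared constant
    score1 = log_gauss(mu1) + np.log(phi)        # proportional to log p(x, y=1)
    score0 = log_gauss(mu2) + np.log(1 - phi)    # proportional to log p(x, y=0)
    return 1 if score1 > score0 else 0

mu1, mu2 = np.array([2., 2.]), np.array([-2., -2.])
Sigma = np.eye(2)
print(gda_predict(np.array([1.5, 1.0]), 0.5, mu1, mu2, Sigma))  # 1
```

With equal priors this reduces to picking the nearer mean in Mahalanobis distance, which is why GDA with a shared covariance yields a linear decision boundary.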

This wraps up generative models for the moment; the core idea is modeling the joint probability p(x, y).

Please credit the source when reposting. Thanks.
