Classification - Logistic Regression

Logistic Regression is a supervised learning algorithm for binary classification.
For a feature vector x\in\mathbb{R}^n, it produces a prediction \hat{y} satisfying 0 \leq \hat{y} \leq 1:
\hat y=\sigma(w^Tx+b), \quad \sigma(z)=\frac{1}{1+e^{-z}}
where the parameters are w\in \mathbb{R}^n and b\in \mathbb{R}.

Given a training set \{(x^{(1)},y^{(1)}),...,(x^{(m)},y^{(m)})\}, we want \hat{y}^{(i)} \approx y^{(i)}.
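As a quick sketch, the forward computation can be written in a few lines of NumPy (the function names and example values below are my own, purely illustrative):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    # y_hat = sigmoid(w^T x + b) for a feature vector x in R^n
    return sigmoid(np.dot(w, x) + b)

w = np.array([0.5, -1.2, 0.3])   # example weights, n = 3
b = 0.1
x = np.array([1.0, 0.5, 2.0])
print(predict(w, b, x))          # a probability strictly between 0 and 1
```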

cost function

The cost function measures how well the algorithm is doing by averaging the per-example error:
J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L(\hat y^{(i)},y^{(i)})

where L(\hat y,y)=-(y \log \hat y+(1-y)\log(1-\hat y)) is the loss on a single example.
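A minimal NumPy sketch of J(w,b), vectorized over the m examples; the eps clipping that keeps \log away from 0 is my own numerical safeguard, not part of the notes:

```python
import numpy as np

def cost(y_hat, y):
    # J(w, b) = (1/m) * sum_i L(y_hat_i, y_i), with L the loss defined above
    eps = 1e-12                           # keep log() away from 0
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y_hat = np.array([0.9, 0.2, 0.6])  # illustrative predictions
y     = np.array([1.0, 0.0, 1.0])  # illustrative labels
print(cost(y_hat, y))
```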

Gradient Descent

We need to find the w and b that minimize J(w,b). Since J(w,b) is convex, gradient descent is effective (\alpha is the learning rate):
w:=w-\alpha \frac{\partial J(w,b)}{\partial w}
b:=b-\alpha \frac{\partial J(w,b)}{\partial b}
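In code, one iteration of this update is just the following sketch, where dw and db stand for the gradients \frac{\partial J}{\partial w} and \frac{\partial J}{\partial b} derived in the next section:

```python
def gradient_step(w, b, dw, db, alpha=0.01):
    # one gradient descent update: parameter := parameter - alpha * gradient
    return w - alpha * dw, b - alpha * db
```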

Logistic Regression derivatives

recap
  • z=w^Tx+b
  • \hat y=a=\sigma(z)
  • L(a,y)=-(y \log a+(1-y)\log(1-a))

where

  • \frac{\partial L(a,y)}{\partial a}=-\frac y a+\frac{1-y}{1-a}

  • \frac {\partial a}{\partial z}=a(1-a)

  • \frac{\partial L(a,y)}{\partial z} =\frac{\partial L(a,y)}{\partial a} \cdot \frac {\partial a}{\partial z}= \left(-\frac y a+\frac{1-y}{1-a}\right)a(1-a)=-y(1-a)+(1-y)a= a-y

  • \frac{\partial z}{\partial w_1}=x_1,\space \frac{\partial z}{\partial w_2}=x_2 ,\space \frac{\partial z}{\partial b}=1

  • \frac{\partial L(a,y)}{\partial w_1}=\frac{\partial L(a,y)}{\partial z} \cdot \frac{\partial z}{\partial w_1}=x_1(a-y)

  • \frac{\partial L(a,y)}{\partial w_2}=\frac{\partial L(a,y)}{\partial z} \cdot \frac{\partial z}{\partial w_2}=x_2(a-y)

  • \frac{\partial L(a,y)}{\partial b}=\frac{\partial L(a,y)}{\partial z} \cdot \frac{\partial z}{\partial b}=a-y
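Transcribing these bullets directly into code gives the gradients for a single (x, y) pair (a sketch; the names are mine):

```python
import numpy as np

def per_example_gradients(w, b, x, y):
    # chain rule from the bullets above, for one training example
    z = np.dot(w, x) + b
    a = 1.0 / (1.0 + np.exp(-z))   # a = sigmoid(z)
    dz = a - y                     # dL/dz = a - y
    dw = dz * x                    # dL/dw_j = x_j * (a - y)
    db = dz                        # dL/db = a - y
    return dw, db
```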

finally
\theta_j:=\theta_j-\alpha \frac{1}{m}\sum_{i=1}^{m}(\hat y^{(i)}-y^{(i)})x^{(i)}_j
(here \theta_j ranges over the parameters, i.e. the w_j together with b, taking x_0^{(i)}=1 for the bias term)

Remarkably, the final gradient descent update has exactly the same form as in multivariate linear regression.
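Putting everything together, here is a runnable sketch of batch gradient descent, vectorized over all m examples (the hyperparameter defaults are illustrative assumptions):

```python
import numpy as np

def train_logistic_regression(X, y, alpha=0.1, num_iters=1000):
    # Batch gradient descent. X: (m, n) design matrix, y: (m,) labels in {0, 1}.
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid(Xw + b), shape (m,)
        dw = X.T @ (y_hat - y) / m                  # (1/m) * sum_i (y_hat_i - y_i) x^(i)
        db = np.mean(y_hat - y)                     # (1/m) * sum_i (y_hat_i - y_i)
        w -= alpha * dw
        b -= alpha * db
    return w, b
```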

one vs all

Train one Logistic Regression classifier per class (treating that class as positive and all other classes as negative); to predict, pick the class whose classifier outputs the highest probability.
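A minimal one-vs-all sketch built on top of the trainer above (it reuses train_logistic_regression from the previous block; integer class labels 0..num_classes-1 are an assumption):

```python
import numpy as np

def one_vs_all(X, y, num_classes, alpha=0.1, num_iters=1000):
    # one binary classifier per class: class c is positive, everything else negative
    models = []
    for c in range(num_classes):
        y_binary = (y == c).astype(float)
        models.append(train_logistic_regression(X, y_binary, alpha, num_iters))
    return models

def predict_one_vs_all(models, x):
    # pick the class whose classifier is most confident
    scores = [1.0 / (1.0 + np.exp(-(np.dot(w, x) + b))) for w, b in models]
    return int(np.argmax(scores))
```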
