Softmax regression

Loss function and gradient

Suppose there are m samples in total, divided into k classes, where the i-th sample is (a^{(i)}, b^{(i)}). The loss function is then
F(x) = - \frac{1}{m} \sum_{i=1}^{m}\sum_{l=1}^{k}\Bigg(I(b^{(i)}= l) \ln \Bigg(\frac{\exp(x_l^T a^{(i)})}{\sum_{j=1}^{k} \exp(x_j^T a^{(i)})}\Bigg)\Bigg)
where I(b^{(i)} = l) is the indicator function, equal to 1 when b^{(i)} = l and 0 otherwise. The gradient with respect to x_l, the l-th component of x, is:
\frac{\partial{F(x)}}{\partial{x_l}} = - \frac{1}{m} \sum_{i=1}^m \Bigg[\Bigg(I(b^{(i)} = l) - \frac{\exp(x_l^T a^{(i)})}{\sum_{j=1}^{k} \exp(x_j^T a^{(i)})}\Bigg) (a ^{(i)})^T\Bigg]
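As a rough illustration, the loss and gradient above could be computed with NumPy along the following lines. This is only a sketch under some assumptions that are not in the original text: the function name `softmax_loss_and_grad` is made up, the parameters are stacked into a matrix `X` whose l-th row is x_l^T, the samples are stacked into a matrix `A` whose i-th row is (a^{(i)})^T, and the labels are 0-based (0 to k-1) rather than 1 to k. Subtracting the row maximum before exponentiating is a common numerical-stability trick and does not change the softmax value.

```python
import numpy as np

def softmax_loss_and_grad(X, A, b):
    """Hypothetical helper: returns (F(x), gradient) for softmax regression.

    X: (k, n) array, row l is x_l^T (assumed layout).
    A: (m, n) array, row i is (a^{(i)})^T (assumed layout).
    b: (m,) integer labels in {0, ..., k-1} (0-based, unlike the text).
    """
    m = A.shape[0]
    scores = A @ X.T                                   # [i, l] = x_l^T a^{(i)}
    scores = scores - scores.max(axis=1, keepdims=True)  # stability shift
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)                          # softmax probabilities

    # Only the term with l = b^{(i)} survives the indicator in the inner sum.
    loss = -log_probs[np.arange(m), b].mean()

    # indicator[i, l] = I(b^{(i)} = l)
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), b] = 1.0

    # Row l of grad is dF/dx_l = -(1/m) sum_i (I - p_il) (a^{(i)})^T
    grad = -(indicator - probs).T @ A / m
    return loss, grad
```

With all parameters set to zero every class gets probability 1/k, so the loss should equal ln k, which gives a quick sanity check for an implementation like this.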

Derivation of the gradient formula

\begin{align} F(x) &= - \frac{1}{m} \sum_{i=1}^{m}\sum_{l=1}^{k}\Big(I(b^{(i)}= l) \ln \Big(\frac{\exp(x_l^T a^{(i)})}{\sum_{j=1}^{k} \exp(x_j^T a^{(i)})}\Big)\Big) \\ &= - \frac{1}{m} \sum_{i=1}^{m}\sum_{l=1}^{k}\Big(I(b^{(i)}= l) \Big(x_l^T a^{(i)} - \ln \sum_{j=1}^k \exp(x_j^T a^{(i)})\Big)\Big) \\ &= -\frac{1}{m} \sum_{i=1}^{m} \Big(I(b^{(i)}= 1) \Big(x_1^T a^{(i)} - \ln \sum_{j=1}^k \exp(x_j^T a^{(i)})\Big) + \cdots + I(b^{(i)}= k) \Big(x_k^T a^{(i)} - \ln \sum_{j=1}^k \exp(x_j^T a^{(i)})\Big)\Big) \end{align}

  1. To compute \frac{\partial F(x)}{\partial x_1}, keep only the terms that depend on x_1 (the linear terms x_l^T a^{(i)} with l \neq 1 have zero derivative with respect to x_1 and are dropped):
    \begin{align} \frac{\partial F(x)}{\partial x_1} &= \frac{\partial}{\partial x_1}\Big(-\frac{1}{m} \sum_{i=1}^{m} \Big(I(b^{(i)}= 1) \Big(x_1^T a^{(i)} - \ln \sum_{j=1}^k \exp(x_j^T a^{(i)})\Big) + \cdots + I(b^{(i)}= k) \Big( - \ln \sum_{j=1}^k \exp(x_j^T a^{(i)})\Big)\Big)\Big) \\ &= \frac{\partial}{\partial x_1}\Big(-\frac{1}{m}\sum_{i=1}^{m} \Big(I(b^{(i)} = 1) x_1^T a^{(i)} - (I(b^{(i)} = 1) + \cdots + I(b^{(i)} = k))\ln \sum_{j=1}^k \exp(x_j^T a^{(i)})\Big) \Big) \\ &= \frac{\partial}{\partial x_1}\Big(-\frac{1}{m}\sum_{i=1}^{m}\Big(I(b^{(i)} = 1) x_1^T a^{(i)} - \ln \sum_{j=1}^k \exp(x_j^T a^{(i)})\Big) \Big) \\ & = - \frac{1}{m} \sum_{i=1}^m \Big[\Big(I(b^{(i)} = 1) - \frac{\exp(x_1^T a^{(i)})}{\sum_{j=1}^{k} \exp(x_j^T a^{(i)})}\Big) (a ^{(i)})^T\Big] \end{align}
    where the second-to-last equality holds because I(b^{(i)} = 1) + \cdots + I(b^{(i)} = k) = 1, since each sample belongs to exactly one class. Replacing 1 with any l gives the general formula for \frac{\partial F(x)}{\partial x_l}.
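One way to gain some confidence in a derivation like the one above is to compare the closed-form gradient against a central finite-difference approximation of the loss. The sketch below does that on random data. All names (`loss`, `analytic_grad`, `numeric_grad`), the row-stacked layout of `X` and `A`, the 0-based labels, and the random test data are assumptions made for illustration; none of them come from the original text.

```python
import numpy as np

def loss(X, A, b):
    # F(x) with row l of X as x_l^T and row i of A as (a^{(i)})^T (assumed layout)
    scores = A @ X.T
    scores = scores - scores.max(axis=1, keepdims=True)  # stability shift
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(A.shape[0]), b].mean()

def analytic_grad(X, A, b):
    # Closed-form gradient: row l is -(1/m) sum_i (I(b_i = l) - p_il) (a^{(i)})^T
    m = A.shape[0]
    scores = A @ X.T
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), b] = 1.0
    return -(indicator - probs).T @ A / m

def numeric_grad(X, A, b, eps=1e-6):
    # Central differences, one parameter entry at a time
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (loss(X + step, A, b) - loss(X - step, A, b)) / (2 * eps)
    return grad

# Made-up problem size: k = 3 classes, n = 4 features, m = 5 samples
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
A = rng.normal(size=(5, 4))
b = rng.integers(0, 3, size=5)   # 0-based labels

max_diff = np.abs(analytic_grad(X, A, b) - numeric_grad(X, A, b)).max()
print(max_diff)  # should be tiny if the formula and the code agree
```

A small maximum difference does not prove the formula, but a large one would point to a sign or index error such as a mismatched class subscript.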