Reference: https://blog.csdn.net/g11d111/article/details/70943463

Backpropagation Derivation

The BP network has n nodes in the input layer, q nodes in the hidden layer, and m nodes in the output layer.
The weights between the input and hidden layers are v_{ki}; the weights between the hidden and output layers are w_{jk}.
The hidden layer uses activation function f_1(\cdot) and the output layer uses f_2(\cdot),
where i = 1, 2, \ldots, n; k = 1, 2, \ldots, q; j = 1, 2, \ldots, m.
Defining the Loss Function

Suppose P training samples are fed to the network, denoted x^{1}, x^{2}, \ldots, x^{P}.
Feeding the p-th sample into the network yields outputs y^{p}_j (j = 1, 2, \ldots, m). Using a squared error function with targets t^{p}_j, the error for the p-th sample is:
E_p = \frac{1}{2}\sum_{j=1}^m (t^{p}_j - y^{p}_j)^2
Over all P samples, the global error is:
E = \frac{1}{2}\sum_{p=1}^P\sum_{j=1}^m (t^{p}_j - y^{p}_j)^2
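The global error above can be computed directly; a minimal NumPy sketch (the names `t`, `y`, and the example values are assumptions for illustration):

```python
import numpy as np

def global_error(t, y):
    """E = 1/2 * sum over samples p and outputs j of (t_j^p - y_j^p)^2."""
    return 0.5 * np.sum((t - y) ** 2)

# Example: P = 2 samples, m = 2 outputs.
t = np.array([[1.0, 0.0], [0.0, 1.0]])  # targets t_j^p, shape (P, m)
y = np.array([[0.8, 0.2], [0.1, 0.6]])  # network outputs y_j^p, shape (P, m)
E = global_error(t, y)  # 0.5 * (0.04 + 0.04 + 0.01 + 0.16) = 0.125
```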
Output-Layer Weight Updates

\Delta w_{jk} = -\alpha \frac{\partial E}{\partial w_{jk}} = -\alpha \frac{\partial \sum_{p=1}^P E_p}{\partial w_{jk}}
Define the error signal as:
\delta_{y_j} = -\frac{\partial E_p}{\partial S_j} = -\frac{\partial E_p}{\partial y_j} \frac{\partial y_j}{\partial S_j}
where S_j = \sum_{k=1}^q w_{jk} z_k is the net input to output node j, z_k is the activation of hidden node k, and y_j = f_2(S_j).
The first factor is:
\frac{\partial E_p}{\partial y_j} = \frac{\partial}{\partial y_j}\Big[\frac{1}{2}\sum_{j=1}^m (t^{p}_j - y^{p}_j)^2\Big] = -(t^{p}_j - y^{p}_j)
since only the j-th term of the sum depends on y_j. The second factor is:
\frac{\partial y_j}{\partial S_j} = f^{'}_2(S_j)
Therefore:
\delta_{y_j} = (t^{p}_j - y^{p}_j) f^{'}_2(S_j)
By the chain rule:
-\frac{\partial E_p}{\partial w_{jk}} = -\frac{\partial E_p}{\partial S_j} \frac{\partial S_j}{\partial w_{jk}} = \delta_{y_j} z_k
so the weight-update rule for each output-layer neuron is:
\Delta w_{jk} = \sum_{p=1}^P \alpha (t^{p}_j - y^{p}_j) f^{'}_2(S_j) z_k
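The output-layer update can be sketched for a single sample as follows; taking f_2 to be the sigmoid is an assumption, as are all variable names:

```python
import numpy as np

def f2(s):
    """Sigmoid, used here as the assumed output activation f_2."""
    return 1.0 / (1.0 + np.exp(-s))

def f2_prime(s):
    """Sigmoid derivative: f_2'(s) = f_2(s) * (1 - f_2(s))."""
    return f2(s) * (1.0 - f2(s))

def output_layer_update(t, z, w, alpha):
    """delta_y_j = (t_j - y_j) * f2'(S_j);  dW[j, k] = alpha * delta_y_j * z_k."""
    S = w @ z                          # S_j = sum_k w_{jk} z_k, shape (m,)
    y = f2(S)                          # y_j = f_2(S_j)
    delta_y = (t - y) * f2_prime(S)    # output-layer error signal, shape (m,)
    dW = alpha * np.outer(delta_y, z)  # Delta w_{jk}, shape (m, q)
    return dW, delta_y

# Example with m = 2 outputs, q = 2 hidden units, zero weights:
# S = 0, y = 0.5, f2'(0) = 0.25, so delta_y = [0.125, -0.125].
dW, delta_y = output_layer_update(np.array([1.0, 0.0]),
                                  np.array([0.5, -0.3]),
                                  np.zeros((2, 2)), alpha=1.0)
```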
Hidden-Layer Weight Updates

\Delta v_{ki} = -\alpha \frac{\partial E}{\partial v_{ki}} = -\alpha \frac{\partial \sum_{p=1}^P E_p}{\partial v_{ki}} = \sum_{p=1}^P \Big(-\alpha \frac{\partial E_p}{\partial v_{ki}}\Big)
Define the error signal as:
\delta_{z_k} = -\frac{\partial E_p}{\partial S_k} = -\frac{\partial E_p}{\partial z_k} \frac{\partial z_k}{\partial S_k}
where S_k = \sum_{i=1}^n v_{ki} x_i is the net input to hidden node k and z_k = f_1(S_k).
The first factor requires summing over all m output nodes, since z_k feeds every one of them:
\frac{\partial E_p}{\partial z_k} = \sum_{j=1}^m \frac{\partial E_p}{\partial y_j} \frac{\partial y_j}{\partial S_j} \frac{\partial S_j}{\partial z_k} = -\sum_{j=1}^m (t^{p}_j - y^{p}_j) f^{'}_2(S_j) w_{jk}
The second factor is:
\frac{\partial z_k}{\partial S_k} = f^{'}_1(S_k)
Therefore:
\delta_{z_k} = \sum_{j=1}^m (t^{p}_j - y^{p}_j) f^{'}_2(S_j) w_{jk} f^{'}_1(S_k)
By the chain rule:
-\frac{\partial E_p}{\partial v_{ki}} = -\frac{\partial E_p}{\partial S_k} \frac{\partial S_k}{\partial v_{ki}} = \delta_{z_k} x_i
so the weight-update rule for each hidden-layer neuron is:
\Delta v_{ki} = \sum_{p=1}^P \alpha \delta_{z_k} x_i = \alpha \sum_{p=1}^P \sum_{j=1}^m (t^{p}_j - y^{p}_j) f^{'}_2(S_j) w_{jk} f^{'}_1(S_k) x_i
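Putting both updates together, one full BP step for a single sample can be sketched as below. Using the sigmoid for both f_1 and f_2 is an assumption (its derivative is then f(s)(1 - f(s))), as are all variable names:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def bp_step(x, t, v, w, alpha):
    """One gradient step following the derivation: forward pass, error
    signals, then weight updates for both layers."""
    # Forward pass
    S_hidden = v @ x       # S_k = sum_i v_{ki} x_i, shape (q,)
    z = sigmoid(S_hidden)  # hidden activations z_k = f_1(S_k)
    S_out = w @ z          # S_j = sum_k w_{jk} z_k, shape (m,)
    y = sigmoid(S_out)     # outputs y_j = f_2(S_j)
    # Error signals (sigmoid derivative written as f * (1 - f))
    delta_y = (t - y) * y * (1.0 - y)         # output layer, shape (m,)
    delta_z = (w.T @ delta_y) * z * (1.0 - z) # hidden: sum_j delta_y_j w_{jk}
    # Weight updates
    w_new = w + alpha * np.outer(delta_y, z)  # Delta w_{jk} = alpha delta_y_j z_k
    v_new = v + alpha * np.outer(delta_z, x)  # Delta v_{ki} = alpha delta_z_k x_i
    return v_new, w_new, y
```

Because the step moves the weights along the negative gradient of E_p, a sufficiently small alpha decreases the sample error, which gives a quick sanity check of the derivation.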