- First, let's break down what Batch Normalization is.
- Batch Normalization:
- Forward pass in detail (a code sketch follows this list):
- `input = {X,gamma,beta,eps}`
- Compute the mean and variance of X as `sample_mean` and `sample_var`
- Normalize X (subtract the mean, divide by the standard deviation)
- Use `sample_mean` and `sample_var` to update the running statistics used at test time:
`running_mean = momentum * running_mean + (1-momentum)*sample_mean`
`running_var = momentum * running_var + (1-momentum)*sample_var`
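A minimal NumPy sketch of this forward pass (the function name `batchnorm_forward` and its exact signature are illustrative assumptions, not necessarily the source's API):

```python
import numpy as np

def batchnorm_forward(X, gamma, beta, eps, running_mean, running_var, momentum=0.9):
    # Per-feature mean and variance over the batch
    sample_mean = X.mean(axis=0)
    sample_var = X.var(axis=0)

    # Normalize X, then scale and shift with the learnable gamma/beta
    x_normalized = (X - sample_mean) / np.sqrt(sample_var + eps)
    out = gamma * x_normalized + beta

    # Update the running statistics used at test time
    running_mean = momentum * running_mean + (1 - momentum) * sample_mean
    running_var = momentum * running_var + (1 - momentum) * sample_var

    cache = (x_normalized, gamma, beta, sample_mean, sample_var, X, eps)
    return out, cache, running_mean, running_var
```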
- Backward pass in detail:
- `input = {dout, cache}`
- Work backwards through the forward formulas with the chain rule (the expressions are straightforward):
```python
import numpy as np

def batchnorm_backward(dout, cache):
    x_normalized, gamma, beta, sample_mean, sample_var, x, eps = cache
    N, D = x.shape

    dx_normalized = dout * gamma                      # gradient through the scale gamma
    x_mu = x - sample_mean
    sample_std_inv = 1.0 / np.sqrt(sample_var + eps)

    # Gradients w.r.t. the batch variance and mean
    dsample_var = -0.5 * np.sum(dx_normalized * x_mu, axis=0, keepdims=True) * sample_std_inv**3
    dsample_mean = -1.0 * np.sum(dx_normalized * sample_std_inv, axis=0, keepdims=True) \
                   - 2.0 * dsample_var * np.mean(x_mu, axis=0, keepdims=True)

    # dx combines the three paths: through x_normalized, the variance, and the mean
    dx1 = dx_normalized * sample_std_inv
    dx2 = 2.0 / N * dsample_var * x_mu
    dx = dx1 + dx2 + 1.0 / N * dsample_mean

    dgamma = np.sum(dout * x_normalized, axis=0, keepdims=True)
    dbeta = np.sum(dout, axis=0, keepdims=True)
    return dx, dgamma, dbeta
```
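A quick usage sketch, assuming both the `batchnorm_forward` and `batchnorm_backward` functions sketched above: run a forward pass on random data, feed in a random upstream gradient, and sanity-check the gradient shapes:

```python
import numpy as np

np.random.seed(0)
N, D = 4, 5
X = np.random.randn(N, D)
gamma, beta, eps = np.ones(D), np.zeros(D), 1e-5

out, cache, _, _ = batchnorm_forward(X, gamma, beta, eps,
                                     running_mean=np.zeros(D), running_var=np.ones(D))
dout = np.random.randn(N, D)                 # upstream gradient from the next layer
dx, dgamma, dbeta = batchnorm_backward(dout, cache)

print(dx.shape, dgamma.shape, dbeta.shape)   # (4, 5) (1, 5) (1, 5)
```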
- A walkthrough of a simple neural network built from affine-BN-ReLU blocks
  (see MultilayerNeuralNetwork.png)
- Training uses SGD; each input to the network is an (N, D) matrix X
- X passes through two hidden layers, each consisting of a fully connected layer, a Batch Normalization layer, and a ReLU layer (see the sketch after this list)
- The output loss can be a softmax loss or an SVM hinge loss
- Backpropagation simply chains the affine, BN, and ReLU backward rules explained earlier
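A minimal sketch of this architecture's forward pass with a softmax loss, assuming two hidden affine-BN-ReLU blocks followed by a linear scoring layer; the parameter names (`W1`, `gamma1`, ...) and the training-mode-only batch norm are illustrative assumptions:

```python
import numpy as np

def affine_bn_relu_forward(x, W, b, gamma, beta, eps=1e-5):
    """One hidden block: fully connected -> batch norm (training mode) -> ReLU."""
    a = x.dot(W) + b                              # affine layer
    mu, var = a.mean(axis=0), a.var(axis=0)       # batch statistics
    a_hat = (a - mu) / np.sqrt(var + eps)         # normalize
    return np.maximum(0, gamma * a_hat + beta)    # scale/shift, then ReLU

def forward(X, params):
    """Forward pass: two affine-BN-ReLU blocks, then a linear output layer."""
    h1 = affine_bn_relu_forward(X, params['W1'], params['b1'],
                                params['gamma1'], params['beta1'])
    h2 = affine_bn_relu_forward(h1, params['W2'], params['b2'],
                                params['gamma2'], params['beta2'])
    return h2.dot(params['W3']) + params['b3']    # class scores

def softmax_loss(scores, y):
    """Softmax (cross-entropy) loss over the batch; y holds the correct class indices."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()
```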