1. Building a Bidirectional LSTM Model
In Keras there is no ready-made layer for a bidirectional recurrent network, whether it is a plain RNN, an LSTM, or a GRU. Instead, you wrap the recurrent layer in a Bidirectional wrapper, and in the Keras version used here it cannot be imported directly from keras.layers; importing it that way raises an error (learned the hard way). The import that works is:
from keras.layers.wrappers import Bidirectional

The layer argument is the RNN being wrapped; it can be a SimpleRNN, an LSTM, a GRU, and so on. merge_mode='concat' specifies how the forward and backward outputs are combined at each time step: 'concat' simply concatenates them along the last axis, much like appending one list to another.
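As a side note, merge_mode also accepts 'sum', 'mul', 'ave', or None. A minimal sketch (same imports and shapes as the model below) showing how 'sum' changes the output width:

from keras import Sequential
from keras.layers import LSTM
from keras.layers.wrappers import Bidirectional

m = Sequential()
m.add(Bidirectional(LSTM(5, return_sequences=True),
                    input_shape=(None, 4), merge_mode='sum'))
m.summary()  # output shape is (None, None, 5): the two directions are added
             # element-wise, whereas 'concat' would give (None, None, 10)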
- Building the model
from keras import Sequential
from keras.layers import LSTM
from keras.layers.wrappers import Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(5, return_sequences=True), input_shape=(None, 4), merge_mode='concat'))
model.summary()
Model: "sequential_29"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
bidirectional_26 (Bidirectio (None, None, 10)          400
=================================================================
Total params: 400
Trainable params: 400
Non-trainable params: 0
_________________________________________________________________
As with a plain LSTM, the input shape is [batch_size, time_step, num], where num is the length of the feature vector at a single time step. The bidirectional LSTM built here has a hidden state of dimension 5 and an input vector of dimension 4, so each direction returns a [batch_size, 5] result at every time step, and the final output concatenates the two directions along the last axis, like stacking two lists together. Where do the 400 parameters come from?
In the forward direction, the input-to-hidden weight matrix has shape 4×20: the three gates and the candidate cell state are packed side by side into a single matrix (four blocks of 4×5 each). These weights are reused at every time step of the forward computation, i.e. the input-to-hidden weights are identical across time steps (the forward and backward directions, however, do not share weights); for details see the earlier post on verifying the LSTM computation flow in Keras. The recurrent (hidden-to-hidden) weight matrix has shape 5×20, since the hidden state has dimension 5; it likewise packs the three gates and the cell-state block into one matrix. Finally there is a bias of length 20, again covering the three gates and the candidate cell state.
In the backward direction, the input-to-hidden weight has the same shape as in the forward direction, 4×20, and the recurrent weight likewise has shape 5×20 (the same shapes only; the values differ, and the backward pass cannot reuse the forward parameters), plus a bias of length 20. The total parameter count is therefore:
(4×20 + 5×20 + 20) × 2 = (80 + 100 + 20) × 2 = 400.
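The same arithmetic in a few lines of Python (a sketch; input_dim and units just restate the shapes above):

input_dim, units = 4, 5
# kernel + recurrent kernel + bias, each holding 4 gate blocks
per_direction = input_dim * 4 * units + units * 4 * units + 4 * units  # 80 + 100 + 20 = 200
print(per_direction * 2)  # 400, matching model.summary()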
- Generating test data and predicting with the network
import numpy as np
x=np.random.randint(1,10,(3,2,4))
model_predict = model.predict(x)
model_predict
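Before diving into the weights, it is worth confirming the output shape; with the shapes used here it should be (3, 2, 10), i.e. 5 forward plus 5 backward units concatenated at each time step:

print(x.shape)              # (3, 2, 4)
print(model_predict.shape)  # (3, 2, 10)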
2. Verifying the Computation Flow Yourself with Matrix Multiplication
2.1 Extracting the parameters
- Extracting the network weights (each matrix packs the 3 gates and the candidate cell state together as 4 blocks)
f_x = model.get_weights()[0]     # forward kernel, shape (4, 20)
f_h = model.get_weights()[1]     # forward recurrent kernel, shape (5, 20)
f_bias = model.get_weights()[2]  # forward bias, shape (20,)
b_x = model.get_weights()[3]     # backward kernel, shape (4, 20)
b_h = model.get_weights()[4]     # backward recurrent kernel, shape (5, 20)
b_bias = model.get_weights()[5]  # backward bias, shape (20,)
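A quick loop (sketch) confirms the order and shapes of the six arrays, which is exactly the layout derived in Section 1:

for w in model.get_weights():
    print(w.shape)
# (4, 20), (5, 20), (20,) for the forward direction,
# then the same three shapes again for the backward direction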
- Forward-direction parameters (Keras orders the gate blocks as i, f, c_hat, o)
f_x_i = f_x[:,:5]
f_x_f = f_x[:,5:10]
f_x_c_hat = f_x[:,10:15]
f_x_o = f_x[:,15:]
f_h_i = f_h[:,:5]
f_h_f = f_h[:,5:10]
f_h_c_hat = f_h[:,10:15]
f_h_o = f_h[:,15:]
f_b_i =f_bias[:5]
f_b_f =f_bias[5:10]
f_b_c_hat = f_bias[10:15]
f_b_o = f_bias[15:]
- Backward-direction parameters
b_x_i = b_x[:,:5]
b_x_f = b_x[:,5:10]
b_x_c_hat = b_x[:,10:15]
b_x_o = b_x[:,15:]
b_h_i = b_h[:,:5]
b_h_f = b_h[:,5:10]
b_h_c_hat = b_h[:,10:15]
b_h_o = b_h[:,15:]
b_b_i =b_bias[:5]
b_b_f =b_bias[5:10]
b_b_c_hat = b_bias[10:15]
b_b_o = b_bias[15:]
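The twelve slices above all follow the same pattern, so they could also be produced by a small helper (split_gates is my own name, not a Keras function); each gate block has width units:

def split_gates(w, units=5):
    # works for both the (·, 20) kernels and the (20,) biases
    return (w[..., :units], w[..., units:2*units],
            w[..., 2*units:3*units], w[..., 3*units:])

f_x_i, f_x_f, f_x_c_hat, f_x_o = split_gates(f_x)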
2.2 Forward pass
t1 = x[:,0,:]  # input at the first time step, shape (3, 4)
t2 = x[:,1,:]  # input at the second time step, shape (3, 4)
def activation_sigmoid(x):
    return 1. / (1 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
- Step 1 (the initial states h_0 and c_0 are zero, so the recurrent terms and the forget-gate contribution vanish)
f_t1=activation_sigmoid(np.dot(t1,f_x_f)+f_b_f)
i_t1=activation_sigmoid(np.dot(t1,f_x_i)+f_b_i)
c_hat_t1=tanh(np.dot(t1,f_x_c_hat)+f_b_c_hat)
o_t1=activation_sigmoid(np.dot(t1,f_x_o)+f_b_o)
c_t1=np.multiply(i_t1,c_hat_t1)
h_t1 = np.multiply(o_t1,tanh(c_t1))
- Step 2
f_t2=activation_sigmoid(np.dot(t2,f_x_f)+np.dot(h_t1,f_h_f)+f_b_f)
i_t2=activation_sigmoid(np.dot(t2,f_x_i)+np.dot(h_t1,f_h_i)+f_b_i)
c_hat_t2=tanh(np.dot(t2,f_x_c_hat)+np.dot(h_t1,f_h_c_hat)+f_b_c_hat)
o_t2=activation_sigmoid(np.dot(t2,f_x_o)+np.dot(h_t1,f_h_o)+f_b_o)
c_t2=np.multiply(i_t2,c_hat_t2)+np.multiply(f_t2,c_t1)
h_t2=np.multiply(o_t2,tanh(c_t2))
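The two steps share the same structure, so they can be folded into one helper; a sketch (lstm_step is my own name) of a single recurrence step using the forward weights sliced above:

def lstm_step(x_t, h_prev, c_prev):
    # one LSTM recurrence step: gates, candidate, cell state, hidden state
    f = activation_sigmoid(np.dot(x_t, f_x_f) + np.dot(h_prev, f_h_f) + f_b_f)
    i = activation_sigmoid(np.dot(x_t, f_x_i) + np.dot(h_prev, f_h_i) + f_b_i)
    c_hat = tanh(np.dot(x_t, f_x_c_hat) + np.dot(h_prev, f_h_c_hat) + f_b_c_hat)
    o = activation_sigmoid(np.dot(x_t, f_x_o) + np.dot(h_prev, f_h_o) + f_b_o)
    c = i * c_hat + f * c_prev
    h = o * tanh(c)
    return h, c

h0 = c0 = np.zeros((x.shape[0], 5))
h1, c1 = lstm_step(t1, h0, c0)  # equals h_t1, c_t1 above
h2, c2 = lstm_step(t2, h1, c1)  # equals h_t2, c_t2 above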
- Comparing the forward results
r1 = model_predict[:,:,:5]  # first 5 channels: the forward direction
r1_f = r1[:,0,:]            # forward output at time step 1
r2_f = r1[:,1,:]            # forward output at time step 2
r1_f
array([[-3.6626393e-03, 6.2349676e-03, -1.0241446e-02, 1.1260298e-02,
2.1286098e-04],
[ 1.6157882e-03, 7.0518767e-03, -6.6360063e-04, 5.8959927e-03,
1.0307142e-05],
[-5.0121326e-02, -3.4172744e-03, -9.2119582e-02, 8.0897294e-02,
1.9901720e-04]], dtype=float32)
h_t1
array([[-3.66263849e-03, 6.23496841e-03, -1.02414461e-02,
1.12602997e-02, 2.12860939e-04],
[ 1.61578794e-03, 7.05187578e-03, -6.63600872e-04,
5.89599269e-03, 1.03071421e-05],
[-5.01213321e-02, -3.41727473e-03, -9.21195959e-02,
8.08973021e-02, 1.99017258e-04]])
r2_f
array([[ 2.6547471e-02, 1.6672976e-01, -7.6260559e-02, 3.5527071e-01,
2.5992340e-05],
[ 9.6616702e-04, 2.1982826e-02, -6.4846366e-03, 3.4722306e-02,
2.8259938e-06],
[-5.1310571e-04, 1.5508698e-02, -8.0165938e-03, 3.8796179e-02,
-5.4351776e-08]], dtype=float32)
h_t2
array([[ 2.65474679e-02, 1.66729763e-01, -7.62605588e-02,
3.55270677e-01, 2.59923377e-05],
[ 9.66166752e-04, 2.19828293e-02, -6.48463880e-03,
3.47222996e-02, 2.82599205e-06],
[-5.13105677e-04, 1.55086960e-02, -8.01659429e-03,
3.87961811e-02, -5.43516034e-08]])
The forward results match.
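np.allclose gives a tighter check than comparing the printouts by eye (Keras computes in float32 while numpy defaults to float64, hence the tolerance):

print(np.allclose(r1_f, h_t1, atol=1e-6))  # True
print(np.allclose(r2_f, h_t2, atol=1e-6))  # True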
2.3 Backward pass
# the backward direction reads the input in reverse, so time step 2 comes first
# (note these assignments reuse and overwrite the forward h_t1/h_t2 names)
f_t2=activation_sigmoid(np.dot(t2,b_x_f)+b_b_f)
i_t2=activation_sigmoid(np.dot(t2,b_x_i)+b_b_i)
c_hat_t2=tanh(np.dot(t2,b_x_c_hat)+b_b_c_hat)
o_t2=activation_sigmoid(np.dot(t2,b_x_o)+b_b_o)
c_t2=np.multiply(i_t2,c_hat_t2)
h_t2 = np.multiply(o_t2,tanh(c_t2))
f_t1=activation_sigmoid(np.dot(t1,b_x_f)+np.dot(h_t2,b_h_f)+b_b_f)
i_t1=activation_sigmoid(np.dot(t1,b_x_i)+np.dot(h_t2,b_h_i)+b_b_i)
c_hat_t1=tanh(np.dot(t1,b_x_c_hat)+np.dot(h_t2,b_h_c_hat)+b_b_c_hat)
o_t1=activation_sigmoid(np.dot(t1,b_x_o)+np.dot(h_t2,b_h_o)+b_b_o)
c_t1=np.multiply(i_t1,c_hat_t1)+np.multiply(f_t1,c_t2)
h_t1=np.multiply(o_t1,tanh(c_t1))
- Comparing the final results
rb = model_predict[:,:,5:]  # last 5 channels: the backward direction
rb_1 = rb[:,0,:]            # backward output aligned with time step 1
rb_2 = rb[:,1,:]            # backward output aligned with time step 2
rb_1
h_t1
array([[ 1.0876619e-02, 3.3821985e-02, -4.0591559e-01, 3.9178026e-01,
3.1826463e-01],
[-2.8693167e-02, 1.9592401e-03, -7.0860958e-01, 6.0334247e-02,
8.9470990e-02],
[ 5.8203835e-02, -4.3265164e-04, -3.5573176e-01, 1.1807156e-01,
4.1030297e-01]], dtype=float32)
array([[ 1.08766235e-02, 3.38219726e-02, -4.05915545e-01,
3.91780254e-01, 3.18264650e-01],
[-2.86931695e-02, 1.95924093e-03, -7.08609592e-01,
6.03342282e-02, 8.94709004e-02],
[ 5.82038303e-02, -4.32651737e-04, -3.55731734e-01,
1.18071566e-01, 4.10302977e-01]])
rb_2
h_t2
array([[-5.05645275e-02, 3.40539482e-05, -3.85923684e-01,
-1.20956276e-04, 5.13832629e-01],
[-2.52256282e-02, -3.12560966e-04, -1.79331765e-01,
4.03707288e-02, -4.84595716e-01],
[-3.37178409e-02, 7.72565545e-06, -3.49384636e-01,
2.71743489e-03, 6.98080599e-01]], dtype=float32)
array([[-5.05645233e-02, 3.40539369e-05, -3.85923610e-01,
-1.20956259e-04, 5.13832636e-01],
[-2.52256297e-02, -3.12560938e-04, -1.79331742e-01,
4.03707398e-02, -4.84595768e-01],
[-3.37178429e-02, 7.72567430e-06, -3.49384624e-01,
2.71743472e-03, 6.98080669e-01]])
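The same np.allclose check for the backward direction (h_t1 and h_t2 now hold the backward values):

print(np.allclose(rb_1, h_t1, atol=1e-6))  # True
print(np.allclose(rb_2, h_t2, atol=1e-6))  # True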
The results match: the entire bidirectional LSTM is verified, and the weight layout and computation flow are exactly as I expected.
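As a final check, the whole (3, 2, 10) output can be rebuilt in one shot. A sketch assuming the forward results were saved as h_fwd_t1/h_fwd_t2 before the backward pass overwrote h_t1/h_t2 (those names are mine), with h_bwd_t1/h_bwd_t2 the backward results:

manual = np.concatenate(
    [np.stack([h_fwd_t1, h_fwd_t2], axis=1),   # forward direction, in time order t1, t2
     np.stack([h_bwd_t1, h_bwd_t2], axis=1)],  # backward direction, aligned to input order
    axis=-1)
print(manual.shape)                                   # (3, 2, 10)
print(np.allclose(manual, model_predict, atol=1e-6))  # True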