1. Building a Bidirectional LSTM Model
In Keras there is no ready-made layer for a bidirectional recurrent network, whether it is a plain RNN, an LSTM, or a GRU. Instead, you wrap the recurrent layer in a Bidirectional wrapper, and in the Keras version used here it cannot be imported directly from keras.layers; importing it that way raises an error (learned the hard way). The import that works is:
from keras.layers.wrappers import Bidirectional

The layer argument is the RNN being wrapped; it can be a SimpleRNN, an LSTM, a GRU, and so on. merge_mode='concat' specifies how the forward and backward outputs are combined at each time step: 'concat' simply concatenates them along the last axis, much like appending one list to another.
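As a side note, merge_mode also accepts 'sum', 'mul', 'ave', or None. A minimal sketch (same imports and shapes as the model below) showing how 'sum' changes the output width:

from keras import Sequential
from keras.layers import LSTM
from keras.layers.wrappers import Bidirectional

m = Sequential()
m.add(Bidirectional(LSTM(5, return_sequences=True),
                    input_shape=(None, 4), merge_mode='sum'))
m.summary()  # output shape is (None, None, 5): the two directions are added
             # element-wise, whereas 'concat' would give (None, None, 10)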
- Building the model
from keras import Sequential
from keras.layers import LSTM
from keras.layers.wrappers import Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(5, return_sequences=True), input_shape=(None, 4), merge_mode='concat'))
model.summary()
Model: "sequential_29"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
bidirectional_26 (Bidirectio (None, None, 10)          400
=================================================================
Total params: 400
Trainable params: 400
Non-trainable params: 0
_________________________________________________________________
As with a plain LSTM, the input shape is [batch_size, time_step, num], where num is the length of the feature vector at a single time step. The bidirectional LSTM built here has a hidden state of dimension 5 and an input vector of dimension 4, so each direction returns a [batch_size, 5] result at every time step, and the final output concatenates the two directions along the last axis, like stacking two lists together. Where do the 400 parameters come from?
In the forward direction, the input-to-hidden weight matrix has shape 4×20: the three gates and the candidate cell state are packed side by side into a single matrix (four blocks of 4×5 each). These weights are reused at every time step of the forward computation, i.e. the input-to-hidden weights are identical across time steps (the forward and backward directions, however, do not share weights); for details see the earlier post on verifying the LSTM computation flow in Keras. The recurrent (hidden-to-hidden) weight matrix has shape 5×20, since the hidden state has dimension 5; it likewise packs the three gates and the cell-state block into one matrix. Finally there is a bias of length 20, again covering the three gates and the candidate cell state.
In the backward direction, the input-to-hidden weight has the same shape as in the forward direction, 4×20, and the recurrent weight likewise has shape 5×20 (the same shapes only; the values differ, and the backward pass cannot reuse the forward parameters), plus a bias of length 20. The total parameter count is therefore:
(4×20 + 5×20 + 20) × 2 = (80 + 100 + 20) × 2 = 400.
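The same arithmetic in a few lines of Python (a sketch; input_dim and units just restate the shapes above):

input_dim, units = 4, 5
# kernel + recurrent kernel + bias, each holding 4 gate blocks
per_direction = input_dim * 4 * units + units * 4 * units + 4 * units  # 80 + 100 + 20 = 200
print(per_direction * 2)  # 400, matching model.summary()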
- Generating test data and predicting with the network
import numpy as np
x=np.random.randint(1,10,(3,2,4))
model_predict = model.predict(x)
model_predict
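Before diving into the weights, it is worth confirming the output shape; with the shapes used here it should be (3, 2, 10), i.e. 5 forward plus 5 backward units concatenated at each time step:

print(x.shape)              # (3, 2, 4)
print(model_predict.shape)  # (3, 2, 10)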
2. Verifying the Computation Flow Yourself with Matrix Multiplication
2.1 Extracting the parameters
- Extracting the network weights (each matrix packs the 3 gates and the candidate cell state together as 4 blocks)
f_x = model.get_weights()[0]     # forward kernel, shape (4, 20)
f_h = model.get_weights()[1]     # forward recurrent kernel, shape (5, 20)
f_bias = model.get_weights()[2]  # forward bias, shape (20,)
b_x = model.get_weights()[3]     # backward kernel, shape (4, 20)
b_h = model.get_weights()[4]     # backward recurrent kernel, shape (5, 20)
b_bias = model.get_weights()[5]  # backward bias, shape (20,)
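A quick loop (sketch) confirms the order and shapes of the six arrays, which is exactly the layout derived in Section 1:

for w in model.get_weights():
    print(w.shape)
# (4, 20), (5, 20), (20,) for the forward direction,
# then the same three shapes again for the backward direction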
- Forward-direction parameters (Keras orders the gate blocks as i, f, c_hat, o)
f_x_i = f_x[:,:5]
f_x_f = f_x[:,5:10]
f_x_c_hat = f_x[:,10:15]
f_x_o = f_x[:,15:]
f_h_i = f_h[:,:5]
f_h_f = f_h[:,5:10]
f_h_c_hat = f_h[:,10:15]
f_h_o = f_h[:,15:]
f_b_i =f_bias[:5]
f_b_f =f_bias[5:10]
f_b_c_hat = f_bias[10:15]
f_b_o = f_bias[15:]
- Backward-direction parameters
b_x_i = b_x[:,:5]
b_x_f = b_x[:,5:10]
b_x_c_hat = b_x[:,10:15]
b_x_o = b_x[:,15:]
b_h_i = b_h[:,:5]
b_h_f = b_h[:,5:10]
b_h_c_hat = b_h[:,10:15]
b_h_o = b_h[:,15:]
b_b_i =b_bias[:5]
b_b_f =b_bias[5:10]
b_b_c_hat = b_bias[10:15]
b_b_o = b_bias[15:]
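The twelve slices above all follow the same pattern, so they could also be produced by a small helper (split_gates is my own name, not a Keras function); each gate block has width units:

def split_gates(w, units=5):
    # works for both the (·, 20) kernels and the (20,) biases
    return (w[..., :units], w[..., units:2*units],
            w[..., 2*units:3*units], w[..., 3*units:])

f_x_i, f_x_f, f_x_c_hat, f_x_o = split_gates(f_x)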
2.2 Forward pass
t1 = x[:,0,:]  # input at the first time step, shape (3, 4)
t2 = x[:,1,:]  # input at the second time step, shape (3, 4)
def activation_sigmoid(x):
    return 1. / (1 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
- Step 1 (the initial states h_0 and c_0 are zero, so the recurrent terms and the forget-gate contribution vanish)
f_t1=activation_sigmoid(np.dot(t1,f_x_f)+f_b_f)
i_t1=activation_sigmoid(np.dot(t1,f_x_i)+f_b_i)
c_hat_t1=tanh(np.dot(t1,f_x_c_hat)+f_b_c_hat)
o_t1=activation_sigmoid(np.dot(t1,f_x_o)+f_b_o)
c_t1=np.multiply(i_t1,c_hat_t1)
h_t1 = np.multiply(o_t1,tanh(c_t1))
- Step 2
f_t2=activation_sigmoid(np.dot(t2,f_x_f)+np.dot(h_t1,f_h_f)+f_b_f)
i_t2=activation_sigmoid(np.dot(t2,f_x_i)+np.dot(h_t1,f_h_i)+f_b_i)
c_hat_t2=tanh(np.dot(t2,f_x_c_hat)+np.dot(h_t1,f_h_c_hat)+f_b_c_hat)
o_t2=activation_sigmoid(np.dot(t2,f_x_o)+np.dot(h_t1,f_h_o)+f_b_o)
c_t2=np.multiply(i_t2,c_hat_t2)+np.multiply(f_t2,c_t1)
h_t2=np.multiply(o_t2,tanh(c_t2))
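The two steps share the same structure, so they can be folded into one helper; a sketch (lstm_step is my own name) of a single recurrence step using the forward weights sliced above:

def lstm_step(x_t, h_prev, c_prev):
    # one LSTM recurrence step: gates, candidate, cell state, hidden state
    f = activation_sigmoid(np.dot(x_t, f_x_f) + np.dot(h_prev, f_h_f) + f_b_f)
    i = activation_sigmoid(np.dot(x_t, f_x_i) + np.dot(h_prev, f_h_i) + f_b_i)
    c_hat = tanh(np.dot(x_t, f_x_c_hat) + np.dot(h_prev, f_h_c_hat) + f_b_c_hat)
    o = activation_sigmoid(np.dot(x_t, f_x_o) + np.dot(h_prev, f_h_o) + f_b_o)
    c = i * c_hat + f * c_prev
    h = o * tanh(c)
    return h, c

h0 = c0 = np.zeros((x.shape[0], 5))
h1, c1 = lstm_step(t1, h0, c0)  # equals h_t1, c_t1 above
h2, c2 = lstm_step(t2, h1, c1)  # equals h_t2, c_t2 above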
- Comparing the forward results
r1 = model_predict[:,:,:5]  # first 5 channels: the forward direction
r1_f = r1[:,0,:]            # forward output at time step 1
r2_f = r1[:,1,:]            # forward output at time step 2
r1_f
array([[-3.6626393e-03, 6.2349676e-03, -1.0241446e-02, 1.1260298e-02,
2.1286098e-04],
[ 1.6157882e-03, 7.0518767e-03, -6.6360063e-04, 5.8959927e-03,
1.0307142e-05],
[-5.0121326e-02, -3.4172744e-03, -9.2119582e-02, 8.0897294e-02,
1.9901720e-04]], dtype=float32)
h_t1
array([[-3.66263849e-03, 6.23496841e-03, -1.02414461e-02,
1.12602997e-02, 2.12860939e-04],
[ 1.61578794e-03, 7.05187578e-03, -6.63600872e-04,
5.89599269e-03, 1.03071421e-05],
[-5.01213321e-02, -3.41727473e-03, -9.21195959e-02,
8.08973021e-02, 1.99017258e-04]])
r2_f
array([[ 2.6547471e-02, 1.6672976e-01, -7.6260559e-02, 3.5527071e-01,
2.5992340e-05],
[ 9.6616702e-04, 2.1982826e-02, -6.4846366e-03, 3.4722306e-02,
2.8259938e-06],
[-5.1310571e-04, 1.5508698e-02, -8.0165938e-03, 3.8796179e-02,
-5.4351776e-08]], dtype=float32)
h_t2
array([[ 2.65474679e-02, 1.66729763e-01, -7.62605588e-02,
3.55270677e-01, 2.59923377e-05],
[ 9.66166752e-04, 2.19828293e-02, -6.48463880e-03,
3.47222996e-02, 2.82599205e-06],
[-5.13105677e-04, 1.55086960e-02, -8.01659429e-03,
3.87961811e-02, -5.43516034e-08]])
The forward results match.
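np.allclose gives a tighter check than comparing the printouts by eye (Keras computes in float32 while numpy defaults to float64, hence the tolerance):

print(np.allclose(r1_f, h_t1, atol=1e-6))  # True
print(np.allclose(r2_f, h_t2, atol=1e-6))  # True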
2.3 Backward pass
# the backward direction reads the input in reverse, so time step 2 comes first
# (note these assignments reuse and overwrite the forward h_t1/h_t2 names)
f_t2=activation_sigmoid(np.dot(t2,b_x_f)+b_b_f)
i_t2=activation_sigmoid(np.dot(t2,b_x_i)+b_b_i)
c_hat_t2=tanh(np.dot(t2,b_x_c_hat)+b_b_c_hat)
o_t2=activation_sigmoid(np.dot(t2,b_x_o)+b_b_o)
c_t2=np.multiply(i_t2,c_hat_t2)
h_t2 = np.multiply(o_t2,tanh(c_t2))
f_t1=activation_sigmoid(np.dot(t1,b_x_f)+np.dot(h_t2,b_h_f)+b_b_f)
i_t1=activation_sigmoid(np.dot(t1,b_x_i)+np.dot(h_t2,b_h_i)+b_b_i)
c_hat_t1=tanh(np.dot(t1,b_x_c_hat)+np.dot(h_t2,b_h_c_hat)+b_b_c_hat)
o_t1=activation_sigmoid(np.dot(t1,b_x_o)+np.dot(h_t2,b_h_o)+b_b_o)
c_t1=np.multiply(i_t1,c_hat_t1)+np.multiply(f_t1,c_t2)
h_t1=np.multiply(o_t1,tanh(c_t1))
- Comparing the final results
rb = model_predict[:,:,5:]  # last 5 channels: the backward direction
rb_1 = rb[:,0,:]            # backward output aligned with time step 1
rb_2 = rb[:,1,:]            # backward output aligned with time step 2
rb_1
h_t1
array([[ 1.0876619e-02, 3.3821985e-02, -4.0591559e-01, 3.9178026e-01,
3.1826463e-01],
[-2.8693167e-02, 1.9592401e-03, -7.0860958e-01, 6.0334247e-02,
8.9470990e-02],
[ 5.8203835e-02, -4.3265164e-04, -3.5573176e-01, 1.1807156e-01,
4.1030297e-01]], dtype=float32)
array([[ 1.08766235e-02, 3.38219726e-02, -4.05915545e-01,
3.91780254e-01, 3.18264650e-01],
[-2.86931695e-02, 1.95924093e-03, -7.08609592e-01,
6.03342282e-02, 8.94709004e-02],
[ 5.82038303e-02, -4.32651737e-04, -3.55731734e-01,
1.18071566e-01, 4.10302977e-01]])
rb_2
h_t2
array([[-5.05645275e-02, 3.40539482e-05, -3.85923684e-01,
-1.20956276e-04, 5.13832629e-01],
[-2.52256282e-02, -3.12560966e-04, -1.79331765e-01,
4.03707288e-02, -4.84595716e-01],
[-3.37178409e-02, 7.72565545e-06, -3.49384636e-01,
2.71743489e-03, 6.98080599e-01]], dtype=float32)
array([[-5.05645233e-02, 3.40539369e-05, -3.85923610e-01,
-1.20956259e-04, 5.13832636e-01],
[-2.52256297e-02, -3.12560938e-04, -1.79331742e-01,
4.03707398e-02, -4.84595768e-01],
[-3.37178429e-02, 7.72567430e-06, -3.49384624e-01,
2.71743472e-03, 6.98080669e-01]])
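The same np.allclose check for the backward direction (h_t1 and h_t2 now hold the backward values):

print(np.allclose(rb_1, h_t1, atol=1e-6))  # True
print(np.allclose(rb_2, h_t2, atol=1e-6))  # True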
The results match: the entire bidirectional LSTM is verified, and the weight layout and computation flow are exactly as I expected.
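As a final check, the whole (3, 2, 10) output can be rebuilt in one shot. A sketch assuming the forward results were saved as h_fwd_t1/h_fwd_t2 before the backward pass overwrote h_t1/h_t2 (those names are mine), with h_bwd_t1/h_bwd_t2 the backward results:

manual = np.concatenate(
    [np.stack([h_fwd_t1, h_fwd_t2], axis=1),   # forward direction, in time order t1, t2
     np.stack([h_bwd_t1, h_bwd_t2], axis=1)],  # backward direction, aligned to input order
    axis=-1)
print(manual.shape)                                   # (3, 2, 10)
print(np.allclose(manual, model_predict, atol=1e-6))  # True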