Creating a simple LSTM
In TensorFlow, a complete LSTM structure can be created with a single line of code.
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
Use the zero_state function to initialize the LSTM's initial state to an all-zeros array:
state = lstm.zero_state(batch_size, tf.float32)
for i in range(num_steps):
    # Calling the cell object advances the LSTM by one time step
    output, state = lstm(input, state)
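For reference, a minimal runnable sketch of this one-step-at-a-time unrolling (TF 1.x); the sizes and the inputs placeholder are hypothetical values chosen for illustration:

import tensorflow as tf

batch_size, num_steps, input_dim, lstm_hidden_size = 2, 5, 8, 16  # hypothetical sizes
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])

lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
state = lstm.zero_state(batch_size, tf.float32)

outputs = []
with tf.variable_scope("rnn"):
    for i in range(num_steps):
        if i > 0:
            tf.get_variable_scope().reuse_variables()  # share weights across time steps
        output, state = lstm(inputs[:, i, :], state)
        outputs.append(output)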
Creating a multi-layer LSTM
To create a deep recurrent neural network, stack cells with MultiRNNCell; zero_state can likewise be used for initialization.
def lstm_cell(size):  # factory: each layer must get its own cell instance
    return tf.nn.rnn_cell.BasicLSTMCell(size)
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
    [lstm_cell(lstm_size) for _ in range(number_of_layers)])
state = stacked_lstm.zero_state(batch_size, tf.float32)
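The stacked cell is then called exactly like a single cell, one time step per call; current_input below is a hypothetical [batch_size, input_dim] tensor:

for i in range(num_steps):
    # MultiRNNCell behaves like a single RNNCell; its state is a tuple
    # with one LSTMStateTuple per layer
    output, state = stacked_lstm(current_input, state)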
Using Dropout in an LSTM
tf.nn.rnn_cell.DropoutWrapper(
cell,
input_keep_prob=1.0,
output_keep_prob=1.0,
state_keep_prob=1.0,
variational_recurrent=False,
input_size=None,
dtype=None,
seed=None,
dropout_state_filter_visitor=None
)
Wrap each base cell (here tf.nn.rnn_cell.BasicLSTMCell, via the lstm_cell factory above) with DropoutWrapper before stacking:
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.DropoutWrapper(lstm_cell(lstm_size))
     for _ in range(number_of_layers)])
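A sketch of a common pattern (assumed here, not from the original text): feed the keep probability through a placeholder so that dropout can be disabled at evaluation time by feeding 1.0.

keep_prob = tf.placeholder_with_default(1.0, shape=[])  # feed < 1.0 only during training

def dropout_lstm_cell(size):
    cell = tf.nn.rnn_cell.BasicLSTMCell(size)
    # Apply dropout to the cell outputs only, a common default
    return tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)

stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
    [dropout_lstm_cell(lstm_size) for _ in range(number_of_layers)])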
BiLSTM
tf.nn.bidirectional_dynamic_rnn(
cell_fw,
cell_bw,
inputs,
initial_state_fw=None,
initial_state_bw=None,
sequence_length=None,
dtype=None,
parallel_iterations=None,
swap_memory=False,
time_major=False,
scope=None
)
Outputs (outputs, output_states):
- outputs: all outputs across the time_steps steps. It is a tuple (output_fw, output_bw) containing the forward and backward results; each has shape [batch_size, max_time, cell_fw.output_size]. It returns a tuple instead of a single concatenated Tensor; if the concatenated form is preferred, the forward and backward outputs can be joined with tf.concat(outputs, 2).
- output_states: a tuple (output_state_fw, output_state_bw) containing the final-step states of the forward and backward directions.
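A minimal usage sketch (TF 1.x), with hypothetical sizes, showing the tf.concat(outputs, 2) trick mentioned above:

batch_size, max_time, input_dim, lstm_hidden_size = 2, 7, 8, 16  # hypothetical
inputs = tf.placeholder(tf.float32, [batch_size, max_time, input_dim])

cell_fw = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
cell_bw = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
outputs, output_states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, inputs, dtype=tf.float32)

# Concatenated form: [batch_size, max_time, 2 * lstm_hidden_size]
concat_outputs = tf.concat(outputs, 2)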
dynamic_rnn
- When using dynamic_rnn, the maximum sequence length does not have to be the same across batches: the first batch can have shape 2 × 4 while the second has shape 2 × 7. During training, dynamic_rnn dynamically unrolls to the maximum length of each batch, which is why it is called dynamic.
Note that although the LSTM above can take a whole batch as input, each invocation of the cell advances only one time step, which makes it the equivalent of LSTMCell in PyTorch. What, then, is the equivalent of PyTorch's LSTM? The answer is tf.nn.dynamic_rnn:
tf.nn.dynamic_rnn(
cell,
inputs,
initial_state=None,
sequence_length=None,
dtype=None,
parallel_iterations=None,
swap_memory=False,
time_major=False,
scope=None
)
Input parameters:
- cell: an RNNCell instance
- inputs: the input sequence to the RNN
- initial_state: the initial state of the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
- sequence_length: a tensor of shape [batch_size] in which each value is that sequence's length (i.e. time_steps), e.g. sequence_length=tf.fill([batch_size], time_steps)
- time_major: defaults to False, in which case the input and output tensors have shape [batch_size, max_time, depth]; when True, it avoids transposes at the beginning and end of the RNN calculation, and the tensors have shape [max_time, batch_size, depth]
- scope: VariableScope for the created subgraph; defaults to "rnn"
Outputs:
- outputs: all outputs across the time_steps steps, with shape [batch_size, max_time, cell.output_size]
- state: the hidden state of the last step, with shape [batch_size, cell.state_size]
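A minimal usage sketch (TF 1.x), with hypothetical sizes; note that for a BasicLSTMCell the returned state is actually an LSTMStateTuple (c, h):

batch_size, max_time, input_dim, lstm_hidden_size = 2, 7, 8, 16  # hypothetical
inputs = tf.placeholder(tf.float32, [batch_size, max_time, input_dim])
seq_len = tf.fill([batch_size], max_time)

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
outputs, state = tf.nn.dynamic_rnn(
    cell, inputs, sequence_length=seq_len, dtype=tf.float32)
# outputs: [batch_size, max_time, lstm_hidden_size]
# state:   LSTMStateTuple(c, h), each of shape [batch_size, lstm_hidden_size]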