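This post walks through the computation an RNN performs and how PyTorch's RNN carries it out, mostly as a note to myself. An RNN is a recurrent neural network. It sounds complicated, and often the only thing one remembers about it is the word "recurrent". In fact its core is a single mathematical formula that is applied repeatedly, once per time step. If you don't believe it, the code below checks this.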

PyTorch RNN overview

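The official documentation explains the PyTorch RNN API in detail; here is the formula it computes internally:
$h_t = \tanh(w_{ih} x_t + b_{ih} + w_{hh} h_{t-1} + b_{hh})$
Here $h_t$ is the hidden state at time t, $x_t$ is the input at time t, and $h_{t-1}$ is the hidden state at the previous time step, or the initial hidden state at time 0. tanh is the activation function; it can be changed to ReLU through the nonlinearity parameter.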
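To make the recurrence concrete, here is a minimal NumPy sketch of the formula above. The function name rnn_step, the sizes, and the random weights are made up for illustration; this is not PyTorch's implementation, only the same arithmetic applied once per time step.

```python
import numpy as np

def rnn_step(x_t, h_prev, w_ih, b_ih, w_hh, b_hh):
    """One step of h_t = tanh(w_ih x_t + b_ih + w_hh h_{t-1} + b_hh)."""
    return np.tanh(w_ih @ x_t + b_ih + w_hh @ h_prev + b_hh)

# Hypothetical sizes and random weights, chosen only for this sketch.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
w_ih = rng.standard_normal((hidden_size, input_size))
w_hh = rng.standard_normal((hidden_size, hidden_size))
b_ih = rng.standard_normal(hidden_size)
b_hh = rng.standard_normal(hidden_size)

# "Recurrent" just means: feed the previous hidden state back in at every step.
h = np.zeros(hidden_size)                      # initial hidden state h_0
xs = rng.standard_normal((5, input_size))      # a sequence of 5 inputs
for x_t in xs:
    h = rnn_step(x_t, h, w_ih, b_ih, w_hh, b_hh)

print(h.shape)  # (4,)
```

The same weights are reused at every step; only the hidden state changes as the sequence is consumed.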

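Parameters
input_size: number of features in the input x
hidden_size: number of features in the hidden state h
num_layers: number of stacked recurrent layers; if set to 2, the second layer takes the output of the first layer as its input
nonlinearity: activation function, tanh by default; can be set to relu
bias: whether to use the bias terms; defaults to True
batch_first: defaults to False; if True, the input and output have shape (batch, seq, feature)
dropout: defaults to 0
bidirectional: defaults to False; if True, the RNN is bidirectional
Inputs
input(seq_len, batch, input_size), or input(batch, seq_len, input_size) if batch_first is True
h_0(num_layers*num_directions, batch, hidden_size): the initial hidden state; num_directions is 2 if bidirectional is True, otherwise 1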
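The shapes are easier to remember after seeing them once. The following is a small sketch that builds two nn.RNN modules and prints the shapes they return; the sizes (5, 3, 10, 20) are arbitrary values picked for this example.

```python
import torch
from torch import nn

# Arbitrary sizes for illustration.
seq_len, batch, input_size, hidden_size = 5, 3, 10, 20

# Two stacked layers, bidirectional: num_layers * num_directions = 2 * 2 = 4.
rnn = nn.RNN(input_size, hidden_size, num_layers=2, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(2 * 2, batch, hidden_size)
output, hn = rnn(x, h0)
print(output.shape)  # torch.Size([5, 3, 40]): both directions are concatenated
print(hn.shape)      # torch.Size([4, 3, 20])

# batch_first=True only changes the layout of input and output, not of h_n.
rnn_bf = nn.RNN(input_size, hidden_size, batch_first=True)
x_bf = torch.randn(batch, seq_len, input_size)
output_bf, hn_bf = rnn_bf(x_bf)  # h_0 defaults to zeros when omitted
print(output_bf.shape)  # torch.Size([3, 5, 20])
print(hn_bf.shape)      # torch.Size([1, 3, 20])
```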

Code

To keep things simple, the example code uses batch_size=1 and a feature size of 10 for both the input and the hidden state.

num_layers=1, seq_len=2

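import torch
from torch import nn

# Arguments: input_size, hidden_size, num_layers
rnn = nn.RNN(10, 10, 1)

# Input of shape (seq_len=2, batch=1, input_size=10)
inputR = torch.randn(2, 1, 10)

# Initial hidden state of shape (num_layers=1, batch=1, hidden_size=10)
h0 = torch.randn(1, 1, 10)

output, hn = rnn(inputR, h0)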

Output

output:
tensor([[[-0.4582,  0.3975,  0.7992,  0.2567,  0.5510,  0.4386, -0.6069,
          -0.2433, -0.0597,  0.2545]],

        [[ 0.2327,  0.2221, -0.1225,  0.1365,  0.1384,  0.7557,  0.9028,
          -0.4454,  0.1529,  0.0789]]], grad_fn=<StackBackward>)

hn:
tensor([[[ 0.2327,  0.2221, -0.1225,  0.1365,  0.1384,  0.7557,  0.9028,
          -0.4454,  0.1529,  0.0789]]], grad_fn=<StackBackward>)

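Applying the formula by hand: first time step

# Input-to-hidden term w_ih x_0 + b_ih, as a (10, 1) column vector
ih =  rnn.weight_ih_l0.data.mm(inputR[0].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)

# Hidden-to-hidden term w_hh h_0 + b_hh, using the initial hidden state
hh = rnn.weight_hh_l0.data.mm(h0[0].squeeze().view(10, 1)) + rnn.bias_hh_l0.data.view(10,1)

# Hidden state after the first time step
temp = torch.tanh(ih+hh)
temp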

Output

tensor([[-0.4582],
        [ 0.3975],
        [ 0.7992],
        [ 0.2567],
        [ 0.5510],
        [ 0.4386],
        [-0.6069],
        [-0.2433],
        [-0.0597],
        [ 0.2545]])

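Second time step

# Input-to-hidden term for the second input
ih =  rnn.weight_ih_l0.data.mm(inputR[1].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)

# Hidden-to-hidden term, now fed with the hidden state from the first step
hh = rnn.weight_hh_l0.data.mm(temp) + rnn.bias_hh_l0.data.view(10,1)

# Hidden state after the second time step
temp = torch.tanh(ih+hh)
temp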

Output

tensor([[ 0.2327],
        [ 0.2221],
        [-0.1225],
        [ 0.1365],
        [ 0.1384],
        [ 0.7557],
        [ 0.9028],
        [-0.4454],
        [ 0.1529],
        [ 0.0789]])

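Exactly the same: the first result matches output[0], and the second matches output[1], which is also hn.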
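The two hand-written steps generalize to a loop over any sequence length. The sketch below assumes a single-layer, unidirectional tanh RNN with the default batch_first=False; the sequence length of 4, the fixed seed, and the tolerance of 1e-5 are arbitrary choices for this example. It compares the hand-unrolled result with nn.RNN using torch.allclose instead of comparing printed numbers by eye.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
seq_len, input_size, hidden_size = 4, 10, 10
rnn = nn.RNN(input_size, hidden_size, 1)
x = torch.randn(seq_len, 1, input_size)
h0 = torch.randn(1, 1, hidden_size)

with torch.no_grad():
    output, hn = rnn(x, h0)

    h = h0[0]  # hidden state of layer 0, shape (batch, hidden_size)
    steps = []
    for t in range(seq_len):
        # Same formula as above, written with row vectors: x W^T instead of W x.
        h = torch.tanh(x[t] @ rnn.weight_ih_l0.t() + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.t() + rnn.bias_hh_l0)
        steps.append(h)
    manual = torch.stack(steps)  # shape (seq_len, batch, hidden_size)

# output holds the hidden state of every step; hn is the last one.
print(torch.allclose(manual, output, atol=1e-5))
print(torch.allclose(manual[-1], hn[0], atol=1e-5))
```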

num_layers=2, seq_len=1

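import torch
from torch import nn

# Arguments: input_size, hidden_size, num_layers
rnn = nn.RNN(10, 10, 2)

# Input of shape (seq_len=1, batch=1, input_size=10)
inputR = torch.randn(1, 1, 10)

# Initial hidden states of shape (num_layers=2, batch=1, hidden_size=10)
h0 = torch.randn(2, 1, 10)

output, hn = rnn(inputR, h0)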

Output

output:
tensor([[[-0.6109,  0.1926,  0.7245, -0.4304, -0.2992,  0.0129, -0.1721,
           0.6340, -0.3601, -0.3554]]], grad_fn=<StackBackward>)

hn:
tensor([[[ 0.0410,  0.2077, -0.6816,  0.0125,  0.3604, -0.4399,  0.7102,
          -0.0217,  0.8443, -0.1684]],

        [[-0.6109,  0.1926,  0.7245, -0.4304, -0.2992,  0.0129, -0.1721,
           0.6340, -0.3601, -0.3554]]], grad_fn=<StackBackward>)

Next, apply the formula by hand. The result is the same as above, which makes the RNN easier to understand.

ih =  rnn.weight_ih_l0.data.mm(inputR[0].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)

hh = rnn.weight_hh_l0.data.mm(h0[0].squeeze().view(10, 1)) + rnn.bias_hh_l0.data.view(10,1)

temp = torch.tanh(ih+hh)

Output

temp:
tensor([[ 0.0410],
        [ 0.2077],
        [-0.6816],
        [ 0.0125],
        [ 0.3604],
        [-0.4399],
        [ 0.7102],
        [-0.0217],
        [ 0.8443],
        [-0.1684]])

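This is exactly the first part of hn (hn[0]). Because num_layers is 2, the input has only passed through the first RNN layer at this point. Next it goes through the second layer.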

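# The second layer takes the first layer's hidden state as its input
ih1 =  rnn.weight_ih_l1.data.mm(temp.data) + rnn.bias_ih_l1.data.view(10,1)

# It has its own weights and its own initial hidden state h0[1]
hh1 = rnn.weight_hh_l1.data.mm(h0[1].squeeze().view(10, 1)) + rnn.bias_hh_l1.data.view(10,1)

torch.tanh(ih1+hh1)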

Output

tensor([[-0.6109],
        [ 0.1926],
        [ 0.7245],
        [-0.4304],
        [-0.2992],
        [ 0.0129],
        [-0.1721],
        [ 0.6340],
        [-0.3601],
        [-0.3554]])

This is the same as output, and exactly the second part of hn (hn[1]).
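Both experiments can be combined into one helper that handles several layers and several time steps at once. The function manual_rnn_forward below is a hypothetical name for this sketch, and it assumes a unidirectional tanh nn.RNN with bias, dropout=0, and batch_first=False. Each layer runs over the whole sequence, and its per-step hidden states become the input sequence of the next layer.

```python
import torch
from torch import nn

def manual_rnn_forward(rnn, x, h0):
    """Unroll a unidirectional tanh nn.RNN by hand (illustrative sketch)."""
    layer_input = x  # shape (seq_len, batch, features)
    final_states = []
    for layer in range(rnn.num_layers):
        # nn.RNN names its parameters weight_ih_l0, weight_ih_l1, and so on.
        w_ih = getattr(rnn, f"weight_ih_l{layer}")
        w_hh = getattr(rnn, f"weight_hh_l{layer}")
        b_ih = getattr(rnn, f"bias_ih_l{layer}")
        b_hh = getattr(rnn, f"bias_hh_l{layer}")
        h = h0[layer]  # each layer has its own initial hidden state
        outputs = []
        for x_t in layer_input:  # iterate over time steps
            h = torch.tanh(x_t @ w_ih.t() + b_ih + h @ w_hh.t() + b_hh)
            outputs.append(h)
        layer_input = torch.stack(outputs)  # input sequence for the next layer
        final_states.append(h)
    # Output of the last layer, and the final hidden state of every layer.
    return layer_input, torch.stack(final_states)

torch.manual_seed(0)  # arbitrary seed for repeatability
rnn = nn.RNN(10, 10, 2)
x = torch.randn(3, 1, 10)
h0 = torch.randn(2, 1, 10)
with torch.no_grad():
    output, hn = rnn(x, h0)
    manual_output, manual_hn = manual_rnn_forward(rnn, x, h0)

print(torch.allclose(manual_output, output, atol=1e-5))
print(torch.allclose(manual_hn, hn, atol=1e-5))
```

This also shows why output only contains the last layer's states while hn contains one final state per layer.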

Summary

After running the code above, it should be clear how an RNN works internally.

Good articles about RNNs
The Unreasonable Effectiveness of Recurrent Neural Networks
Understanding LSTM Networks
Attention and Augmented Recurrent Neural Networks
