Recurrent Neural Networks: PyTorch Implementations

RNN

Forward pass:

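  • h_t = g(U h_{t-1} + W x_t + b_h)
  • y_t = g(W_y h_t + b_y)

Here g is an activation function. The implementation below uses tanh for the hidden state and returns the hidden states h_t as its output, without the output layer y_t.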
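As a quick check of the recurrence, the sketch below computes one step with explicit matrices and compares it with PyTorch's built-in `nn.RNNCell`. It assumes tanh for g and row-vector batches, so U h_{t-1} becomes `h_prev @ U.T`. PyTorch stores two biases (`bias_ih`, `bias_hh`); their sum plays the role of b_h. Sizes and the seed are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_dim, batch = 3, 4, 2

# Built-in single-step cell; its parameters play the roles of W, U and b_h.
cell = nn.RNNCell(input_size, hidden_dim, nonlinearity="tanh")
W = cell.weight_ih                   # (hidden_dim, input_size), multiplies x_t
U = cell.weight_hh                   # (hidden_dim, hidden_dim), multiplies h_{t-1}
b_h = cell.bias_ih + cell.bias_hh    # two PyTorch biases summed into one b_h

x_t = torch.randn(batch, input_size)
h_prev = torch.randn(batch, hidden_dim)

# h_t = tanh(U h_{t-1} + W x_t + b_h), written for row-vector batches
h_manual = torch.tanh(h_prev @ U.T + x_t @ W.T + b_h)
h_builtin = cell(x_t, h_prev)

print(torch.allclose(h_manual, h_builtin, atol=1e-5))  # True
```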

PyTorch implementation

import torch
import torch.nn as nn
import torch.nn.functional as F


class RNNCell(nn.Module):

    def __init__(self, input_size, hidden_dim):
        super(RNNCell, self).__init__()
        self.input_size = input_size
        self.hidden_dim = hidden_dim
        self.linear1 = nn.Linear(hidden_dim, hidden_dim)  # U h_{t-1} (+ bias)
        self.linear2 = nn.Linear(input_size, hidden_dim)  # W x_t (+ bias); the two biases together form b_h

    def forward(self, x, h_pre):
        """
        :param x:       (batch, input_size)
        :param h_pre:   (batch, hidden_dim)
        :return: h_next (batch, hidden_dim)
        """
        h_next = torch.tanh(self.linear1(h_pre) + self.linear2(x))
        return h_next


class RNN(nn.Module):

    def __init__(self, input_size, hidden_dim):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_dim = hidden_dim
        self.rnn_cell = RNNCell(input_size, hidden_dim)

    def forward(self, x):
        """
        :param x: (seq_len, batch, input_size)
        :return:
           output (seq_len, batch, hidden_dim)
           h_n    (1, batch, hidden_dim)
        """
        seq_len, batch, _ = x.shape
        # Initial state on the same device/dtype as the input (works on GPU too)
        h = torch.zeros(batch, self.hidden_dim, dtype=x.dtype, device=x.device)
        outputs = []
        for i in range(seq_len):
            h = self.rnn_cell(x[i], h)  # one time step
            outputs.append(h)
        output = torch.stack(outputs)  # (seq_len, batch, hidden_dim)

        h_n = output[-1:, :, :]  # last time step, kept as (1, batch, hidden_dim)
        return output, h_n
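A possible cross-check of the unrolled loop above: PyTorch's fused `nn.RNN` uses the same `(seq_len, batch, input_size)` layout, so copying its weights into an `nn.RNNCell` and stepping through time should reproduce its output, and `h_n` should equal the last time step of `output`. The sketch collects the steps in a list and calls `torch.stack`, which avoids preallocating a tensor on a fixed device.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_dim = 5, 2, 3, 4

rnn = nn.RNN(input_size, hidden_dim)      # expects (seq_len, batch, input_size)
cell = nn.RNNCell(input_size, hidden_dim)

# Share parameters so both modules compute the same recurrence.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(seq_len, batch, input_size)

# Unroll over time with the single-step cell, starting from a zero state.
h = torch.zeros(batch, hidden_dim, dtype=x.dtype, device=x.device)
steps = []
for t in range(seq_len):
    h = cell(x[t], h)
    steps.append(h)
output_loop = torch.stack(steps)          # (seq_len, batch, hidden_dim)

output_ref, h_n_ref = rnn(x)              # h_n_ref: (1, batch, hidden_dim)

print(torch.allclose(output_loop, output_ref, atol=1e-5))  # True
print(torch.allclose(output_ref[-1:], h_n_ref))            # True
```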

LSTM

Forward pass:

  • Input gate: i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
  • Forget gate: f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
  • Output gate: o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
  • Candidate cell state: \hat{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
  • Cell state: c_t = f_t \odot c_{t-1} + i_t \odot \hat{c}_t
  • Hidden state: h_t = o_t \odot \tanh(c_t)
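The gate equations can be checked against `nn.LSTMCell`. One layout detail to keep in mind: PyTorch fuses the four weight matrices into `weight_ih` and `weight_hh` and orders the blocks as input, forget, candidate (`g`), output, so the sketch splits the pre-activations with `chunk(4, dim=1)` in that order. As with the RNN, the bias of each equation is the sum of the `bias_ih` and `bias_hh` blocks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_dim, batch = 3, 4, 2

cell = nn.LSTMCell(input_size, hidden_dim)
x_t = torch.randn(batch, input_size)
h_prev = torch.randn(batch, hidden_dim)
c_prev = torch.randn(batch, hidden_dim)

# All four pre-activations at once: W x_t + U h_{t-1} + b, shape (batch, 4 * hidden_dim)
pre = x_t @ cell.weight_ih.T + cell.bias_ih + h_prev @ cell.weight_hh.T + cell.bias_hh
pre_i, pre_f, pre_g, pre_o = pre.chunk(4, dim=1)   # PyTorch block order: i, f, g, o

i_t = torch.sigmoid(pre_i)       # input gate
f_t = torch.sigmoid(pre_f)       # forget gate
o_t = torch.sigmoid(pre_o)       # output gate
c_hat = torch.tanh(pre_g)        # candidate cell state
c_t = f_t * c_prev + i_t * c_hat
h_t = o_t * torch.tanh(c_t)

h_ref, c_ref = cell(x_t, (h_prev, c_prev))
print(torch.allclose(h_t, h_ref, atol=1e-5), torch.allclose(c_t, c_ref, atol=1e-5))  # True True
```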

PyTorch implementation

import torch
import torch.nn as nn
import torch.nn.functional as F
import copy


class Gate(nn.Module):
    def __init__(self, input_size, hidden_dim):
        super(Gate, self).__init__()
        self.linear1 = nn.Linear(hidden_dim, hidden_dim)
        self.linear2 = nn.Linear(input_size, hidden_dim)

    def forward(self, x, h_pre, active_func):
        # active_func(U h_{t-1} + W x_t + b): one gate, or the candidate cell state
        return active_func(self.linear1(h_pre) + self.linear2(x))


def clones(module, N):
    "Produce N identical layers."
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])


class LSTMCell(nn.Module):

    def __init__(self, input_size, hidden_dim):
        super(LSTMCell, self).__init__()
        self.input_size = input_size
        self.hidden_dim = hidden_dim
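        # Four independently initialised gates, used in the order f, i, g, o.
        # (clones() deep-copies one Gate, so all four would start with identical weights.)
        self.gate = nn.ModuleList([Gate(input_size, hidden_dim) for _ in range(4)])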

    def forward(self, x, h_pre, c_pre):
        """
        :param x: (batch, input_size)
        :param h_pre: (batch, hidden_dim)
        :param c_pre: (batch, hidden_dim)
        :return: h_next(batch, hidden_dim), c_next(batch, hidden_dim)
        """
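        f_t = self.gate[0](x, h_pre, torch.sigmoid)  # forget gate
        i_t = self.gate[1](x, h_pre, torch.sigmoid)  # input gate
        g_t = self.gate[2](x, h_pre, torch.tanh)     # candidate cell state \hat{c}_t
        o_t = self.gate[3](x, h_pre, torch.sigmoid)  # output gate
        c_next = f_t * c_pre + i_t * g_t             # c_t = f_t * c_{t-1} + i_t * \hat{c}_t
        h_next = o_t * torch.tanh(c_next)            # h_t = o_t * tanh(c_t)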

        return h_next, c_next


class LSTM(nn.Module):

    def __init__(self, input_size, hidden_dim):
        super(LSTM, self).__init__()
        self.input_size = input_size
        self.hidden_dim = hidden_dim
        self.lstm_cell = LSTMCell(input_size, hidden_dim)

    def forward(self, x):
        """
        :param x: (seq_len, batch, input_size)
        :return:
           output (seq_len, batch, hidden_dim)
           h_n    (1, batch, hidden_dim)
           c_n    (1, batch, hidden_dim)
        """
        seq_len, batch, _ = x.shape
        # Initial states on the same device/dtype as the input (works on GPU too)
        h = torch.zeros(batch, self.hidden_dim, dtype=x.dtype, device=x.device)
        c = torch.zeros(batch, self.hidden_dim, dtype=x.dtype, device=x.device)
        outputs = []
        for i in range(seq_len):
            h, c = self.lstm_cell(x[i], h, c)  # one time step
            outputs.append(h)
        output = torch.stack(outputs)  # (seq_len, batch, hidden_dim)

        h_n = output[-1:, :, :]
        return output, (h_n, c.unsqueeze(0))
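For comparison with the hand-written `LSTM`, the built-in `nn.LSTM` returns `output, (h_n, c_n)` with the same shapes as the docstring above. The sketch also counts parameters: four `Gate` modules, each holding `Linear(hidden_dim, hidden_dim)` and `Linear(input_size, hidden_dim)`, come to the same total as one single-layer `nn.LSTM`, which stores the same weights in fused matrices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_dim = 5, 2, 3, 4

lstm = nn.LSTM(input_size, hidden_dim)   # single layer, (seq_len, batch, feature) layout
x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape, h_n.shape, c_n.shape)
# torch.Size([5, 2, 4]) torch.Size([1, 2, 4]) torch.Size([1, 2, 4])

# Same convention as LSTM.forward above: h_n is the last time step of output.
print(torch.allclose(output[-1:], h_n))  # True

# Parameters of one Gate: Linear(hidden, hidden) plus Linear(input, hidden), weights and biases.
per_gate = (hidden_dim * hidden_dim + hidden_dim) + (input_size * hidden_dim + hidden_dim)
builtin = sum(p.numel() for p in lstm.parameters())
print(4 * per_gate, builtin)  # 144 144
```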

GRU

Forward pass:

Reset gate r_t and update gate z_t:

  • r_t = \sigma (W_{xr}x_t + W_{hr}h_{t-1} + b_r)
  • z_t = \sigma (W_{xz}x_t + W_{hz}h_{t-1} + b_z)

Candidate hidden state:

  • \hat{h}_t = tanh(W_{xh}x_t + r_t \odot W_{hh}h_{t-1} + b_h)

Hidden state:

  • h_t = z_t \odot h_{t-1} + (1-z_t) \odot \hat{h}_t

Output:

  • y_t = softmax(W_{hy}h_t + b_y)
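The GRU equations above come without code. Below is a sketch in the same style as the RNN and LSTM modules; the class and layer names (`GRUCell`, `GRU`, `x2r`, `h2r`, and so on) are hypothetical, the output layer y_t is omitted, and the hidden-side linears have no bias so that each equation keeps a single bias term. PyTorch's `nn.GRUCell` places an extra bias b_hn inside the r_t \odot (...) term, so the cross-check at the end zeroes that bias before copying weights.

```python
import torch
import torch.nn as nn


class GRUCell(nn.Module):
    """Hypothetical GRU cell following the equations above (not from the original post)."""

    def __init__(self, input_size, hidden_dim):
        super().__init__()
        self.x2r = nn.Linear(input_size, hidden_dim)              # W_xr x_t + b_r
        self.h2r = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_hr h_{t-1}
        self.x2z = nn.Linear(input_size, hidden_dim)              # W_xz x_t + b_z
        self.h2z = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_hz h_{t-1}
        self.x2h = nn.Linear(input_size, hidden_dim)              # W_xh x_t + b_h
        self.h2h = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_hh h_{t-1}

    def forward(self, x, h_pre):
        r_t = torch.sigmoid(self.x2r(x) + self.h2r(h_pre))        # reset gate
        z_t = torch.sigmoid(self.x2z(x) + self.h2z(h_pre))        # update gate
        h_hat = torch.tanh(self.x2h(x) + r_t * self.h2h(h_pre))   # candidate hidden state
        return z_t * h_pre + (1 - z_t) * h_hat                    # new hidden state


class GRU(nn.Module):
    def __init__(self, input_size, hidden_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.gru_cell = GRUCell(input_size, hidden_dim)

    def forward(self, x):
        """x: (seq_len, batch, input_size) -> output (seq_len, batch, hidden_dim), h_n (1, batch, hidden_dim)"""
        seq_len, batch, _ = x.shape
        h = torch.zeros(batch, self.hidden_dim, dtype=x.dtype, device=x.device)
        steps = []
        for t in range(seq_len):
            h = self.gru_cell(x[t], h)
            steps.append(h)
        output = torch.stack(steps)
        return output, output[-1:]


# Cross-check one step against nn.GRUCell with its extra bias b_hn set to zero.
torch.manual_seed(0)
input_size, hidden_dim, batch = 3, 4, 2
H = hidden_dim
mine, ref = GRUCell(input_size, hidden_dim), nn.GRUCell(input_size, hidden_dim)
with torch.no_grad():
    ref.bias_hh[2 * H:].zero_()                        # b_hn = 0
    W_ir, W_iz, W_in = ref.weight_ih.chunk(3, dim=0)   # PyTorch block order: r, z, n
    W_hr, W_hz, W_hn = ref.weight_hh.chunk(3, dim=0)
    b_ir, b_iz, b_in = ref.bias_ih.chunk(3)
    b_hr, b_hz, _ = ref.bias_hh.chunk(3)
    for lin, w in [(mine.x2r, W_ir), (mine.x2z, W_iz), (mine.x2h, W_in),
                   (mine.h2r, W_hr), (mine.h2z, W_hz), (mine.h2h, W_hn)]:
        lin.weight.copy_(w)
    mine.x2r.bias.copy_(b_ir + b_hr)
    mine.x2z.bias.copy_(b_iz + b_hz)
    mine.x2h.bias.copy_(b_in)

x_t, h_prev = torch.randn(batch, input_size), torch.randn(batch, hidden_dim)
print(torch.allclose(mine(x_t, h_prev), ref(x_t, h_prev), atol=1e-5))  # True
```

Keeping the candidate state's hidden-side term bias-free matches the equation as written; if you prefer PyTorch's exact behaviour, give `h2h` a bias instead.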