RNN Tutorial 1 -- Concepts

Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs

Recurrent Neural Networks (RNNs) are popular models that have shown great promise in many NLP tasks. But despite their recent popularity I’ve only found a limited number of resources that thoroughly explain how RNNs work, and how to implement them. That’s what this tutorial is about. It’s a multi-part series in which I’m planning to cover the following:

  1. Introduction to RNNs (this post)
  2. Implementing a RNN using Python and Theano
  3. Understanding the Backpropagation Through Time (BPTT) algorithm and the vanishing gradient problem
  4. Implementing a GRU/LSTM RNN

As part of the tutorial we will implement a recurrent neural network based language model.
The applications of language models are two-fold: First, it allows us to score arbitrary sentences based on how likely they are to occur in the real world. This gives us a measure of grammatical and semantic correctness. Such models are typically used as part of Machine Translation systems. Secondly, a language model allows us to generate new text (I think that’s the much cooler application). Training a language model on Shakespeare allows us to generate Shakespeare-like text. This fun post by Andrej Karpathy demonstrates what character-level language models based on RNNs are capable of.

I'm assuming that you are somewhat familiar with basic Neural Networks. If you’re not, you may want to head over to Implementing A Neural Network From Scratch, which guides you through the ideas and implementation behind non-recurrent networks.

What are RNNs?

The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very bad idea. If you want to predict the next word in a sentence you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later). Here is what a typical RNN looks like:


A recurrent neural network and the unfolding in time of the computation involved in its forward computation. Source: Nature

The above diagram shows a RNN being unrolled (or unfolded) into a full network. By unrolling we simply mean that we write out the network for the complete sequence. For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer neural network, one layer for each word. The formulas that govern the computation happening in a RNN are as follows:

  • x_t is the input at time step t. For example, x_1 could be a one-hot vector corresponding to the second word of a sentence.
  • s_t is the hidden state at time step t. It’s the “memory” of the network. s_t is calculated based on the previous hidden state and the input at the current step: s_t = f(U x_t + W s_{t-1}). The function f usually is a nonlinearity such as tanh or ReLU. s_{-1}, which is required to calculate the first hidden state, is typically initialized to all zeroes.
  • o_t is the output at step t. For example, if we wanted to predict the next word in a sentence it would be a vector of probabilities across our vocabulary: o_t = softmax(V s_t).
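To make these formulas concrete, here is a minimal NumPy sketch of a forward pass through a vanilla RNN. The sizes vocabulary_size and hidden_size are illustrative assumptions of this sketch; the actual implementation in Part 2 uses Theano rather than plain NumPy.

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch, not values fixed by the post).
vocabulary_size = 8000
hidden_size = 100

rng = np.random.default_rng(0)
U = rng.normal(scale=0.01, size=(hidden_size, vocabulary_size))  # input  -> hidden
W = rng.normal(scale=0.01, size=(hidden_size, hidden_size))      # hidden -> hidden
V = rng.normal(scale=0.01, size=(vocabulary_size, hidden_size))  # hidden -> output

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(word_indices):
    """word_indices: a sentence as a list of word indices (one-hot inputs)."""
    s = np.zeros(hidden_size)            # s_{-1} is initialized to all zeros
    outputs = []
    for t in word_indices:
        # s_t = tanh(U x_t + W s_{t-1}); for a one-hot x_t, U x_t is a column lookup
        s = np.tanh(U[:, t] + W @ s)
        # o_t = softmax(V s_t): a probability distribution over the vocabulary
        outputs.append(softmax(V @ s))
    return np.array(outputs)

print(forward([0, 42, 7, 123, 5]).shape)  # (5, 8000): one distribution per time step
```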

There are a few things to note here:

  • You can think of the hidden state s_t as the memory of the network. s_t captures information about what happened in all the previous time steps. The output o_t at step t is calculated solely based on the memory at time t. As briefly mentioned above, it’s a bit more complicated in practice because typically s_t can’t capture information from too many time steps ago.

  • Unlike a traditional deep neural network, which uses different parameters at each layer, a RNN shares the same parameters (U, V, W above) across all steps. This reflects the fact that we are performing the same task at each step, just with different inputs. This greatly reduces the total number of parameters we need to learn.

  • The above diagram has outputs at each time step, but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we may only care about the final output, not the sentiment after each word. Similarly, we may not need inputs at each time step. The main feature of an RNN is its hidden state, which captures some information about a sequence.

What can RNNs do?

RNNs have shown great success in many NLP tasks. At this point I should mention that the most commonly used type of RNNs are LSTMs, which are much better at capturing long-term dependencies than vanilla RNNs are. But don’t worry, LSTMs are essentially the same thing as the RNN we will develop in this tutorial, they just have a different way of computing the hidden state. We’ll cover LSTMs in more detail in a later post. Here are some example applications of RNNs in NLP (by no means an exhaustive list).

Language Modeling and Generating Text

Given a sequence of words we want to predict the probability of each word given the previous words. Language Models allow us to measure how likely a sentence is, which is an important input for Machine Translation (since high-probability sentences are typically correct). A side-effect of being able to predict the next word is that we get a generative model, which allows us to generate new text by sampling from the output probabilities. And depending on what our training data is we can generate all kinds of stuff. In Language Modeling our input is typically a sequence of words (encoded as one-hot vectors for example), and our output is the sequence of predicted words. When training the network we set o_t = x_{t+1}, since we want the output at step t to be the actual next word.
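As a tiny illustration of that training setup, the targets are simply the inputs shifted by one position. The SENTENCE_START and SENTENCE_END tokens below are assumptions of this sketch, not something defined in this post.

```python
# The target at step t is the input shifted by one word: o_t should predict x_{t+1}.
# SENTENCE_START / SENTENCE_END are hypothetical special tokens for this sketch.
tokens = ["SENTENCE_START", "the", "cat", "sat", "down", "SENTENCE_END"]

x_train = tokens[:-1]   # input sequence
y_train = tokens[1:]    # target sequence (the next word at every step)

for x_word, y_word in zip(x_train, y_train):
    print(f"{x_word:>14s} -> {y_word}")
```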

Research papers about Language Modeling and Generating Text:

Recurrent neural network based language model
Extensions of Recurrent neural network based language model
Generating Text with Recurrent Neural Networks

Machine Translation

Machine Translation is similar to language modeling in that our input is a sequence of words in our source language (e.g. German). We want to output a sequence of words in our target language (e.g. English). A key difference is that our output only starts after we have seen the complete input, because the first word of our translated sentences may require information captured from the complete input sequence.

RNN for Machine Translation. Image Source: http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf
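A rough sketch of that idea follows. This is a toy encoder-decoder with random weights and made-up sizes, only meant to show that decoding starts after the complete source sequence has been read; it is not the architecture from any of the papers below.

```python
import numpy as np

hidden_size = 16
rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_dec = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

# Toy "embedded" source words (e.g. a German sentence of 6 words).
source = [rng.normal(size=hidden_size) for _ in range(6)]

# Encode: read the complete input sequence first ...
s = np.zeros(hidden_size)
for x in source:
    s = np.tanh(x + W_enc @ s)

# ... then decode: generate target words conditioned on the encoder's final state.
y_prev = np.zeros(hidden_size)
translation_states = []
for _ in range(5):
    s = np.tanh(y_prev + W_dec @ s)
    translation_states.append(s)
    y_prev = s   # a real model would feed back the embedding of the predicted word
```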
Research papers about Machine Translation:

A Recursive Recurrent Neural Network for Statistical Machine Translation
Sequence to Sequence Learning with Neural Networks
Joint Language and Translation Modeling with Recurrent Neural Networks

Speech Recognition

Given an input sequence of acoustic signals from a sound wave, we can predict a sequence of phonetic segments together with their probabilities.

Research papers about Speech Recognition:

Towards End-to-End Speech Recognition with Recurrent Neural Networks

Generating Image Descriptions

Together with convolutional Neural Networks, RNNs have been used as part of a model to generate descriptions for unlabeled images. It’s quite amazing how well this seems to work. The combined model even aligns the generated words with features found in the images.

Deep Visual-Semantic Alignments for Generating Image Descriptions. Source: http://cs.stanford.edu/people/karpathy/deepimagesent/

Training RNNs

Training a RNN is similar to training a traditional Neural Network. We also use the backpropagation algorithm, but with a little twist. Because the parameters are shared by all time steps in the network, the gradient at each output depends not only on the calculations of the current time step, but also the previous time steps. For example, in order to calculate the gradient at t = 4 we would need to backpropagate 3 steps and sum up the gradients. This is called Backpropagation Through Time (BPTT). If this doesn’t make a whole lot of sense yet, don’t worry, we’ll have a whole post on the gory details. For now, just be aware of the fact that vanilla RNNs trained with BPTT have difficulties learning long-term dependencies (e.g. dependencies between steps that are far apart) due to what is called the vanishing/exploding gradient problem. There exists some machinery to deal with these problems, and certain types of RNNs (like LSTMs) were specifically designed to get around them.
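To preview the idea before the dedicated BPTT post, here is a hedged NumPy sketch of how the gradient for the shared matrix W accumulates contributions from earlier time steps. The truncation length bptt_steps and the delta quantities are assumptions of this sketch, not definitions from this post.

```python
import numpy as np

def bptt_grad_W(states, deltas, W, bptt_steps=4):
    """Accumulate dL/dW over time, assuming s_t = tanh(U x_t + W s_{t-1}).

    states[t]  : hidden state s_t from the forward pass
    deltas[t]  : gradient of the loss at step t w.r.t. the pre-activation of s_t
    bptt_steps : how far back we truncate the backpropagation
    """
    dW = np.zeros_like(W)
    for t in range(len(states)):
        delta = deltas[t]
        # Walk back at most bptt_steps steps and sum up the gradient contributions.
        for step in range(t, max(0, t - bptt_steps) - 1, -1):
            s_prev = states[step - 1] if step > 0 else np.zeros_like(states[0])
            dW += np.outer(delta, s_prev)
            # Propagate the error one step further back through the tanh nonlinearity.
            delta = (W.T @ delta) * (1 - s_prev ** 2)
    return dW

# Toy usage with random values, just to show the shapes involved.
rng = np.random.default_rng(0)
H, T = 10, 5
states = [np.tanh(rng.normal(size=H)) for _ in range(T)]
deltas = [rng.normal(size=H) for _ in range(T)]
W = rng.normal(scale=0.1, size=(H, H))
print(bptt_grad_W(states, deltas, W).shape)   # (10, 10)
```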

RNN Extensions

Over the years researchers have developed more sophisticated types of RNNs to deal with some of the shortcomings of the vanilla RNN model. We will cover them in more detail in a later post, but I want this section to serve as a brief overview so that you are familiar with the taxonomy of models.

Bidirectional RNNs are based on the idea that the output at time t may not only depend on the previous elements in the sequence, but also future elements. For example, to predict a missing word in a sequence you want to look at both the left and the right context. Bidirectional RNNs are quite simple. They are just two RNNs stacked on top of each other. The output is then computed based on the hidden states of both RNNs.
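A minimal sketch of the bidirectional idea under toy sizes and random weights (all names and dimensions here are assumptions of the sketch): run one RNN left-to-right, another right-to-left, and combine the two hidden states at each time step, here by concatenation.

```python
import numpy as np

hidden_size, input_size, T = 8, 5, 6
rng = np.random.default_rng(0)
U_f = rng.normal(scale=0.1, size=(hidden_size, input_size))   # forward-direction RNN
W_f = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
U_b = rng.normal(scale=0.1, size=(hidden_size, input_size))   # backward-direction RNN
W_b = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

xs = [rng.normal(size=input_size) for _ in range(T)]          # toy embedded inputs

def run_rnn(inputs, U, W):
    s = np.zeros(hidden_size)
    states = []
    for x in inputs:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
    return states

forward_states = run_rnn(xs, U_f, W_f)                # reads x_1 ... x_T
backward_states = run_rnn(xs[::-1], U_b, W_b)[::-1]   # reads x_T ... x_1, re-aligned

# The output at step t can now see both the left and the right context.
combined = [np.concatenate([f, b]) for f, b in zip(forward_states, backward_states)]
print(combined[0].shape)   # (16,)
```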

Deep (Bidirectional) RNNs are similar to Bidirectional RNNs, only that we now have multiple layers per time step. In practice this gives us a higher learning capacity (but we also need a lot of training data).

LSTM networks are quite popular these days and we briefly talked about them above. LSTMs don’t have a fundamentally different architecture from RNNs, but they use a different function to compute the hidden state. The memory units in LSTMs are called cells and you can think of them as black boxes that take as input the previous state and the current input. Internally these cells decide what to keep in (and what to erase from) memory. They then combine the previous state, the current memory, and the input. It turns out that these types of units are very efficient at capturing long-term dependencies. LSTMs can be quite confusing in the beginning but if you’re interested in learning more this post has an excellent explanation.
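Just to peek inside such a cell, here is a rough sketch of one common formulation of the LSTM update (biases omitted for brevity; this is background knowledge rather than something derived in this post, and the later post covers it properly).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, s_prev, c_prev, Wf, Wi, Wo, Wg):
    """One LSTM cell update (a common formulation; biases omitted for brevity)."""
    z = np.concatenate([x, s_prev])   # the cell sees the current input and previous state
    f = sigmoid(Wf @ z)               # forget gate: what to erase from memory
    i = sigmoid(Wi @ z)               # input gate: what to write into memory
    o = sigmoid(Wo @ z)               # output gate: what to expose as the hidden state
    g = np.tanh(Wg @ z)               # candidate memory content
    c = f * c_prev + i * g            # combine previous memory with the new input
    s = o * np.tanh(c)                # new hidden state
    return s, c

# Toy usage with random weights, just to show the data flow.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8
shape = (hidden_size, input_size + hidden_size)
Wf, Wi, Wo, Wg = (rng.normal(scale=0.1, size=shape) for _ in range(4))
s, c = lstm_step(rng.normal(size=input_size), np.zeros(hidden_size),
                 np.zeros(hidden_size), Wf, Wi, Wo, Wg)
print(s.shape, c.shape)   # (8,) (8,)
```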

Conclusion

So far so good. I hope you’ve gotten a basic understanding of what RNNs are and what they can do. In the next post we’ll implement a first version of our language model RNN using Python and Theano. Please leave questions in the comments!
This post is mainly background material, so it is not all that interesting to read on its own.
