Distributed Representations of Sentences and Documents

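This paper is co-authored by the author of word2vec. Its main contribution is the paragraph vector, which turns a paragraph into a fixed-length vector. Many of the ideas are borrowed from CBOW and Skip-gram.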

Background

CBOW & Skip-gram

Model

Paragraph Vector: A distributed memory model (PV-DM)

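This method is similar to CBOW. Each paragraph is first assigned its own vector. At every step, this vector and the vectors of the preceding k words are concatenated or averaged, and the result is used to predict the next word.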
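The prediction step described above can be sketched as a forward pass. This is a minimal illustration rather than the paper's implementation: the toy vocabulary, the dimensions, the random initialization, and the names `combine` and `predict_next` are all assumptions, and a full softmax is used for simplicity.

```python
import math
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary (assumption)
DIM = 4                                     # embedding size (assumption)
K = 3                                       # number of preceding words used as context

def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]

word_vecs = {w: rand_vec(DIM) for w in VOCAB}  # shared by all paragraphs
para_vecs = {0: rand_vec(DIM)}                 # one vector per paragraph id

def combine(para_id, context, mode="average"):
    """Merge the paragraph vector with the k context word vectors."""
    vecs = [para_vecs[para_id]] + [word_vecs[w] for w in context]
    if mode == "concat":
        return [x for v in vecs for x in v]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next(para_id, context, out_weights, mode="average"):
    """Return P(next word | paragraph, previous k words) over VOCAB."""
    h = combine(para_id, context, mode)
    scores = [sum(u * x for u, x in zip(row, h)) for row in out_weights]
    return softmax(scores)

# Output layer: one row per vocabulary word; its width depends on the merge mode.
out_avg = [rand_vec(DIM) for _ in VOCAB]
out_cat = [rand_vec(DIM * (K + 1)) for _ in VOCAB]

context = ["the", "cat", "sat"]
probs_avg = predict_next(0, context, out_avg, "average")
probs_cat = predict_next(0, context, out_cat, "concat")
```

Averaging keeps the hidden size at `DIM` regardless of k, while concatenation grows it to `(K + 1) * DIM` but preserves the position of each context word.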

PV-DM

The algorithm has two stages:

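First, train on the paragraphs already in the corpus to obtain the network parameters and the word embeddings.
Then, for a new paragraph, randomly initialize a vector and train it while keeping the network parameters and the word embeddings fixed.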
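The second stage can be sketched as gradient descent that updates only the new paragraph vector. This is a rough sketch under several assumptions: the "trained" word vectors and output weights are random stand-ins, context vectors are averaged, the loss is a full-softmax negative log-likelihood, and the names `step` and `infer_paragraph_vector` as well as the learning rate and epoch count are hypothetical.

```python
import math
import random

random.seed(1)

VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary (assumption)
DIM, K = 4, 2                               # embedding size and context length (assumptions)

def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]

# Stand-ins for parameters learned in stage one; they are never updated below.
word_vecs = {w: rand_vec(DIM) for w in VOCAB}
out_weights = [rand_vec(DIM) for _ in VOCAB]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def step(para_vec, context, target, lr):
    """One SGD step on the paragraph vector only; returns the loss before the step."""
    vecs = [para_vec] + [word_vecs[w] for w in context]
    h = [sum(col) / len(vecs) for col in zip(*vecs)]
    probs = softmax([sum(u * x for u, x in zip(row, h)) for row in out_weights])
    t = VOCAB.index(target)
    # Gradient of -log probs[t] with respect to h is U^T (probs - onehot(t)).
    grad_h = [0.0] * DIM
    for i, row in enumerate(out_weights):
        err = probs[i] - (1.0 if i == t else 0.0)
        for j in range(DIM):
            grad_h[j] += err * row[j]
    # h averages len(vecs) vectors, so dh/d(para_vec) is 1 / len(vecs).
    for j in range(DIM):
        para_vec[j] -= lr * grad_h[j] / len(vecs)
    return -math.log(probs[t])

def infer_paragraph_vector(words, epochs=200, lr=0.5):
    para_vec = rand_vec(DIM)  # random initialization for the unseen paragraph
    losses = []
    for _ in range(epochs):
        total = 0.0
        for i in range(K, len(words)):
            total += step(para_vec, words[i - K:i], words[i], lr)
        losses.append(total)
    return para_vec, losses

frozen_words = {w: list(v) for w, v in word_vecs.items()}
frozen_out = [list(row) for row in out_weights]
new_vec, losses = infer_paragraph_vector(["the", "cat", "sat", "on", "mat"])
```

Because the word vectors and output weights are only read, never written, the inferred vector is placed in the same space as the paragraph vectors learned during training.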

An advantage of this method is that it can be trained on an unlabeled corpus.

Paragraph vector without word ordering: Distributed bag of words (PV-DBOW)

PV-DBOW

Simply put, this method is the paragraph version of skip-gram. The objective over the whole corpus is to maximize:
$\sum_{p \in \mathbb{D}} \sum_{w \in p} \log P(w \mid p)$
where
$P(w \mid p) = \frac{\exp\left(\boldsymbol{e}(w)^{\mathrm{T}} \boldsymbol{e}(p)\right)}{\sum_{w' \in \mathrm{V}} \exp\left(\boldsymbol{e}(w')^{\mathrm{T}} \boldsymbol{e}(p)\right)}$
Here $\mathbb{D}$ is the set of paragraphs in the corpus, $\mathrm{V}$ is the vocabulary, and $\boldsymbol{e}(w)$ and $\boldsymbol{e}(p)$ are the vectors of word $w$ and paragraph $p$.
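The objective can be evaluated directly on a toy example. The sketch below follows the full-softmax formula as written; the tiny hand-picked vectors, the corpus, and the names `log_prob` and `corpus_objective` are assumptions made for illustration, and it only evaluates the objective without optimizing it.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_prob(word, para_vec, word_vecs):
    """log P(w | p) with a full softmax over the vocabulary."""
    scores = {w: dot(v, para_vec) for w, v in word_vecs.items()}
    m = max(scores.values())  # log-sum-exp trick for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[word] - log_z

def corpus_objective(corpus, para_vecs, word_vecs):
    """Sum of log P(w | p) over every paragraph p and every word w in p."""
    return sum(
        log_prob(w, para_vecs[pid], word_vecs)
        for pid, words in corpus.items()
        for w in words
    )

# Hand-picked toy vectors e(w) and e(p) (assumptions, not learned values).
word_vecs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "mat": [1.0, 1.0]}
para_vecs = {"p0": [2.0, 0.0], "p1": [0.0, 2.0]}
corpus = {"p0": ["cat", "mat"], "p1": ["dog", "mat"]}

objective = corpus_objective(corpus, para_vecs, word_vecs)
```

Training would adjust the word and paragraph vectors to increase `objective`. Note that the word order inside each paragraph plays no role here, which matches the "bag of words" in the method's name.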
