Reading Notes: "Dependency-Based Word Embeddings"

Abstract: Current word embedding methods are all based on linear contexts. This paper generalizes the skip-gram model with negative sampling introduced by Mikolov et al. to include arbitrary contexts.

The dependency-based embeddings are less topical and exhibit more functional similarity than the original skip-gram embeddings.


Introduction

Earlier approaches represent words as distinct and unrelated symbols, which suffers from poor generalization.

This paper seeks a representation that captures semantic and syntactic similarities between words.

Previous work builds on the distributional hypothesis (Harris, 1954). On one end of the spectrum, words are grouped into clusters based on their contexts (Brown et al., 1992; Uszkoreit and Brants, 2008); on the other end, words are represented as high-dimensional but sparse vectors. For some tasks the sparsity of these vectors is reduced, e.g. with SVD or LDA.

More recently, word representations have been trained with neural-network language models; these representations are referred to as "neural embeddings" or "word embeddings".

The state-of-the-art word embedding method is the skip-gram with negative sampling model (SKIPGRAM), trained with the word2vec software.

This paper generalizes the skip-gram model by replacing the linear contexts with arbitrary word contexts.

We experiment with syntactic contexts that are derived from automatically produced dependency parse-trees.


The Skip-Gram Model:

Definitions:

Every word $w \in W$ (the word vocabulary) and every context $c \in C$ (the context vocabulary) is associated with a vector $v_w \in \mathbb{R}^d$ and $v_c \in \mathbb{R}^d$ respectively, where $d$ is the embedding dimensionality.

Negative sampling: the negative-sampling objective assumes a dataset D of observed (w, c) pairs of words w and their contexts c.

Consider a word-context pair (w, c) and whether this pair came from D. Let p(D=1 | w, c) be the probability that (w, c) came from the data, and p(D=0 | w, c) = 1 - p(D=1 | w, c) the probability that it did not. The distribution is modeled as:

$$p(D=1 \mid w, c) = \frac{1}{1 + e^{-v_w \cdot v_c}}$$

where $v_w$ and $v_c$ are the model parameters to be learned. The goal is to maximize the log-probability of the observed pairs:

$$\arg\max_{v_w, v_c} \sum_{(w,c) \in D} \log \frac{1}{1 + e^{-v_c \cdot v_w}}$$

This objective has a trivial solution: p(D=1|w,c)=1 can be achieved by setting $v_c = v_w$ with $v_c \cdot v_w = K$ for a sufficiently large K. To prevent this, the objective is extended with (w, c) pairs for which p(D=1|w,c) must be low, i.e. pairs that are not in D. These are collected in a dataset D' of randomly generated, incorrect (w, c) pairs. The negative-sampling training objective is then:

$$\arg\max_{v_w, v_c} \sum_{(w,c) \in D} \log \sigma(v_c \cdot v_w) + \sum_{(w,c) \in D'} \log \sigma(-v_c \cdot v_w), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

The negative examples in D' can be constructed in different ways. Mikolov et al. propose the following: for each (w, c) ∈ D, construct n samples (w, c1), ..., (w, cn), where n is a hyperparameter and each cj is drawn according to its unigram distribution raised to the 3/4 power.
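A minimal sketch of this sampling step, assuming context frequencies are available as a plain Python dict (the names `build_noise_distribution` and `sample_negative_contexts` are illustrative, not from word2vec):

```python
import numpy as np

def build_noise_distribution(context_counts):
    """Unigram counts of contexts raised to the 3/4 power, then renormalized."""
    contexts = list(context_counts.keys())
    probs = np.array([context_counts[c] for c in contexts], dtype=np.float64) ** 0.75
    probs /= probs.sum()
    return contexts, probs

def sample_negative_contexts(contexts, probs, n, rng):
    """Draw n negative contexts c_1 ... c_n for one observed (w, c) pair."""
    idx = rng.choice(len(contexts), size=n, p=probs)
    return [contexts[i] for i in idx]

# toy usage, with n = 15 as in the paper's experiments
counts = {"the": 500, "dog": 50, "scientist": 20, "telescope": 10}
contexts, probs = build_noise_distribution(counts)
print(sample_negative_contexts(contexts, probs, n=15, rng=np.random.default_rng(0)))
```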

Optimizing this objective makes observed word-context pairs have similar embeddings, while scattering unobserved pairs. Intuitively, words that appear in similar contexts should have similar embeddings, though the authors note they have not found a formal proof that SKIPGRAM does indeed maximize the dot product of similar words.
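A per-pair view of this objective, as a rough numpy sketch (the vector names and the toy dimensionality are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_pair_objective(v_w, v_c, negative_context_vectors):
    """Contribution of one observed (w, c) pair to the negative-sampling
    objective: log sigma(v_c . v_w) + sum_j log sigma(-v_cj . v_w)."""
    score = np.log(sigmoid(np.dot(v_c, v_w)))
    for v_cj in negative_context_vectors:
        score += np.log(sigmoid(-np.dot(v_cj, v_w)))
    return score

# toy usage with small random 300-dimensional vectors (dimensionality assumed)
rng = np.random.default_rng(0)
d = 300
v_w, v_c = rng.normal(size=d) * 0.01, rng.normal(size=d) * 0.01
negatives = [rng.normal(size=d) * 0.01 for _ in range(15)]
print(sgns_pair_objective(v_w, v_c, negatives))
```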


Embedding with Arbitrary Contexts:

In the SKIPGRAM embedding algorithm, the contexts of a word w are the words surrounding it, so the context vocabulary C is identical to the word vocabulary W. However, contexts need not correspond to words, and the number of context types can be much larger than the number of word types. We generalize SKIPGRAM by replacing the bag-of-words contexts with arbitrary contexts.

This paper uses dependency-based syntactic contexts.

1. Linear Bag-of-Words Contexts: for a target word w and a window of size k on each side, the 2k surrounding words are taken as contexts; with k = 2, the contexts of w are w-2, w-1, w+1, w+2. Such linear contexts may miss a lot of important contextual information. A window size of 5 is commonly used to capture broad topical content, whereas smaller windows contain more focused information about the target word (a minimal sketch of this window extraction is given right below).
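A minimal sketch of extracting these window contexts (the function name `bow_contexts` is made up for illustration):

```python
def bow_contexts(tokens, k):
    """Linear bag-of-words contexts: for each target word, take the k words
    on each side (up to 2k contexts per target)."""
    pairs = []
    for i, w in enumerate(tokens):
        window = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
        pairs.extend((w, c) for c in window)
    return pairs

# usage
sentence = "australian scientist discovers star with telescope".split()
print(bow_contexts(sentence, k=2)[:6])
```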

2. Dependency-Based Contexts: each sentence is first parsed, and word contexts are derived as follows. For a target word w with modifiers m1, ..., mk and a head h, the contexts are:

$$(m_1, lbl_1), \ldots, (m_k, lbl_k), (h, lbl_h^{-1})$$

where lbl is the type of the dependency relation between the head and the modifier (e.g. nsubj, dobj, prep_with, amod), and lbl⁻¹ is used to mark the inverse relation.

Relations that include a preposition are "collapsed" prior to context extraction, by directly connecting the head and the object of the preposition (see the sketch after this subsection).

Syntactic dependencies can capture relations between words that are far apart, and filter out "coincidental" contexts that fall within the window but are not directly related to the target word.
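A rough sketch of the context extraction and preposition collapsing, assuming the parser output is available as simple (head index, label, modifier index) triples (the data layout and function names are assumptions for illustration, not the paper's code):

```python
def collapse_prepositions(tokens, arcs):
    """Merge head --prep--> preposition --pobj--> object into a single arc
    head --prep_<preposition>--> object (a sketch of collapsed dependencies,
    not the Stanford toolkit's own implementation)."""
    pobj_of = {h: m for (h, lbl, m) in arcs if lbl == "pobj"}
    collapsed, consumed = [], set()
    for (h, lbl, m) in arcs:
        if lbl == "prep" and m in pobj_of:
            collapsed.append((h, "prep_" + tokens[m], pobj_of[m]))
            consumed.add((m, pobj_of[m]))
    for (h, lbl, m) in arcs:
        if lbl == "prep" and m in pobj_of:
            continue                      # replaced by the collapsed arc
        if lbl == "pobj" and (h, m) in consumed:
            continue                      # absorbed into the collapsed arc
        collapsed.append((h, lbl, m))
    return collapsed

def dependency_contexts(tokens, arcs):
    """For every arc head --lbl--> modifier, the head gets the context
    (modifier, lbl) and the modifier gets the context (head, lbl-1)."""
    pairs = []
    for (h, lbl, m) in arcs:
        pairs.append((tokens[h], (tokens[m], lbl)))
        pairs.append((tokens[m], (tokens[h], lbl + "-1")))
    return pairs

# toy example: "scientist discovers star with telescope"
tokens = ["scientist", "discovers", "star", "with", "telescope"]
arcs = [(1, "nsubj", 0), (1, "dobj", 2), (1, "prep", 3), (3, "pobj", 4)]
print(dependency_contexts(tokens, collapse_prepositions(tokens, arcs)))
```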


Experiments:

Three context types are compared: bag-of-words contexts with k=5, bag-of-words contexts with k=2, and DEPS (dependency-based syntactic contexts).

The authors modified word2vec to support arbitrary contexts and to output the context embeddings in addition to the word embeddings.

The negative-sampling parameter (how many negative contexts to sample for every correct one) was 15.

Note: for DEPS, the corpus was tagged with parts-of-speech using the Stanford tagger (Toutanova et al., 2003) and parsed into labeled Stanford dependencies (de Marneffe and Manning, 2008) using an implementation of the parser described in Goldberg and Nivre (2012). All tokens were converted to lowercase, and words and contexts that appeared fewer than 100 times were filtered out.
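A small sketch of the frequency cutoff on the extracted (word, context) pairs (`filter_rare` and the pair-list layout are assumptions, not the paper's pipeline):

```python
from collections import Counter

def filter_rare(pairs, min_count=100):
    """Drop (word, context) pairs whose word or context appears fewer than
    min_count times among the extracted pairs."""
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    return [(w, c) for (w, c) in pairs
            if word_counts[w] >= min_count and ctx_counts[c] >= min_count]
```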

Qualitative Evaluation: inspect the 5 most similar words to a set of target words under each embedding.
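Such lists are typically produced by a cosine-similarity nearest-neighbour lookup; a generic sketch (not the paper's evaluation script), assuming `embeddings` is a dict mapping word to numpy vector:

```python
import numpy as np

def most_similar(word, embeddings, topn=5):
    """Rank all other vocabulary words by cosine similarity to `word`."""
    target = embeddings[word]
    target = target / np.linalg.norm(target)
    scores = []
    for other, vec in embeddings.items():
        if other == word:
            continue
        scores.append((other, float(np.dot(target, vec / np.linalg.norm(vec)))))
    return sorted(scores, key=lambda s: s[1], reverse=True)[:topn]
```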

Quantitative Evaluation: the WordSim353 dataset and the Chiarello et al. dataset.
