http://www.aclweb.org/website/old_anthology/W/W14/W14-07.pdf#page=58
Extracts causal v-np pairs
verb-noun phrase
Uses data from FrameNet
dependency parse
Purpose, Internal cause, Result, External cause, Cause, Reason, Explanation, Required situation, Purpose of Event, Negative consequences, resulting action, Effect, Cause of shine, Purpose of Goods, Response action, Enabled situation, Grinding cause, Trigger
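Instances carrying any of the labels above are treated as cause
All other labels are non-cause
The annotated span should not contain a verb
In total this yields 2158 cause instances and 65777 non-cause instances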
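The labeling rule above can be sketched as a simple set lookup. This is a minimal sketch, not the paper's code: the function name `label_instance` and the normalization (lowercasing, treating underscores as spaces) are assumptions made for illustration.

```python
# Frame-element labels that the notes list as indicating "cause".
CAUSE_LABELS = {
    "purpose", "internal cause", "result", "external cause", "cause",
    "reason", "explanation", "required situation", "purpose of event",
    "negative consequences", "resulting action", "effect", "cause of shine",
    "purpose of goods", "response action", "enabled situation",
    "grinding cause", "trigger",
}


def label_instance(frame_element: str) -> str:
    """Map a frame-element label to 'cause' or 'non-cause'.

    Normalization (lowercase, underscores to spaces) is an assumption,
    added so that variants like 'Internal_cause' still match.
    """
    normalized = frame_element.strip().lower().replace("_", " ")
    return "cause" if normalized in CAUSE_LABELS else "non-cause"
```

For example, `label_instance("Internal cause")` gives `"cause"`, while a label outside the list, such as `"Agent"`, gives `"non-cause"`.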
Supervised learning
Features:
- Lexical: verb, lemma of verb, noun phrase, lemma of all words of noun phrase, head noun of noun phrase, lemmas of all words between verb and head noun of noun phrase. (About the words themselves.)
- Semantic: 9 noun hierarchies of WordNet (entity, psychological feature, abstraction, state, event, act, group, possession, phenomenon). (About the category of the words.)
- Structural: for a v-np pair the variable sub in np is set to 1 if the subject of v is contained in np, set to 0 if the subject of v is not contained in np and set to -1 if the subject of v is not available in the instance. (About the structure of the sentence.)
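The structural feature is a three-valued indicator, which can be sketched as follows. The representation is an assumption: the subject is given as a token index (or `None` when the parse has no subject for the verb) and the noun phrase as a half-open token span; the name `sub_feature` is hypothetical.

```python
from typing import Optional, Tuple


def sub_feature(subject_index: Optional[int], np_span: Tuple[int, int]) -> int:
    """Compute the 'sub in np' feature for one v-np pair.

    subject_index: token index of the verb's subject, or None if the
                   instance has no subject for the verb.
    np_span:       (start, end) token span of the noun phrase, end exclusive.
    """
    if subject_index is None:
        return -1  # subject of v not available in the instance
    start, end = np_span
    return 1 if start <= subject_index < end else 0
```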
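x1(v-np, l) is the decision variable set to 1 only if the label l ∈ L1 is assigned to v-np.
x1(.) is a function taking the value 0 or 1; it is 1 when v-np has label l. l is either cause or non-cause.
Equation (2) requires that every v-np receives exactly one label.
P is the probability function.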
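These notes do not reproduce the objective itself. Assuming it maximizes the sum of x1(v-np, l) · P(v-np, l) over all pairs and labels (an assumption, not confirmed here), then under the one-label constraint alone the problem decomposes into picking the most probable label per pair. A minimal sketch, with the hypothetical name `assign_labels`:

```python
from typing import Dict


def assign_labels(probs: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Pick one label per v-np pair.

    probs maps a v-np identifier to {label: P(v-np, label)}.
    With only the 'exactly one label per pair' constraint, setting
    x1(v-np, l) = 1 for the highest-probability label maximizes the
    assumed objective sum of x1 * P.
    """
    return {pair: max(p, key=p.get) for pair, p in probs.items()}
```

Once further constraints tie pairs together, this shortcut no longer applies and an actual ILP solver would be needed.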
Handling of named entities is added afterwards
assume if a noun phrase is identified as a named entity then its corresponding verb-noun phrase pair encodes non-cause relation
This causes false negatives
Several patterns are then defined (by, from, because of, through, for); if a named-entity noun phrase and a verb are separated by one of these patterns, the rule above is not applied.
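The named-entity rule and its exception can be sketched as below. This is an illustrative sketch under assumptions: "separated by a pattern" is read as the pattern occurring among the tokens between the verb and the noun phrase, and the function `ne_rule` and its arguments are hypothetical names.

```python
from typing import List, Optional

# Patterns listed in the notes that exempt a pair from the named-entity rule.
EXCEPTION_PATTERNS = ("because of", "by", "from", "through", "for")


def ne_rule(tokens_between: List[str], np_is_named_entity: bool) -> Optional[str]:
    """Apply the named-entity rule to one v-np pair.

    Returns 'non-cause' when the rule fires, or None when it does not
    apply and the decision is left to the rest of the system.
    """
    if not np_is_named_entity:
        return None
    # Pad with spaces so patterns match whole tokens only ("nearby" != "by").
    padded = " " + " ".join(t.lower() for t in tokens_between) + " "
    if any(f" {pattern} " in padded for pattern in EXCEPTION_PATTERNS):
        return None  # exception pattern present: do not force non-cause
    return "non-cause"
```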
we identify the semantic classes of noun phrases which do not normally represent events, conditions, states, phenomena, processes and thus have high tendency to encode non-cause relations.
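A further classifier is added for the noun phrase, judging whether the noun phrase is likely to encode causality
Some additional rules are added
Finally, sentences were collected from Wikipedia, POS-tagged and dependency-parsed, giving a little over 1000 instances for testing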