Why am I doing this?
1. My NLP skills are close to zero.
2. My English skills are close to zero.
I hope to improve myself through this exercise.
Original text
"A Neural Probabilistic Language Model"
Abstract
A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that the proposed approach allows to take advantage of longer contexts.
Keywords: Statistical language modeling, artificial neural networks, distributed representation, curse of dimensionality
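A quick note on the "curse of dimensionality" mentioned in the abstract, before the translation. If I recall the body of the paper correctly, it makes the problem concrete by counting free parameters; the figures below follow that example (the 100,000-word vocabulary and 10-word sequences are illustrative numbers, not part of the abstract itself).

```latex
% A full table of joint probabilities over n consecutive words drawn
% from a vocabulary V has |V|^n entries, i.e. |V|^n - 1 free
% parameters once the constraint that probabilities sum to 1 is used.
\[
  |V| = 10^{5},\quad n = 10
  \quad\Longrightarrow\quad
  |V|^{n} - 1 = 10^{50} - 1 \ \text{free parameters.}
\]
```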
Here is my translation:
"A Neural Probabilistic Language Model"
Abstract
The goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. The real difficulty is the curse of dimensionality: the word sequences on which the model is tested are likely to differ from all the word sequences seen during training. The traditional but very successful approach, based on n-grams, obtains generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality with a distributed representation for words, one that lets each training sentence inform the model about an exponential number of semantically neighboring sentences. The model simultaneously learns a representation for each word along with the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because, as long as the words of an unseen sentence are similar in some sense to the words of sentences seen in training, the sentence receives a relatively high probability. Training such a large model (with millions of parameters) in a reasonable time is itself a significant challenge. The experiments we report use a neural network for the probability function; results on two text corpora show that the proposed method effectively improves on state-of-the-art n-gram models and can take advantage of longer contexts.
Keywords: statistical language modeling, artificial neural networks, distributed representation, curse of dimensionality
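To make "a distributed representation for each word along with the probability function for word sequences" concrete, here is a minimal forward-pass sketch in Python/NumPy. It follows the architecture described in the body of the paper as I understand it (a shared word-feature matrix C, a tanh hidden layer, a softmax output), but every size and name below is an illustrative assumption of mine, and I leave out the paper's optional direct input-to-output connections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (my assumptions, not the paper's settings).
V = 1000   # vocabulary size
m = 30     # dimension of each word's feature vector
n = 4      # model order: predict a word from the n-1 preceding words
h = 50     # hidden-layer size

# Parameters that would all be learned jointly by gradient descent.
C = rng.normal(0.0, 0.1, (V, m))            # word feature vectors (the "distributed representation")
H = rng.normal(0.0, 0.1, (h, (n - 1) * m))  # input-to-hidden weights
d = np.zeros(h)                             # hidden bias
U = rng.normal(0.0, 0.1, (V, h))            # hidden-to-output weights
b = np.zeros(V)                             # output bias

def softmax(z):
    z = z - z.max()                         # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def next_word_probs(context_ids):
    """P(w_t | w_{t-n+1}, ..., w_{t-1}) for n-1 context word indices."""
    x = C[context_ids].reshape(-1)          # concatenate the context's feature vectors
    hidden = np.tanh(d + H @ x)
    return softmax(b + U @ hidden)          # a distribution over the whole vocabulary

probs = next_word_probs([3, 17, 42])        # three dummy context word ids
print(probs.shape, round(probs.sum(), 6))   # (1000,) 1.0
```

Note where the generalization claimed in the abstract would come from: words whose rows in C are close produce nearly the same input x, so a never-seen word sequence made of words similar to a seen one receives a similar probability.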
A few more of my hopes:
1. I hope my fellow AI learners will press forward bravely and not give up in the face of difficulties.
2. I also hope my fellow AI learners will not use a little knowledge to bluff others, but will stay focused on solving problems.
Important note: if there are mistakes here, please do call them out. Please do. Many thanks.