(2018, IJCAI, Yiping Song et al.)An Ensemble of Retrieval-Based and Generation-Based Human-Comput...

Still being revised...

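Introduction

Dialogue systems are either retrieval-based or generation-based. The paper proposes a method that combines the two.
Given a query, the method first uses an off-the-shelf retrieval algorithm to find several similar queries in a large dataset. It then generates a new reply from the original query together with the replies attached to those similar queries. Finally, a binary classifier scores both the retrieved replies and the generated reply, and the highest-scoring reply is returned.
Overall, the method is fairly heavyweight: producing a single reply requires three steps (retrieval, generation, and re-ranking).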

Model / Algorithm

Input: query $q$
Output: reply $r$

Three steps: retrieval, generation, re-ranking
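
The three steps can be sketched as a small pipeline. This is only an illustration of the control flow: the `retrieve`, `generate`, and `score` callables are hypothetical placeholders for the retrieval system, the generator, and the classifier, and are not names from the paper.

```python
from typing import Callable, List, Tuple


def ensemble_reply(
    query: str,
    retrieve: Callable[[str], List[Tuple[str, str]]],
    generate: Callable[[str, List[str]], str],
    score: Callable[[str, str], float],
) -> str:
    """Return the highest-scoring reply among retrieved and generated candidates."""
    # Step 1: retrieval returns (similar query, its reply) pairs.
    pairs = retrieve(query)
    retrieved_replies = [reply for _, reply in pairs]
    # Step 2: generation conditions on the query and the retrieved replies.
    generated_reply = generate(query, retrieved_replies)
    # Step 3: re-ranking scores every candidate and keeps the best one.
    candidates = retrieved_replies + [generated_reply]
    return max(candidates, key=lambda reply: score(query, reply))
```

Passing the components in as callables keeps the sketch independent of any particular retrieval engine or neural model.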

Overall architecture diagram

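Retrieval step

An off-the-shelf retrieval algorithm searches a corpus for the $k$ queries most similar to $q$, $\{q_1^*,\dots,q_k^*\}$, together with their corresponding replies $\{r_1^*,\dots,r_k^*\}$.
The retrieval algorithm is Lucene, an off-the-shelf full-text search engine library.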
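
To make the input and output of the retrieval step concrete, here is a toy stand-in. It is not Lucene and does not reproduce Lucene's scoring: it simply ranks stored queries by Jaccard token overlap, which is an assumption chosen for brevity. The function name `retrieve_top_k` and the whitespace tokenization are also hypothetical.

```python
from typing import List, Tuple


def retrieve_top_k(
    query: str, corpus: List[Tuple[str, str]], k: int = 2
) -> List[Tuple[str, str]]:
    """Return the k (query, reply) pairs whose query is most similar to `query`."""
    query_tokens = set(query.split())

    def jaccard(other: str) -> float:
        # Jaccard overlap between token sets (a toy similarity, not Lucene's).
        other_tokens = set(other.split())
        union = query_tokens | other_tokens
        return len(query_tokens & other_tokens) / len(union) if union else 0.0

    # sorted() is stable, so ties keep their corpus order.
    ranked = sorted(corpus, key=lambda pair: jaccard(pair[0]), reverse=True)
    return ranked[:k]
```

In a real system the corpus would be indexed once and queried through the search engine rather than scanned linearly as here.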

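Generation step

Given $q$ and the replies $\{r_1^*,\dots,r_k^*\}$ from the retrieval step, generate a reply $\hat{r}^+$.

Reply generation model

$\hat{r}^+$ is generated with an encoder-decoder architecture.
There are $k+1$ encoders: $k$ of them encode the retrieved replies $\{r_1^*,\dots,r_k^*\}$, and the remaining one encodes $q$. All encoders are bidirectional GRUs.
The decoder is a unidirectional GRU. It uses the encoder outputs in three ways: sentence-level attention, character-level attention, and a copy mechanism.
- sentence-level attention: the last-time-step outputs of all encoders' bidirectional GRUs are concatenated and passed through a linear transformation; the resulting vector is the initial state of the decoder GRU.
- character-level attention: attention is computed between the state at every time step of every encoder and the decoder state at the previous time step. The attention values serve as weights for a weighted average of the encoder states, and this average enters the decoder's computation at the current time step as the context.
- copy mechanism: the states of the $k$ encoders that process $\{r_1^*,\dots,r_k^*\}$ are combined with the decoder state at the current time step to produce a score, which is added to the output probability.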
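
The character-level attention described above amounts to a softmax-weighted average of encoder states. The sketch below shows that computation in plain Python. The dot-product scoring function is an assumption made for simplicity (the notes do not say which scoring function the paper uses), and `attention_context` is a hypothetical name.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(
    decoder_state: List[float], encoder_states: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step."""
    # Score each encoder state against the previous decoder state
    # (dot product is an assumed scoring function).
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    # Context = attention-weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context
```

With $k+1$ encoders, `encoder_states` would hold the per-time-step states of all encoders pooled into one list, so a single softmax spans the query and every retrieved reply.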

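Re-ranking step

The retrieved replies $\{r_1^*,\dots,r_k^*\}$ and the generated reply $\hat{r}^+$ are scored, and the highest-scoring one is output as $r$.
Scoring is treated as a binary classification problem, with the classifier's predicted probability used as the score.
The classifier is a GBDT (gradient boosted decision tree) with two classes: (1) the reply belongs to the query; (2) the reply does not belong to the query.
The input features are:

Term similarity: the word overlap ratio between the query and the reply.
Entity similarity: the entity overlap ratio between the query and the reply.
Topic similarity: Latent Dirichlet Allocation gives a topic probability distribution for the query and for the reply; the cosine similarity between the two is used as the topic similarity.
Statistical Machine Translation: a statistical machine translation model "translates" the query into the reply, and the translation score of each word pair is used as a feature.
Length: the normalized length of the reply.
Fluency: the bi-gram co-occurrence probability of the reply.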
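
Two of these features are simple enough to sketch directly. The exact definitions used in the paper are not given in these notes, so the code below makes assumptions: term similarity is taken to be the Jaccard overlap of whitespace tokens, and topic similarity is the cosine similarity of two topic distributions supplied by the caller (no LDA model is trained here). Both function names are hypothetical.

```python
import math
from typing import Sequence


def term_overlap(query: str, reply: str) -> float:
    """Jaccard overlap of token sets (an assumed definition of term similarity)."""
    q_tokens, r_tokens = set(query.split()), set(reply.split())
    union = q_tokens | r_tokens
    return len(q_tokens & r_tokens) / len(union) if union else 0.0


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors, e.g. topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_p * norm_q)
```

Each candidate reply would get one such feature vector, and the classifier's probability for the "reply belongs to the query" class would serve as its score.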

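Experiments

The data comes from social media such as Sina Weibo and Baidu Tieba.
In the retrieval step, $k=2$ query-reply pairs are retrieved for each sample.
In the generation step, the code is built on top of dl4mt-tutorial, with AdaDelta as the optimizer and a batch size of 80.
- Note that the corpus searched in the retrieval step is not the same as the corpus used to train the multi-seq2seq model in the generation step.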

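Models compared:

Retrieval-1, Retrieval-2: the first- and second-ranked results of the retrieval-based dialogue system.
seq2seq: the first encoder-decoder model used for dialogue generation.
multi-seq2seq^-: the generation-step model with two-level attention only, without the copy mechanism.
multi-seq2seq: the generation-step model with both two-level attention and the copy mechanism.
Ensemble(Retrieval-1, Retrieval-2, seq2seq): the ensemble of retrieval and seq2seq.
Ensemble(Retrieval-1, Retrieval-2, multi-seq2seq): the ensemble of retrieval and multi-seq2seq.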

Test results:

Method	Human score	BLEU-1	BLEU-2	BLEU-3	BLEU-4
Retrieval-1	1.013	24.06	10.04	5.232	2.784
Retrieval-2	0.528	4.523	0.655	0.476	0.471
seq2seq	0.880	6.349	0.665	0.111	0.039
Ensemble(retrieval-1, retrieval-2, seq2seq)	1.145	14.15	8.400	7.798	7.619
multi-seq2seq^-	0.918	9.290	2.489	1.144	0.566
multi-seq2seq	0.992	9.609	1.674	0.510	0.191
Ensemble(retrieval-1, retrieval-2, multi-seq2seq)	1.362	16.99	11.13	10.37	9.993
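
For readers unfamiliar with the BLEU-n columns, the core quantity behind BLEU is the clipped (modified) n-gram precision. The sketch below computes it for a single candidate against a single reference. It omits the brevity penalty and the geometric mean over n-gram orders, and it is not the evaluation script used in the paper; the notes do not say how BLEU-1 through BLEU-4 were computed there.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate: List[str], reference: List[str], n: int) -> float:
    """Clipped n-gram precision of `candidate` against one `reference`."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0  # candidate is shorter than n tokens
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total
```

Clipping is what prevents a reply that repeats one matching word from getting a perfect unigram score.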

Related work

Retrieval-based dialogue

Isbell et al. (2000) apply information retrieval techniques to search for related queries and replies. (Gist: retrieval techniques find related queries and their replies.)
Ji et al. (2014) and Yan et al. (2016a) use both shallow hand-crafted features and deep neural networks for query-reply matching. (Gist: shallow hand-crafted features plus deep learning.)
Li et al. (2016b) propose a random walk-style algorithm to rank candidate replies. In addition, their model can incorporate additional content (related entities in the conversation context) by searching a knowledge base when a stalemate occurs during human-computer conversations. (Gist: a ranking algorithm that can also draw on an external knowledge base.)

Generation-based dialogue

Ritter et al. (2011) formulate query-reply transformation as a phrase-based machine translation. (Gist: mapping a query to a reply is treated as a machine translation problem.)
The renewed prosperity of neural networks witnesses an emerging trend in using RNN for conversation systems ([Sutskever et al., 2014; Vinyals and Le, 2015; Sordoni et al., 2015; Shang et al., 2015; Serban et al., 2016a]). (Gist: many works apply RNNs to dialogue systems.)
The prevalent structure is the seq2seq model ([Sutskever et al., 2014]), which comprises one encoder and one decoder. (Gist: seq2seq is the prevalent model.)
Li et al. (2016a) propose a mutual information objective in contrast to the conventional maximum likelihood criterion. (Gist: mutual information as the training objective.)
Mou et al. (2016) and Xing et al. (2016) introduce additional content (i.e., either the most mutually informative word or the topic information) to the reply generator. (Gist: extra words are fed into generation.)
Serban et al. (2016b) apply a variational encoder to capture query information as a distribution, from which a random vector is sampled for reply generation. (Gist: a variational encoder captures the query information, and generation samples from it.)
He et al. (2017) use a knowledge base for answer generation in a question answering task. (Gist: a knowledge base assists generation.)
Libovický and Helcl (2017) investigate different attention strategies in multi-source generation. (Gist: explores multi-source generation.)

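Fusion-based generation

Zoph and Knight (2016) use two RNNs in the encoder and one RNN in the decoder to translate a sentence in two different languages into another language. (Gist: two source languages are translated into one target language; presumably cited as a precedent for incorporating multiple responses.)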