Paper Reading: ECPE-2D: Emotion-Cause Pair Extraction based on Joint Two-Dimensional Representation, Intera...

Venue: ACL 2020 (CCF-A)
Citation: Ding Z, Xia R, Yu J. ECPE-2D: Emotion-cause pair extraction based on joint two-dimensional representation, interaction and prediction[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3161-3170.
Paper: https://aclanthology.org/2020.acl-main.288.pdf
The authors of this paper also proposed the ECPE task; this work improves on their earlier method.

1 Background and Problem

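Emotion-cause pair extraction (ECPE) is a subtask of text sentiment analysis. It aims to extract the potential emotions in a document together with their corresponding causes, in the form of pairs. To solve this task, existing work adopts a two-step framework: it first extracts an individual emotion set and cause set, and then pairs the corresponding emotions and causes.
However, this solution has two problems: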

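The model does not directly extract the final emotion-cause pairs, and approaches that require emotions to be manually annotated before cause extraction are greatly limited in practical applications;
Errors in the first step affect the performance of the second step, and annotating emotions first and then extracting causes ignores the mutual dependence between emotions and causes.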

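ECPE is defined as a fine-grained sentiment analysis task: given a document consisting of multiple clauses as input, the goal is to extract a set of valid emotion-cause pairs. Figure 1 (a) shows an example of the ECPE task.
The example is a document of six clauses. Clause c4 contains the emotion "happy" and has two corresponding causes: clause c2 ("a policeman visited the old man with the lost money") and clause c3 ("told him that the thief was caught"). Clause c5 contains the emotion "worried", and its cause is clause c6 ("as he doesn't know how to keep so much money"). The final output is a set of valid emotion-cause pairs defined at the clause level: {c4-c2, c4-c3, c5-c6}.
ECPE-2Steps consists of two steps:
Step 1 extracts the emotion set and the cause set separately. For example, in Figure 1 (a) the emotion set is {c4, c5} and the cause set is {c2, c3, c6};
Step 2 pairs and filters emotions and causes based on the output of Step 1. As shown in Figure 1 (a), the Cartesian product of the emotion set and the cause set is first computed to obtain candidate emotion-cause pairs, and then an independently trained filter removes the invalid pairs.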
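The candidate generation in Step 2 can be sketched in Python with `itertools.product`, using the clause IDs from Figure 1 (a). The `toy_filter` function is a hypothetical stand-in: in ECPE-2Steps the filter is a separately trained classifier, whereas here it simply looks up the gold pairs of the example.

```python
from itertools import product

# Step 1 output for the Figure 1 (a) example
emotion_set = ["c4", "c5"]
cause_set = ["c2", "c3", "c6"]

# Step 2a: Cartesian product of the emotion set and the cause set
candidates = list(product(emotion_set, cause_set))

# Hypothetical stand-in for the trained pair filter: an oracle that
# keeps exactly the gold pairs of the example document
GOLD_PAIRS = {("c4", "c2"), ("c4", "c3"), ("c5", "c6")}

def toy_filter(pair):
    return pair in GOLD_PAIRS

# Step 2b: keep only the candidates the filter accepts
valid_pairs = [pair for pair in candidates if toy_filter(pair)]

print(len(candidates))  # 6
print(valid_pairs)      # [('c4', 'c2'), ('c4', 'c3'), ('c5', 'c6')]
```

Note that a clause missed in Step 1 can never appear among the candidates, no matter how good the filter is.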

Figure 1 Comparison of the two approaches

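Drawbacks of ECPE-2Steps:

As a pipeline of two independent steps, ECPE-2Steps needs two prediction steps to obtain the final emotion-cause pairs, and its training does not directly target the final emotion-cause pairs.
Errors in Step 1 propagate to Step 2 and hurt its performance.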

2 Method

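To address these problems, the paper proposes a new end-to-end ECPE solution called ECPE-Two-Dimension (ECPE-2D), which represents emotion-cause pairs with a 2D representation scheme and integrates emotion-cause pair representation, interaction and prediction into a joint framework, as shown in Figure 1 (b).
First, the 2D representation scheme represents emotion-cause pairs as a square matrix in which each entry corresponds to one emotion-cause pair. Second, a 2D Transformer framework and its two variants, the window-constrained and cross-road 2D Transformers, are proposed to capture the interactions among different emotion-cause pairs. Finally, valid emotion-cause pairs are extracted from the 2D representation by performing binary classification on each emotion-cause pair. These three parts are integrated into a unified framework and trained jointly.
The overall architecture of the method is shown in Figure 2. It consists of three parts: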

2D Emotion-Cause Pair Representation;
2D Emotion-Cause Pair Interaction;
2D Emotion-Cause Pair Prediction.

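First, individual emotion/cause encoding components produce an emotion-specific representation vector and a cause-specific representation vector for each clause. Full pairing then combines the two kinds of vectors into a 2D representation matrix. Next, a 2D Transformer module models the interactions among different emotion-cause pairs. Finally, for each emotion-cause pair in the matrix, the updated representation is fed into a softmax layer to predict whether the pair is valid.

Figure 2 Overall architecture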

2.1 2D Emotion-Cause Pair Representation

1 Individual Emotion/Cause Encoding

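The clause encoder layer produces an emotion-specific representation and a cause-specific representation for each clause in the document. The input is a document containing multiple clauses, $d = [c_1, c_2, \cdots, c_{|d|}]$, and each clause contains multiple words, $c_i = [w_{i,1}, w_{i,2}, \ldots, w_{i,|c_i|}]$.
The lower layer runs a word-level Bi-LSTM over the words of each clause, followed by an attention mechanism that produces the clause representation $s_i$.
The upper layer consists of two independent components whose goal is to produce, for each clause, an emotion-specific representation $r^{emo}_i$ and a cause-specific representation $r^{cau}_i$. Both components take the clause representations $(s_1, s_2, \ldots, s_{|d|})$ as input and use two separate clause-level Bi-LSTMs to obtain $r^{emo}_i$ and $r^{cau}_i$, respectively. Finally, $r^{emo}_i$ and $r^{cau}_i$ are fed into two softmax layers to obtain the emotion prediction $\hat{y}^{emo}_{i}$ and the cause prediction $\hat{y}^{cau}_{i}$. This corresponds to the green box in Figure 2.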
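The clause encoder can be sketched with NumPy. This is a minimal sketch under several assumptions: the word-level Bi-LSTM outputs are replaced by random arrays, the attention is a generic additive (tanh) attention, and the two clause-level Bi-LSTMs are replaced by two separate random projections to keep the example short. Names such as `attention_pool`, `P_emo` and `W_emo` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # numerically stable softmax
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(h, W, b, u):
    """Additive attention over word states h (num_words, dim) -> clause vector (dim,)."""
    scores = np.tanh(h @ W + b) @ u   # one score per word
    alpha = softmax(scores)           # attention weights, sum to 1
    return alpha @ h, alpha

dim, num_clauses = 8, 6
W, b, u = rng.normal(size=(dim, dim)), np.zeros(dim), rng.normal(size=dim)

# Hypothetical word-level Bi-LSTM outputs: one (num_words, dim) array per clause
word_states = [rng.normal(size=(int(rng.integers(3, 9)), dim)) for _ in range(num_clauses)]
s = np.stack([attention_pool(h, W, b, u)[0] for h in word_states])  # (num_clauses, dim)

# Stand-ins for the two clause-level Bi-LSTMs: separate random projections
# (assumption: the real model runs one Bi-LSTM over s for each branch)
P_emo, P_cau = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
r_emo, r_cau = np.tanh(s @ P_emo), np.tanh(s @ P_cau)

# Two softmax heads: P(clause is / is not an emotion clause), likewise for cause
W_emo, W_cau = rng.normal(size=(dim, 2)), rng.normal(size=(dim, 2))
y_emo, y_cau = softmax(r_emo @ W_emo), softmax(r_cau @ W_cau)
```

Keeping the emotion and cause branches separate is what lets each clause carry two different views of itself into the pairing step.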

2 Emotion-Cause Full Pairing

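All possible clause pairs in d are treated as candidates. Given a document of length |d|, all possible pairs form a matrix M of shape |d|×|d|, whose rows and columns index the emotion clause and the cause clause in the document, respectively:
$M_{i,j} = r^{emo}_{i} \oplus \hat{y}^{emo}_{i} \oplus r^{cau}_{j} \oplus \hat{y}^{cau}_{j} \oplus rpe_{i,j}$, where $\oplus$ denotes concatenation and $rpe_{i,j}$ is the relative position embedding between clauses $c_i$ and $c_j$. This corresponds to the gray matrix on the right of Figure 2.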
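Full pairing amounts to broadcasting the row (emotion) features across columns and the column (cause) features across rows, then concatenating. A minimal NumPy sketch, assuming random encoder outputs and a lookup table indexed by the offset j - i for the relative position embedding (the table layout and all sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_r, d_pe = 4, 6, 3  # |d| clauses, representation size, position-embedding size

# Hypothetical outputs of the encoder stage
r_emo, r_cau = rng.normal(size=(n, d_r)), rng.normal(size=(n, d_r))
y_emo, y_cau = rng.dirichlet(np.ones(2), size=n), rng.dirichlet(np.ones(2), size=n)

# Relative position embedding table: one row per offset j - i in [-(n-1), n-1]
rpe_table = rng.normal(size=(2 * n - 1, d_pe))
i_idx, j_idx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
rpe = rpe_table[j_idx - i_idx + (n - 1)]                 # (n, n, d_pe)

emo_part = np.concatenate([r_emo, y_emo], axis=-1)       # row i: r_emo_i ⊕ y_emo_i
cau_part = np.concatenate([r_cau, y_cau], axis=-1)       # column j: r_cau_j ⊕ y_cau_j

# M[i, j] = emo_part[i] ⊕ cau_part[j] ⊕ rpe[i, j]  (⊕ = concatenation)
M = np.concatenate([
    np.broadcast_to(emo_part[:, None, :], (n, n, emo_part.shape[-1])),
    np.broadcast_to(cau_part[None, :, :], (n, n, cau_part.shape[-1])),
    rpe,
], axis=-1)

print(M.shape)  # (4, 4, 19)
```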

2.2 2D Emotion-Cause Pair Interaction

1 Standard 2D Transformer

The standard 2D Transformer, adapted from the Transformer of Vaswani et al. (2017), consists of a stack of N layers. Each layer has two sub-layers:
Multi-head 2D self-attention mechanism
Position-wise Feed Forward Network
Both components are explained in detail in the separate note on attention.
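The cost of this design is easy to see in code. Below is a minimal NumPy sketch of one standard 2D Transformer layer, assuming a single attention head instead of multi-head attention, post-layer-norm residual connections without learned gain/bias, and random weights; `standard_2d_layer` is a hypothetical name. The |d|×|d| pair matrix is flattened into |d|² tokens, so every pair attends to every other pair.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def standard_2d_layer(M, Wq, Wk, Wv, W1, W2):
    """One single-head layer: every pair attends to all |d|*|d| pairs."""
    n, _, dim = M.shape
    X = M.reshape(n * n, dim)                        # flatten the pair matrix
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(dim))        # (n*n, n*n) attention weights
    X = layer_norm(X + weights @ V)                  # self-attention sub-layer + residual
    X = layer_norm(X + np.maximum(0, X @ W1) @ W2)   # position-wise FFN sub-layer + residual
    return X.reshape(n, n, dim), weights

n, dim, hidden = 5, 16, 32
M = rng.normal(size=(n, n, dim))
shapes = [(dim, dim)] * 3 + [(dim, hidden), (hidden, dim)]
params = [rng.normal(size=shape) * 0.1 for shape in shapes]
O, weights = standard_2d_layer(M, *params)
print(weights.shape)  # (25, 25): (|d|*|d|) x (|d|*|d|) weights
```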
Computational inefficiency
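Since the output of the standard 2D Transformer contains |d|×|d| elements and each element requires |d|×|d| attention weights, a total of (|d|×|d|)×(|d|×|d|) weights must be computed and temporarily stored. To reduce this computational load, two variants of the 2D Transformer are proposed: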

window-constrained 2D Transformer
cross-road 2D Transformer, as shown in Figure 3.

Figure 3

Window-constrained 2D Transformer

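Assumption: most cause clauses are located near their emotion clause.
Therefore, only the pairs $c^{emo}_i - c^{cau}_j$ satisfying $j - i \in [-window, window]$ are used as input.
Advantages: (1) it greatly reduces resource requirements; (2) it alleviates the class imbalance problem, since the pairs far from the diagonal that it discards are almost all negative.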
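A minimal sketch of the window constraint, with a hypothetical helper `window_pairs`: it keeps the pairs whose offset satisfies j - i ∈ [-window, window] and, assuming standard self-attention is then applied among the kept pairs only, compares the number of attention weights with the unconstrained case. The values of `n` and `window` are arbitrary.

```python
import numpy as np

def window_pairs(n, window):
    """(i, j) indices of the pairs kept by the constraint j - i in [-window, window]."""
    i_idx, j_idx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    keep = np.abs(j_idx - i_idx) <= window
    return np.argwhere(keep)    # (num_kept, 2)

n, window = 20, 3
kept = window_pairs(n, window)
print(len(kept), n * n)               # 128 400: pairs kept vs. all |d|*|d| pairs
print(len(kept) ** 2, (n * n) ** 2)   # 16384 160000: attention weights per layer
```

The number of kept pairs grows roughly linearly in |d| (about (2·window + 1)·|d|) instead of quadratically.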

Cross-road 2D Transformer

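Assumption: the feature representations of pairs in the same row or column tend to be closer, so each pair only attends to the pairs in its own row and column.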
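This restriction can be sketched as an attention mask over the flattened |d|×|d| pairs. It is a minimal NumPy sketch, assuming the cross-road variant behaves like masked self-attention with random scores; the helper names are hypothetical, and the paper's implementation may compute row and column attention more efficiently without materializing a full mask.

```python
import numpy as np

def cross_road_mask(n):
    """mask[p, q] is True iff flattened pair q shares a row or a column with pair p."""
    rows, cols = np.divmod(np.arange(n * n), n)     # row / column of each flattened pair
    return (rows[:, None] == rows[None, :]) | (cols[:, None] == cols[None, :])

def masked_softmax(scores, mask):
    scores = np.where(mask, scores, -np.inf)        # disallowed pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

n = 4
mask = cross_road_mask(n)
print(mask.sum(axis=1)[0])   # 7 = 2*|d| - 1 pairs attended per pair
rng = np.random.default_rng(0)
weights = masked_softmax(rng.normal(size=(n * n, n * n)), mask)
```

Each pair attends to 2|d| - 1 pairs instead of |d|², so a layer needs |d|²(2|d| - 1) weights rather than |d|⁴.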

Figure 4 (a), (b) and (c) show examples of how attention is computed in the standard, window-constrained and cross-road 2D Transformers, respectively.

Figure 4

2.3 2D Emotion-Cause Pair Prediction

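After N 2D Transformer layers, the final representation $o^{(N)}_{i,j}$ of each pair $c^{emo}_i - c^{cau}_j$ is obtained and used to predict whether the pair is valid, $\hat{y}^{pair}_{i,j}$.
To obtain better emotion-specific and cause-specific representations, the paper introduces auxiliary losses for emotion prediction and cause prediction. The final loss of the model for a document d is a weighted sum of the pair prediction loss and the auxiliary losses, plus an L2 regularization term.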
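The loss can be sketched as follows. This is a hypothetical formulation for illustration only: the function names, the weight `alpha`, the coefficient `l2`, the summed (rather than averaged) cross-entropy and the single `alpha` shared by the emotion and cause terms are assumptions, not values or choices taken from the paper.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Summed negative log-likelihood of the gold class labels."""
    return -np.log(probs[np.arange(len(labels)), labels]).sum()

def ecpe_2d_loss(y_pair, gold_pair, y_emo, gold_emo, y_cau, gold_cau,
                 params, alpha=1.0, l2=1e-5):
    # pair loss over all |d|*|d| entries of the 2D matrix
    loss_pair = cross_entropy(y_pair.reshape(-1, 2), gold_pair.reshape(-1))
    # auxiliary losses on the clause-level emotion / cause predictions
    loss_aux = cross_entropy(y_emo, gold_emo) + cross_entropy(y_cau, gold_cau)
    # L2 regularization over all model parameters
    reg = sum((p ** 2).sum() for p in params)
    return loss_pair + alpha * loss_aux + l2 * reg

# Toy document with |d| = 2: clause 0 is an emotion caused by clause 1
n = 2
y_pair = np.full((n, n, 2), 0.5)
gold_pair = np.array([[0, 1], [0, 0]])
y_emo, gold_emo = np.full((n, 2), 0.5), np.array([1, 0])
y_cau, gold_cau = np.full((n, 2), 0.5), np.array([0, 1])
loss = ecpe_2d_loss(y_pair, gold_pair, y_emo, gold_emo, y_cau, gold_cau, params=[])
# with all-0.5 predictions: 4 pair terms + 4 clause terms, each ln 2
```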

Experiments

The experimental results are shown in Table 2.

Table 2

The main methods involved in the experiments are as follows:
ECPE-2Steps

Indep: It is an independent multi-task learning method, in which emotion extraction and cause extraction are independently modeled.
Inter-CE: It is an interactive multi-task learning method, in which the predictions of cause extraction are used to improve emotion extraction.
Inter-EC: It is another interactive multi-task learning method, in which the predictions of emotion extraction are used to enhance cause extraction.

ECPE-2D

"-" indicates that we do not introduce emotion-cause pair interaction;
"+WC" indicates that we use the window-constrained 2D transformer for emotion-cause pair interaction;
"+CR" indicates that we use the cross-road 2D transformer for emotion-cause pair interaction.

ECPE-2D vs. ECPE-2Steps
The Effectiveness of Auxiliary Supervision

Table 3

Contributions

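In this paper, we propose a joint end-to-end framework called ECPE-2D, instead of a two-step pipeline, which represents emotion-cause pairs with a 2D representation scheme and integrates 2D emotion-cause pair representation, interaction and prediction into a joint framework.
We also develop two kinds of 2D Transformer, the window-constrained and cross-road 2D Transformers, to further model the interactions among different emotion-cause pairs.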

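Experimental results on the benchmark emotion cause corpus show that, in addition to the advantages of joint modeling, our approach outperforms the state-of-the-art method by 7.6 percentage points in F1 score on the ECPE task.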