Adversarial Multi-Criteria Learning for Chinese Word Segmentation

paper: https://arxiv.org/pdf/1704.07556.pdf

code:  https://github.com/FudanNLP

Abstract

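Chinese word segmentation (CWS) has many different segmentation criteria. This paper uses adversarial learning to extract the knowledge shared across these criteria.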

In this paper, we propose adversarial multi-criteria learning for CWS by integrating shared knowledge from multiple heterogeneous segmentation criteria.

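Earlier work also exploited multiple corpora, but most of it relied on linear classifiers with discrete features. This paper is essentially multi-task learning: each segmentation criterion is treated as a task, and three shared-private models are proposed, in which a shared layer extracts criterion-invariant features and a private layer extracts criterion-specific features. Adversarial training is used to ensure that the shared layer extracts common underlying, criteria-invariant features.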

The contributions of this paper could be summarized as follows.

• Multi-criteria learning is first introduced for CWS, in which we propose three shared-private models to integrate multiple segmentation criteria.

• An adversarial strategy is used to force the shared layer to learn criteria-invariant features, in which a new objective function is also proposed instead of the original cross-entropy loss.

• We conduct extensive experiments on eight CWS corpora with different segmentation criteria, which is by far the largest number of datasets used simultaneously.


Methods

Each character is labeled with one of {B, M, E, S} (begin, middle, end, single). The general architecture is: character embedding layer -> feature layers (BLSTM) -> tag inference layer (CRF).
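To make the {B, M, E, S} scheme concrete, here is a minimal sketch of converting a segmented sentence to per-character tags and back. The function names `words_to_bmes` and `bmes_to_words` are illustrative and do not come from the paper or its code.

```python
def words_to_bmes(words):
    """Map a list of segmented words to one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if not word:  # skip empty strings rather than emit a bogus "BE"
            continue
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # begin, zero or more middles, end
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


words = ["中文", "分词", "很", "有意思"]
tags = words_to_bmes(words)
print(tags)  # ['B', 'E', 'B', 'E', 'S', 'B', 'M', 'E']
print(bmes_to_words("".join(words), tags))
```

Under this scheme, segmentation becomes a sequence labeling problem: the model only has to predict one of four tags per character.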

Three shared-private models for multi-criteria learning. The yellow blocks are the shared BLSTM while the gray blocks are private BLSTM. The yellow circles are shared embedding. The red information flow indicates the difference between three models.

Model 1: Parallel Shared-Private Model

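The private and shared layers run in parallel, and their hidden states are computed independently. Both hidden states are then fed together into the CRF layer.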

the score function in the CRF layer

Model 2: Stacked Shared-Private Model

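The output of the shared layer is also used as part of the input to the private layer, and only the private layer's hidden states are fed into the CRF layer.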

the hidden states of the shared layer and the private layer (for the m-th criterion)

Model 3: Skip-Layer Shared-Private Model

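Eq. 14 + 15 + 16 of the paper: this model combines the CRF score function of Model 1 (Eq. 14) with the hidden states of Model 2 (Eqs. 15 and 16). The shared layer's output is fed into the private layer, and both the shared and the private hidden states are fed into the CRF layer.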
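The difference between the three models is only in what feeds what, which the following sketch tries to make visible. It is not the paper's implementation: `toy_layer` is a hypothetical stand-in for a BLSTM that just records its name and how many input features it saw per character, and each model function returns what would be passed to the CRF layer.

```python
def concat(a, b):
    """Concatenate two per-character feature sequences position by position."""
    return [x + y for x, y in zip(a, b)]


def toy_layer(name):
    """Hypothetical stand-in for a BLSTM (assumption, not a real network):
    emits one feature per character naming the layer and its input width."""
    def layer(inputs):
        return [[f"{name}({len(x)})"] for x in inputs]
    return layer


shared = toy_layer("shared")    # shared across all criteria
private = toy_layer("private")  # one such layer per criterion m


def model_1(emb):
    """Parallel: both layers read the embeddings; CRF sees both outputs."""
    h_s = shared(emb)
    h_m = private(emb)
    return concat(h_s, h_m)


def model_2(emb):
    """Stacked: private layer also reads shared output; CRF sees private only."""
    h_s = shared(emb)
    h_m = private(concat(emb, h_s))
    return h_m


def model_3(emb):
    """Skip-layer: stacked hidden states, but CRF sees shared and private."""
    h_s = shared(emb)
    h_m = private(concat(emb, h_s))
    return concat(h_s, h_m)


emb = [["e1"], ["e2"]]  # two characters, one toy embedding feature each
print(model_1(emb)[0])  # ['shared(1)', 'private(1)']
print(model_2(emb)[0])  # ['private(2)']
print(model_3(emb)[0])  # ['shared(1)', 'private(2)']
```

The input width in parentheses shows whether the private layer saw only the embedding (1) or the embedding plus the shared output (2).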

Adversarial Training for Shared Layer

The architecture of Model-III with adversarial training strategy for shared layer.

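The goal is to make the features extracted by the shared layer criterion-invariant. To this end, a criterion discriminator is trained to predict, from the shared features, which criterion a sentence was annotated with.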
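A criterion discriminator of this kind can be sketched as a softmax classifier over a pooled sentence representation. The sketch below assumes average pooling of the shared hidden states followed by a single linear layer; the function name, the pooling choice, and the toy weights are assumptions for illustration, not details taken from these notes.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def criterion_discriminator(shared_states, weights, bias):
    """Predict a distribution over criteria from shared features.

    shared_states: one shared feature vector per character.
    weights: one weight vector per criterion (hypothetical parameters).
    bias: one scalar per criterion (hypothetical parameters).
    """
    n = len(shared_states)
    dim = len(shared_states[0])
    # Assumed sentence representation: average of the shared hidden states.
    pooled = [sum(h[d] for h in shared_states) / n for d in range(dim)]
    scores = [
        sum(w_d * p_d for w_d, p_d in zip(w, pooled)) + b
        for w, b in zip(weights, bias)
    ]
    return softmax(scores)


shared_states = [[1.0, 0.0], [0.0, 1.0]]          # two characters, 2-d features
weights = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]  # three toy criteria
bias = [0.0, 0.0, 0.0]
print(criterion_discriminator(shared_states, weights, bias))
```

If the shared features carried no information about the criterion, the best the discriminator could do is a distribution that ignores its input, which is the situation the adversarial training pushes toward.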

Training


The objective function for the multi-task model: maximize the log conditional likelihood of the true labels on all the corpora.
The criterion discriminator maximizes the log-likelihood of the true criterion under the predicted criterion distribution p(.|X), which is equivalent to minimizing the cross entropy between the two.
The shared layer maximizes the entropy of the predicted criterion distribution.
The overall objective function.
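The two adversarial terms pull in opposite directions, which a small numeric sketch can illustrate. The helper names and the example distributions below are made up for illustration. The discriminator wants a high log-probability for the true criterion, while the shared layer wants the predicted distribution to have high entropy, i.e. to be close to uniform.

```python
import math


def log_likelihood(probs, true_criterion):
    """Discriminator term: log-probability assigned to the true criterion."""
    return math.log(probs[true_criterion])


def entropy(probs):
    """Shared-layer term: entropy of the predicted criterion distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Toy predictions over four criteria; the true criterion is index 0.
confident = [0.97, 0.01, 0.01, 0.01]  # shared features leak the criterion
uniform = [0.25, 0.25, 0.25, 0.25]    # shared features are criterion-invariant

for name, probs in [("confident", confident), ("uniform", uniform)]:
    print(name, round(log_likelihood(probs, 0), 3), round(entropy(probs), 3))
```

The confident prediction scores better on the discriminator term and worse on the entropy term; the uniform prediction is the reverse, and its entropy equals log 4, the maximum for four criteria.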

Experiments

CWS

dataset: MSRA, AS, PKU, CTB, CKIP, CITYU, NCC, SXU

Knowledge Transfer

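1. Simplified Chinese to traditional Chinese: the model is first trained on the simplified Chinese datasets, then trained on the traditional Chinese datasets with the shared-layer parameters fixed. It is tested on the traditional Chinese datasets: AS, CKIP, CITYU.

2. Formal texts to informal texts: trained on NLPCC2016 and tested on Weibo (micro-blog) data.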
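Fixing the shared-layer parameters during the second training stage amounts to skipping their updates. The following is a minimal sketch of that idea with plain gradient descent on toy parameter lists; `sgd_step`, the parameter group names, and the numbers are all hypothetical and only illustrate the freezing mechanism, not the paper's training code.

```python
def sgd_step(params, grads, lr, frozen=()):
    """Return updated parameters, leaving any group named in `frozen` untouched."""
    updated = {}
    for name, values in params.items():
        if name in frozen:
            updated[name] = list(values)  # copied unchanged
        else:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return updated


# Toy parameters after pretraining on the source corpora (made-up numbers).
params = {"shared": [0.5, -0.2], "private": [0.1, 0.3]}
# Toy gradients computed on the target corpus (made-up numbers).
grads = {"shared": [1.0, 1.0], "private": [1.0, -1.0]}

new_params = sgd_step(params, grads, lr=0.1, frozen=("shared",))
print(new_params["shared"])   # unchanged: [0.5, -0.2]
print(new_params["private"])  # moved against the gradient
```

Only the private (criterion-specific) parameters adapt to the new corpus, so whatever the shared layer learned from the source corpora is carried over as-is.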
