2021-09-13 | Recent Literature Reading Notes

End-to-End Neural Pipeline for Goal-Oriented Dialogue Systems using GPT-2

Donghoon Ham, Jeong-Gwan Lee, Youngsoo Jang, Kee-Eung Kim. KAIST. ACL 2020

Code can be found here

Highlights

  • It is trained to follow the traditional dialogue management pipeline, which makes the monolithic neural model more interpretable and easier to integrate with external systems
  • It is trained in an end-to-end fashion with simple gradient descent
  • It leverages GPT-2, a powerful pre-trained language model

Introduction

Traditional goal-oriented dialogue systems mostly adopt a pipelined modular architecture (a minimal sketch of the dataflow follows the list):

  • Natural Language Understanding (NLU) module that recognizes the user's intent and extracts values for slots
    • Input: user's utterance X_n
    • Output: U_n = (I_n, Z_n), where I_n is the intent and Z_n is the set of slot-value pairs
  • Dialogue State Tracking (DST) module that tracks the values of slots
    • Input: U_n, A_{n-1}, S_{n-1} (N-best list)
    • Output: S_{n}
  • Dialogue Policy (POL) module that decides the system action
    • Input: S_{n}
    • Output: A_{n}
  • Natural Language Generation (NLG) module that generates the utterance corresponding to the system action
    • Input: A_{n}
    • Output: Y_{n}
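
As a concrete illustration, here is a minimal sketch of that dataflow; the module interfaces (nlu, dst, pol, nlg) are hypothetical stand-ins, not the paper's API.

```python
# Hypothetical module interfaces illustrating the pipelined dataflow above.
def pipeline_turn(x_n, a_prev, s_prev, nlu, dst, pol, nlg):
    """Run one dialogue turn through the traditional modular pipeline."""
    i_n, z_n = nlu(x_n)                    # NLU: utterance X_n -> U_n = (intent I_n, slot-value pairs Z_n)
    s_n = dst((i_n, z_n), a_prev, s_prev)  # DST: track the dialogue state S_n
    a_n = pol(s_n)                         # POL: dialogue state -> system action A_n
    y_n = nlg(a_n)                         # NLG: system action -> system utterance Y_n
    return y_n, a_n, s_n                   # A_n and S_n feed into the next turn
```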

End-to-end methods instead build the dialogue system as a single model that takes the natural language context as input and generates a natural language response as output.

Dataset

The model is trained on the MultiWOZ dataset and evaluated with ConvLab.

An example of a single-domain dialogue in the MultiWOZ dataset

Each dialogue consists of ‘Goal’, ‘Database’ and ‘Dialogue turns’.

  • Goal is defined by the domain and the slots. The slots are divided into informable, requestable, and book slots (see the example after this list).
    • Informable slots represent user constraints
    • Requestable slots hold additional information that the user wants to obtain
    • Book slots are used to reserve a place recommended by the system
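
For example, a restaurant-domain goal could look like the dictionary below; the MultiWOZ-style keys ('info', 'reqt', 'book') map to informable, requestable, and book slots, and the slot values are invented for illustration.

```python
# Illustrative MultiWOZ-style goal; the values are made up for exposition.
goal = {
    "restaurant": {
        "info": {"food": "italian", "pricerange": "cheap"},           # informable: user constraints
        "reqt": ["phone", "address"],                                  # requestable: info the user wants
        "book": {"people": "4", "day": "saturday", "time": "18:00"},   # book: reservation slots
    }
}
```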

End-to-end neural dialogue model

The overall architecture, illustrated with a concrete example in the paper, proceeds in six steps (a code sketch follows the list):
  1. Predict the recent domain and the corresponding dialogue state conditioned on the dialogue history
  2. Predict the system action with delexicalized tokens conditioned on the dialogue history and dialogue state
  3. If the system action (e.g. ‘inform’, ‘book’) needs external information from the database, the query module retrieves the candidates and returns one of them
  4. Update the current system action when an empty query result is detected
  5. Generate the system response with delexicalized tokens conditioned on dialogue history, dialogue state, and system action
  6. Update the delexicalized tokens in the system response with the query result
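
Putting the six steps together, a hedged sketch of this decode-query-decode loop is below; model.decode_state / decode_action / decode_response, db.query, and the two helpers are hypothetical names standing in for the paper's implementation.

```python
import re
from typing import Optional

def needs_db(action: str) -> bool:
    """Step 3 guard: does the system action require a database lookup?"""
    return any(act in action for act in ("inform", "book"))

def lexicalize(response: str, result: Optional[dict]) -> str:
    """Step 6: replace placeholders like [hotel_name] with values from the query result."""
    if not result:
        return response
    return re.sub(r"\[(\w+)\]", lambda m: str(result.get(m.group(1), m.group(0))), response)

def generate_turn(model, history, db):
    state = model.decode_state(history)                       # step 1: domain + dialogue state
    action = model.decode_action(history, state)              # step 2: delexicalized system action
    result = db.query(state) if needs_db(action) else None    # step 3: retrieve DB candidates
    if needs_db(action) and not result:                       # step 4: empty query result fallback
        action = model.decode_action(history, state, empty_query=True)
    response = model.decode_response(history, state, action)  # step 5: delexicalized response
    return lexicalize(response, result)                       # step 6: fill in the placeholders
```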

Input

In the MultiWOZ dataset, the ‘metadata’ field is treated as the dialogue state and the ‘dialogue act’ field as the system action

Delimiter tokens:

  • <usr>: precedes a user utterance
  • <sys>: precedes a system response
  • <ds>: precedes the dialogue state
  • <sa>: precedes the system action

Special tokens:

  • domain and slot names
  • <nm> (not mentioned) and <dc> (don't care) for slot values

Input embedding = Token embedding + Speaker embedding + Positional embedding
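
As a sketch, one turn might be flattened into a single GPT-2 input sequence as below; the delimiter tokens follow the paper, while the helper function and the example strings are illustrative.

```python
# Illustrative flattening of one turn into a GPT-2 input string.
# The delimiters <usr>/<sys>/<ds>/<sa> follow the paper; the content strings are made up.
def build_input(history, state, action, response):
    parts = [("<usr> " if spk == "user" else "<sys> ") + utt for spk, utt in history]
    parts.append("<ds> " + state)       # flattened dialogue state
    parts.append("<sa> " + action)      # delexicalized system action
    parts.append("<sys> " + response)   # delexicalized system response (the training target)
    return " ".join(parts)

seq = build_input(
    history=[("user", "i need a cheap hotel in the north")],
    state="hotel pricerange cheap area north",     # unfilled slots would carry <nm> / <dc>
    action="hotel inform name",
    response="[hotel_name] is a cheap hotel in the north .",
)
# Each token's input embedding is the sum of its token, speaker, and positional embeddings.
```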

Training Objective

The objective function is the weighted sum of the language modeling (LM) and next-utterance classification (NC) objectives:

L_{\text{total}}(W) = \alpha_{LM} L_{LM}(W) + \alpha_{NC} L_{NC}(W)

  • For LM, L_{LM}(w_1, \ldots, w_n) = \sum_i \log P(w_i \mid w_1, \ldots, w_{i-1})
  • For NC, the model needs to distinguish the gold response (gold dialogue state+gold system action+gold system response) from a distractor (gold dialogue state+gold system action+fake system response), given the dialogue history
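
This two-headed setup is the same idea as HuggingFace's GPT2DoubleHeadsModel (an LM head plus a multiple-choice head). Below is a minimal sketch of the weighted loss with dummy tensors; the alpha values are illustrative, not the paper's hyperparameters.

```python
import torch
from transformers import GPT2DoubleHeadsModel

# Sketch of L_total = alpha_LM * L_LM + alpha_NC * L_NC on dummy inputs.
model = GPT2DoubleHeadsModel.from_pretrained("gpt2")
alpha_lm, alpha_nc = 2.0, 1.0                    # illustrative weights, not from the paper

batch, n_cands, seq_len = 1, 2, 16               # candidate 0 = gold sequence, 1 = distractor
input_ids = torch.randint(0, model.config.vocab_size, (batch, n_cands, seq_len))
mc_token_ids = torch.full((batch, n_cands), seq_len - 1, dtype=torch.long)  # classify at last token
lm_labels = input_ids.clone()                    # in practice, non-target positions are masked with -100
mc_labels = torch.zeros(batch, dtype=torch.long) # index of the gold candidate

out = model(input_ids=input_ids, mc_token_ids=mc_token_ids,
            labels=lm_labels, mc_labels=mc_labels)
loss = alpha_lm * out.loss + alpha_nc * out.mc_loss   # weighted sum of LM and NC objectives
loss.backward()
```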

Result

On DSTC8:

