End-to-end Task-oriented Dialog Policy Learning based on Pre-trained Language Model

Anonymous

16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: This paper presents an approach to dialog policy learning (DPL), which aims to determine the system's next action based on the current dialog state maintained by a dialog state tracking module. Unlike previous stage-wise DPL, we propose an end-to-end DPL system to avoid error accumulation between dialog turns. We develop the DPL system from two perspectives. First, we consider turn-level DPL, which selects the best dialog action from a predefined action set. Specifically, we propose a dialog action-oriented BERT (DA-BERT), which integrates a new pre-training procedure named the masked last action (MLA) task that encourages BERT to be dialog-aware and to distill action-specific features. Second, we propose a word-level DPL that directly generates the dialog action. We model DPL as a sequence generation problem conditioned on the dialog action structure. GPT-2 equipped with an action structure parser module (termed DA-GPT-2) is then applied to learn the word-level DPL. The effectiveness and distinct characteristics of the proposed models are demonstrated on in-domain and domain adaptation tasks on MultiWOZ, with both simulator and human evaluation.