Decoupling Strategy and Surface Realization for Task-oriented Dialogues

29 Sept 2021 (modified: 13 Feb 2023) | ICLR 2022 Conference Withdrawn Submission | Readers: Everyone
Keywords: Strategy Optimization, Surface Realization, Task-oriented Dialogue
Abstract: Task-oriented dialogue systems assist users in completing various tasks by generating appropriate responses. The key lies in effective strategy learning and surface realization, which state-of-the-art methods largely mix together. These methods therefore face two problems: a) the learning of high-level strategy can easily be misled by the optimization of detailed word sequences, and b) directly emphasizing the agent's goal through reinforcement learning (RL) leads to corrupted solutions such as ungrammatical or repetitive responses. In this work, we propose to decouple strategy learning and surface realization in a general framework called DSSR. The core idea is to construct a latent content space for strategy optimization and to disentangle the surface style from it. Specifically, we optimize the latent content distribution for strategy towards task completion, and assume that this distribution is shared across different surface style realizations. We further construct an encoder-decoder scheme for the surface part, which not only facilitates decoupled, asynchronous RL optimization of strategy and surface, but also supports controllable surface style transfer of responses. We test DSSR on the multi-domain dialogue datasets MultiWOZ 2.0 and MultiWOZ 2.1 against methods that mix strategy and surface realization at different levels, and it shows improvements on various evaluation metrics. Finally, we demonstrate the semantic meanings of the latent content distributions to show the disentangling effect of DSSR, and show that it can perform effective surface style transfer as a by-product.
One-sentence Summary: For response generation in task-oriented dialogues, we decouple strategy learning and surface realization to facilitate more targeted optimization and achieve better interpretability and controllability.
Supplementary Material: zip