Co-Evolutional User Simulator and Dialogue System with Bias Estimator

Anonymous

16 Oct 2022 (modified: 05 May 2023) · ACL ARR 2022 October Blind Submission · Readers: Everyone
Keywords: task-oriented dialogue system, reinforcement learning
Abstract: Reinforcement learning (RL) has emerged as a promising approach to fine-tuning offline pretrained GPT-2 models in task-oriented dialogue systems. To obtain human-like online interactions while extending the use of RL, it has become prevalent to build a pretrained user simulator (US) alongside the dialogue system (DS) and fine-tune the two jointly via RL. However, existing methods usually update the US and DS asynchronously to mitigate the ensuing non-stationarity problem, which requires considerable manual intervention and can lead to sub-optimal policies and lower sample efficiency. Moreover, this iterative training paradigm only implicitly addresses the distributional shift caused by compounding exposure bias. To take a step further in tackling this problem, we introduce CETOD, a framework with a bias estimator that enables bias-aware synchronous updates during RL-based fine-tuning while benefiting from GPT-2-based end-to-end modeling of the US and DS. Extensive experiments demonstrate that CETOD achieves state-of-the-art success rate, inform rate, and combined score on the MultiWOZ 2.1 dataset.
Paper Type: long
Research Area: Dialogue and Interactive Systems
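
The abstract above does not spell out the training algorithm, so the following is only a minimal, hypothetical sketch of the stated idea: a US and a DS updated synchronously with policy gradients, each term reweighted by a learned bias estimator. Everything here is an assumption rather than the paper's method: toy MLP policies stand in for the GPT-2 backbones, environment transitions and the task-success reward are stubs, and the bias estimator's own training objective is omitted.

```python
# Hypothetical sketch (not the paper's code) of bias-aware synchronous
# updates for a jointly trained user simulator (US) and dialogue system (DS).

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 16, 8

def make_policy() -> nn.Module:
    # Toy MLP policy standing in for a GPT-2-based agent.
    return nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                         nn.Linear(32, N_ACTIONS))

us_policy, ds_policy = make_policy(), make_policy()

# Assumed bias estimator: maps a dialogue state to a weight in (0, 1) that
# down-scales updates on turns where exposure bias is believed to be high.
bias_estimator = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                               nn.Linear(32, 1), nn.Sigmoid())

opt = torch.optim.Adam(list(us_policy.parameters())
                       + list(ds_policy.parameters()), lr=1e-4)

def rollout(n_turns: int = 6):
    """Alternate US and DS turns; return per-turn log-probs, states, reward."""
    state = torch.randn(STATE_DIM)
    logps, states = [], []
    for t in range(n_turns):
        policy = us_policy if t % 2 == 0 else ds_policy
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        logps.append(dist.log_prob(action))
        states.append(state)
        state = torch.randn(STATE_DIM)  # stub environment transition
    return logps, states, torch.rand(())  # stub task-success reward

for step in range(100):
    logps, states, reward = rollout()
    # Synchronous update: both agents' REINFORCE terms share one loss,
    # each term scaled by the (detached) estimated bias for its state.
    loss = sum(-bias_estimator(s).detach().squeeze() * lp * reward
               for lp, s in zip(logps, states))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The single shared loss is what makes the update synchronous: both agents improve against the same rollout in one step, while the bias weights damp the gradient on turns where the simulated interaction is suspected to drift from real-user behaviour.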