Acquiring New Knowledge Without Losing Old Ones for Effective Continual Dialogue Policy Learning

Published: 2024 · Last Modified: 28 Jan 2026 · IEEE Trans. Knowl. Data Eng. 2024 · CC BY-SA 4.0
Abstract: Dialogue policy learning is the core decision-making module of a task-oriented dialogue system. Its primary objective is to assist users in achieving their goals effectively in as few turns as possible. A practical dialogue-policy agent must be able to expand its knowledge to handle new scenarios efficiently without degrading its performance on existing ones. However, when adapting to new tasks, existing dialogue-policy agents often fail to retain their existing (old) knowledge. To overcome this predicament, we propose a novel continual dialogue-policy model that tackles the issues of "not forgetting the old" and "acquiring the new" from three aspects: (1) For effective old-task preservation, we introduce the forgetting preventor, which uses behavior cloning to force the agent to take actions consistent with the replayed experience, thereby retaining the policy trained on historical tasks. (2) For new-task acquisition, we introduce the adaptation accelerator, which employs an invariant risk minimization (IRM) mechanism to produce a stable policy predictor that avoids spurious correlations in the training data. (3) To reduce the storage cost of the replayed experience, we introduce a replay manager, which regularly cleans up old data. The proposed model is evaluated both theoretically and experimentally and demonstrates favorable results.
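The abstract names three concrete mechanisms, which the following sketch illustrates in isolation. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (behavior_cloning_loss, irm_penalty, prune_replay_buffer), the IRMv1-style penalty, and the FIFO pruning rule are all hypothetical choices made here for exposition.

```python
# Illustrative sketch of the three components described in the abstract.
# All names and design choices below are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def behavior_cloning_loss(policy, replay_states, replay_actions):
    """Forgetting preventor (assumed form): push the current policy to
    reproduce the actions stored in replayed experience from old tasks."""
    logits = policy(replay_states)  # shape: (batch, n_actions)
    return F.cross_entropy(logits, replay_actions)

def irm_penalty(logits, labels):
    """Adaptation accelerator (assumed form): an IRMv1-style penalty
    (Arjovsky et al., 2019). The squared gradient of the risk w.r.t. a
    fixed scalar "classifier" w = 1.0 is small when the predictor is
    simultaneously optimal across training environments."""
    w = torch.tensor(1.0, requires_grad=True)
    risk = F.cross_entropy(logits * w, labels)
    grad = torch.autograd.grad(risk, w, create_graph=True)[0]
    return grad.pow(2)

def prune_replay_buffer(buffer, capacity):
    """Replay manager (assumed form): regularly drop the oldest entries
    so stored experience never exceeds a fixed budget (FIFO pruning)."""
    while len(buffer) > capacity:
        buffer.pop(0)
    return buffer
```

In a setup like this, each update would combine the new-task loss with the two regularizers, e.g. `loss = task_loss + lambda_irm * irm_penalty(logits, labels) + lambda_bc * behavior_cloning_loss(policy, states, actions)`, with the replay manager invoked periodically to keep the buffer within its storage budget; the weighting scheme here is again an assumption for illustration.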