Abstract: Offline Reinforcement Learning (Offline RL) is widely used to optimize task-oriented dialogue policies by training on pre-collected dialogues, which improves efficiency, especially when data is limited. However, traditional offline RL methods struggle to measure experience priority accurately, leading to the loss of valuable data and susceptibility to noisy samples. To address this, this paper proposes the Adjustable Mirror Loss (AMLoss) method, which redefines experience priority by quantifying the real-time incremental contribution of each experience to policy improvement. Specifically, the contribution is computed as the loss difference between the main and delayed Q-networks, with a larger difference indicating a more significant learning contribution and, consequently, a higher sampling priority. By emphasizing experiences that offer greater learning gains and deprioritizing those that are less effective or affected by noise, AMLoss helps retain critical data. Moreover, a Sum Tree structure is introduced for efficient hierarchical storage and weighted sampling of priorities. Experimental results confirm that AMLoss effectively prioritizes important experiences while filtering out noisy ones, leading to optimal performance across various tasks.
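The following is a minimal illustrative sketch (not the authors' implementation) of the two mechanisms the abstract describes: a per-sample priority taken as the gap between the losses of a main and a delayed Q-network, and a sum tree for priority-proportional sampling. All names (priority_from_loss_gap, SumTree), the squared-error loss form, and the discount factor gamma are assumptions made for illustration.

```python
# Hedged sketch of loss-gap priorities and sum-tree sampling; details are assumptions.
import random

import torch


def priority_from_loss_gap(main_q, delayed_q, batch, gamma=0.99):
    """Per-sample priority = |TD loss under main Q - TD loss under delayed Q|."""
    s, a, r, s_next, done = batch  # tensors: states, actions, rewards, next states, done flags
    with torch.no_grad():
        target = r + gamma * (1 - done) * delayed_q(s_next).max(dim=1).values
        q_delayed = delayed_q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    q_main = main_q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss_main = (q_main - target).pow(2)        # per-sample loss of the main network
    loss_delayed = (q_delayed - target).pow(2)  # per-sample loss of the delayed network
    return (loss_main - loss_delayed).abs().detach()  # larger gap -> higher sampling priority


class SumTree:
    """Binary sum tree for O(log n) priority updates and proportional sampling.

    Capacity is assumed to be a power of two for simplicity.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # leaves hold priorities, internal nodes hold sums

    def update(self, idx, priority):
        pos = idx + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:  # propagate the new priority up to the root
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def sample(self):
        """Draw a leaf index with probability proportional to its priority."""
        u = random.uniform(0.0, self.tree[1])  # tree[1] holds the total priority mass
        pos = 1
        while pos < self.capacity:  # descend until reaching a leaf
            left = 2 * pos
            if u <= self.tree[left]:
                pos = left
            else:
                u -= self.tree[left]
                pos = left + 1
        return pos - self.capacity
```

In this reading, experiences whose loss gap is large (i.e., where the main network still disagrees strongly with the delayed network) are sampled more often, while low-gap or noise-dominated samples are drawn less frequently; the actual AMLoss formulation may differ in how the gap is transformed into a priority.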
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: Task-oriented Dialogue System, Dialogue Policy, Offline Reinforcement Learning, Experience Priority
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1053