Learning What Matters: Dynamic Experience Prioritization for Task-Oriented Dialogue Policy via Stage-aware Experience Management
Abstract: Experience replay plays a pivotal role in enhancing sample efficiency for reinforcement learning-based dialogue policy optimization. However, traditional random sampling and static heuristic strategies fail to dynamically exploit critical experiences as policy learning progresses through its stages, resulting in inefficient sampling and noise propagation. To address this issue, this paper presents a dynamic Stage-aware Experience Management (SEM) framework that establishes a quantitative mapping between policy learning stages and experience states to adjust replay priorities adaptively. The framework adopts a four-state experience paradigm to characterize the stages of policy learning and to provide a quantitative basis for experience management decisions. Moreover, a dual Q-network structure monitors loss discrepancies and trends in real time, classifying each experience as stable, forgotten, unmastered, or noisy. Benefiting from this dynamic stage-aware mechanism, SEM prioritizes replaying forgotten and unmastered experiences to strengthen weak links while suppressing noisy samples to reduce interference. Experiments on four public dialogue datasets verify the effectiveness and generalizability of SEM in dynamic priority management.
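The abstract does not spell out the classification rule, but it implies a decision based on the loss discrepancy between the two Q-networks and the loss trend across replays. The minimal Python sketch below illustrates one plausible reading of that mechanism; the thresholds LOW_LOSS and DISAGREE, the priority weights, and the function classify_experience are all hypothetical illustrations, not taken from the paper.

    # Hypothetical thresholds; the paper's actual values and formulas are not given here.
    LOW_LOSS = 0.05   # below this, an experience counts as mastered
    DISAGREE = 0.5    # dual-Q loss discrepancy above this flags likely noise

    def classify_experience(loss_a, loss_b, prev_loss):
        """Classify one replayed transition by dual-Q losses and loss trend.

        loss_a, loss_b: TD losses of the two Q-networks on this transition.
        prev_loss: mean loss recorded the last time the transition was replayed.
        Returns one of: 'stable', 'forgotten', 'unmastered', 'noisy'.
        """
        mean_loss = 0.5 * (loss_a + loss_b)
        if abs(loss_a - loss_b) > DISAGREE:
            return 'noisy'        # the two critics disagree -> likely label noise
        if mean_loss < LOW_LOSS:
            return 'stable'       # consistently low loss -> already mastered
        if prev_loss < LOW_LOSS:
            return 'forgotten'    # was mastered, loss rose again -> replay it
        return 'unmastered'       # persistently high loss -> keep practicing

    # Hypothetical priority weights: boost forgotten/unmastered, suppress noise.
    PRIORITY = {'forgotten': 2.0, 'unmastered': 1.5, 'stable': 0.5, 'noisy': 0.1}

Under such a scheme, replay sampling probabilities would be proportional to the state weights, so forgotten and unmastered transitions are drawn more often while noisy ones are largely screened out.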
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: Task-oriented Dialogue System, Dialogue Policy, Off-policy Reinforcement Learning, Sampling Efficiency, Experience Priority
Contribution Types: NLP engineering experiment
Languages Studied: English
Keywords: Task-oriented Dialogue System, Dialogue Policy, Off-policy Reinforcement Learning, Sampling Efficiency, Experience Priority
Submission Number: 2497