Improving Proactive Dialogue Strategy Planning with Interactive Environment and Goal-oriented Reward
Abstract: Proactive dialogue has become a crucial yet challenging aspect of human-computer interaction, with applications in non-collaborative dialogue tasks such as negotiation, persuasion, and psychological counseling. However, current proactive dialogue systems are hindered by simplistic single-turn interactions and lack the capability for multi-turn, long-term strategy planning, which obstructs effective goal completion. Additionally, corpus-based training is ill-suited to low-resource settings and to transfer across different dialogue tasks. In this paper, we introduce a proactive dialogue strategy planning (ProDSP) method to address these challenges. We use a small language model trained with supervised fine-tuning to anticipate future strategy sequences, which serve as simulation hints. These hints guide large language models (LLMs) in generating goal-oriented responses and enable training within an interactive environment in which another LLM acts as the user simulator. To assess online user feedback during training, we employ a GPT-4-based user simulator that represents goal-oriented rewards through multi-faceted metrics. Extensive experiments on emotional support and negotiation tasks demonstrate that our model surpasses competitive baselines in both strategy planning and dialogue generation, offering a more adaptive and efficient approach to proactive dialogue strategy planning.
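The abstract describes a training loop with four roles: a small planner that anticipates future strategies, an LLM that responds using those strategies as hints, an LLM user simulator, and a GPT-4-based evaluator that scores the conversation on several metrics. A minimal sketch of one such rollout follows, assuming every component is a plain Python callable. The names `rollout`, `aggregate_reward`, and `Episode`, the metric names, and the weighted-average reward are all hypothetical illustrations, not the paper's actual implementation or reward formulation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical component signatures; any model could be wrapped to fit them.
Planner = Callable[[List[str]], List[str]]          # history -> future strategy sequence
Responder = Callable[[List[str], List[str]], str]   # history, strategy hint -> system turn
UserSim = Callable[[List[str]], str]                # history -> simulated user turn
Critic = Callable[[List[str]], Dict[str, float]]    # history -> per-metric scores in [0, 1]


def aggregate_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of multi-faceted metric scores (an assumed scheme)."""
    total = sum(weights.values())
    # Metrics the critic did not score count as 0.0.
    return sum(w * scores.get(metric, 0.0) for metric, w in weights.items()) / total


@dataclass
class Episode:
    history: List[str] = field(default_factory=list)
    plans: List[List[str]] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)


def rollout(planner: Planner, responder: Responder, user_sim: UserSim,
            critic: Critic, weights: Dict[str, float],
            first_user_turn: str, max_turns: int = 3) -> Episode:
    """Simulate up to max_turns system/user exchanges, scoring each exchange."""
    ep = Episode(history=[first_user_turn])
    for _ in range(max_turns):
        plan = planner(ep.history)                     # anticipate future strategies
        ep.plans.append(plan)
        ep.history.append(responder(ep.history, plan))  # strategy-guided system turn
        ep.history.append(user_sim(ep.history))         # simulated user feedback
        ep.rewards.append(aggregate_reward(critic(ep.history), weights))
    return ep


if __name__ == "__main__":
    # Stub components standing in for the planner, responder LLM, user LLM and critic.
    planner = lambda h: ["Question", "Reflection of feelings", "Providing suggestions"]
    responder = lambda h, plan: f"[{plan[0]}] system turn {len(h) // 2 + 1}"
    user_sim = lambda h: "user reply"
    critic = lambda h: {"relevance": 1.0, "goal_progress": min(1.0, len(h) / 10)}
    weights = {"relevance": 0.5, "goal_progress": 0.5}
    ep = rollout(planner, responder, user_sim, critic, weights,
                 "I feel stressed about work.")
    print(ep.rewards)
```

Keeping each role as an interchangeable callable reflects the idea that the planner, responder, and simulator are separate models; in practice the per-exchange rewards would feed a policy update for the planner, which this sketch omits.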
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: task-oriented
Contribution Types: NLP engineering experiment, Reproduction study, Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 2216