DynaESC: Learning Long-Term Emotional Support Strategies via Dynamic Multi-Turn Reinforcement Learning

ACL ARR 2026 January Submission7056 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Emotional Support Conversation, Reinforcement Learning, Dialogue Strategy Planning
Abstract: Current Emotional Support Conversation systems focus primarily on the quality of individual responses and overlook global strategy planning at the dialogue level. To improve the system's ability to perceive stage transitions and select strategies, we propose DynaESC, a Dynamic multi-turn reinforcement learning framework for Emotional Support Conversation that optimizes long-term dialogue management through simulated interactive training. The framework introduces two core modules: (1) a User Simulator, which uses Large Language Models to act as seekers based on predefined user personas, providing dynamic interactions and real-time feedback to the system and thus a high-fidelity, closed-loop interactive environment; and (2) a Multi-dimensional Reward Function, which evaluates responses by balancing immediate quality with holistic planning, refining response generation and strategy selection simultaneously. We also introduce a novel LLM-based evaluation metric that assesses the system on complete multi-turn interactions rather than isolated turns. Experimental results show that DynaESC achieves an approximately 42% improvement in overall score over its pre-trained counterpart and consistently outperforms representative baselines, indicating its effectiveness in providing emotional support.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: spoken dialogue systems, evaluation and metrics
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7056
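
The abstract describes a reward that balances immediate response quality with holistic, dialogue-level planning, but it does not give a formula. As a purely illustrative sketch, one simple way such a balance could be expressed is a convex combination of a per-turn score and a whole-dialogue score. Everything here is an assumption made for illustration and not taken from the paper: the names `TurnScore` and `blended_rewards`, the weight `alpha`, the [0, 1] score range, and the linear form itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TurnScore:
    """Hypothetical per-turn response quality, assumed to lie in [0, 1]."""
    quality: float


def blended_rewards(
    turns: List[TurnScore],
    dialogue_score: float,
    alpha: float = 0.5,
) -> List[float]:
    """Blend each turn's immediate quality with one dialogue-level score.

    alpha is an assumed trade-off weight: alpha = 1 uses only the
    turn-level score, alpha = 0 uses only the dialogue-level score.
    This linear mix is an illustration, not the paper's reward.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * t.quality + (1.0 - alpha) * dialogue_score for t in turns]


if __name__ == "__main__":
    # Two system turns scored individually, plus one score for the
    # completed multi-turn interaction (all values are made up).
    turns = [TurnScore(0.8), TurnScore(0.4)]
    print(blended_rewards(turns, dialogue_score=0.6, alpha=0.5))
```

In this sketch every turn shares the same dialogue-level term, so a turn with weak immediate quality still receives credit when the conversation as a whole goes well, which is one way to read "balancing immediate quality with holistic planning".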