Reinforcing Long-term Emotional Support Conversations in LLMs with Simulated Forward-Looking Feedback
Abstract: Emotional Support Conversation (ESC) systems should provide ongoing, systematic emotional support that fosters long-term user emotional well-being. Existing ESC systems built on large language models (LLMs) have introduced dialogue planning that considers the long-term effects of supportive strategies. However, they decouple strategy selection, which relies on predefined strategy sets, from response generation, limiting adaptability to dynamic emotional scenarios and reducing control over final response quality. In this work, we propose RLSF-ESC, a novel end-to-end framework that enhances the inherent reasoning capabilities of LLMs through reinforcement learning for long-term emotional support conversations. To encourage LLMs to reason about the long-term impact of their generated responses, RLSF-ESC simulates future dialogue trajectories via multi-agent collaboration to obtain forward-looking feedback. Based on this feedback, we design a customized reward function that guides the optimization of the LLM through Group Relative Policy Optimization. We train RLSF-ESC on the Qwen2.5-7B-Instruct-1M and LLaMA3.1-8B-Instruct models and conduct experiments on two public datasets. Experimental results demonstrate that RLSF-ESC consistently outperforms existing baselines in terms of goal completion and response quality.
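To make the training signal concrete, here is a minimal sketch of how a forward-looking reward could feed Group Relative Policy Optimization's group-normalized advantages. The GRPO normalization (reward minus group mean, divided by group standard deviation) follows the published algorithm; everything else is an assumption for illustration: the `forward_looking_reward` blend, the discount factor `gamma`, and the per-turn scores (which in RLSF-ESC would come from the simulated dialogue trajectories and multi-agent feedback) are hypothetical, not the paper's actual reward function.

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each sampled response's reward
    against the mean and std of its sampling group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def forward_looking_reward(immediate_score, simulated_scores, gamma=0.9):
    """Hypothetical composite reward: the current response's score plus
    discounted scores from simulated future dialogue turns. Illustrative
    only; the paper's exact reward design is not reproduced here."""
    future = sum(gamma ** (t + 1) * s for t, s in enumerate(simulated_scores))
    return immediate_score + future

# Example: four candidate responses sampled for one dialogue context,
# each scored on the current turn plus three simulated future turns
# (all score values are made up for demonstration).
group_rewards = [
    forward_looking_reward(0.6, [0.5, 0.7, 0.6]),
    forward_looking_reward(0.4, [0.3, 0.4, 0.2]),
    forward_looking_reward(0.8, [0.7, 0.8, 0.9]),
    forward_looking_reward(0.5, [0.6, 0.5, 0.4]),
]
print(group_relative_advantages(group_rewards))
```

Under this sketch, responses whose simulated futures score well receive positive advantages relative to their group, so the policy update favors responses with better projected long-term emotional outcomes rather than only immediate quality.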
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: human-in-the-loop, task-oriented
Languages Studied: English
Keywords: Emotional Support Conversations, Large Language Models, Reinforcement Learning
Submission Number: 2025