Reinforcing Long-term Emotional Support Conversations in LLMs with Simulated Forward-Looking Feedback
Abstract: Emotional Support Conversation (ESC) systems should provide ongoing, systematic emotional support that fosters long-term user emotional well-being. Existing ESC systems built on large language models (LLMs) have introduced dialogue planning that considers the long-term effects of supportive strategies. However, they decouple strategy selection, which relies on predefined strategy sets, from response generation, limiting adaptability to dynamic emotional scenarios and reducing control over final response quality. In this work, we propose RLSF-ESC, a novel end-to-end framework that enhances the inherent reasoning capabilities of LLMs through reinforcement learning for long-term emotional support conversations. To encourage LLMs to reason about the long-term impact of their generated responses, RLSF-ESC simulates future dialogue trajectories via multi-agent collaboration to obtain forward-looking feedback. Based on this feedback, we design a customized reward function that guides the optimization of the LLM through Group Relative Policy Optimization. We train RLSF-ESC on the Qwen2.5-7B-Instruct-1M and LLaMA3.1-8B-Instruct models and conduct experiments on two public datasets. Experimental results demonstrate that RLSF-ESC consistently outperforms existing baselines in terms of goal completion and response quality.
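To make the training signal concrete, here is a minimal sketch of how a forward-looking reward could feed Group Relative Policy Optimization's group-normalized advantages. The GRPO normalization (reward minus group mean, divided by group standard deviation) follows the published algorithm; everything else is an assumption for illustration: the `forward_looking_reward` blend, the discount factor `gamma`, and the per-turn scores (which in RLSF-ESC would come from the simulated dialogue trajectories and multi-agent feedback) are hypothetical, not the paper's actual reward function.

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each sampled response's reward
    against the mean and std of its sampling group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def forward_looking_reward(immediate_score, simulated_scores, gamma=0.9):
    """Hypothetical composite reward: the current response's score plus
    discounted scores from simulated future dialogue turns. Illustrative
    only; the paper's exact reward design is not reproduced here."""
    future = sum(gamma ** (t + 1) * s for t, s in enumerate(simulated_scores))
    return immediate_score + future

# Example: four candidate responses sampled for one dialogue context,
# each scored on the current turn plus three simulated future turns
# (all score values are made up for demonstration).
group_rewards = [
    forward_looking_reward(0.6, [0.5, 0.7, 0.6]),
    forward_looking_reward(0.4, [0.3, 0.4, 0.2]),
    forward_looking_reward(0.8, [0.7, 0.8, 0.9]),
    forward_looking_reward(0.5, [0.6, 0.5, 0.4]),
]
print(group_relative_advantages(group_rewards))
```

Under this sketch, responses whose simulated futures score well receive positive advantages relative to their group, so the policy update favors responses with better projected long-term emotional outcomes rather than only immediate quality.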
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: human-in-the-loop, task-oriented
Languages Studied: English
Keywords: Emotional Support Conversations, Large Language Models, Reinforcement Learning
Submission Number: 2025