PEARL-CoT: Persona-Emotion Aware Reinforcement Learning via Chain-of-Thought for Emotional Support Conversation
Abstract: Emotional Support Conversation (ESC) aims to ease seekers’ emotional distress through empathic and personalized interactions. However, existing studies predominantly focus on fitting ground-truth responses, overlooking both the cognitive reasoning process of human supporters and the preferences of seekers. To address this, we propose PEARL-CoT, a reinforcement learning (RL) framework based on Group Relative Policy Optimization (GRPO), which incorporates emotion and persona reasoning via chain-of-thought (CoT). Specifically, instead of directly generating a response, our model first infers the seeker’s emotion and persona, and then constructs a personalized empathic response. This reasoning step is rewarded with an emotion accuracy reward and a persona consistency reward to ensure the correctness of the CoT process. Afterwards, we incorporate a helpfulness scoring reward, derived from a model trained on seeker feedback, to better align responses with seeker preferences. Additionally, a semantic relevance reward is applied to maintain consistency with human supporter responses. Experimental results demonstrate that PEARL-CoT excels at identifying seekers’ concerns, delivering emotional support, and generating responses preferred by human annotators.
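The abstract describes four reward signals (emotion accuracy, persona consistency, helpfulness, and semantic relevance) feeding a GRPO-style policy update. The sketch below illustrates one plausible way such signals could be combined into a scalar reward and turned into group-relative advantages; all function names, weights, and the equal-weight combination are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch: combining the four reward signals described in the
# abstract into a single scalar per sampled response, then computing
# GRPO-style group-relative advantages. Weights and helper names are
# assumptions for illustration only.

from dataclasses import dataclass
from typing import List


@dataclass
class RewardWeights:
    emotion: float = 1.0      # emotion accuracy reward
    persona: float = 1.0      # persona consistency reward
    helpfulness: float = 1.0  # helpfulness score from a seeker-feedback model
    relevance: float = 1.0    # semantic relevance to the human supporter reply


def composite_reward(
    emotion_correct: bool,        # did the CoT infer the seeker's emotion correctly?
    persona_consistency: float,   # in [0, 1], e.g. an entailment-style score
    helpfulness_score: float,     # in [0, 1], from the feedback-trained scorer
    semantic_relevance: float,    # in [0, 1], e.g. embedding cosine similarity
    w: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of the four reward components for one sampled response."""
    return (
        w.emotion * float(emotion_correct)
        + w.persona * persona_consistency
        + w.helpfulness * helpfulness_score
        + w.relevance * semantic_relevance
    )


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages: standardize rewards within the group of
    responses sampled for the same seeker turn (mean-centered, std-scaled)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against identical rewards in a group
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    # Example group of four sampled responses for one seeker turn.
    rewards = [
        composite_reward(True, 0.9, 0.8, 0.7),
        composite_reward(False, 0.6, 0.5, 0.6),
        composite_reward(True, 0.7, 0.9, 0.8),
        composite_reward(False, 0.4, 0.3, 0.5),
    ]
    print(group_relative_advantages(rewards))
```

Responses whose composite reward exceeds the group mean receive positive advantages and are reinforced; the within-group normalization is what distinguishes GRPO from value-function baselines.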
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: commonsense reasoning, conversational modeling
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 2987