PEARL-CoT: Persona-Emotion Aware Reinforcement Learning via Chain-of-Thought for Emotional Support Conversation
Abstract: Emotional Support Conversation (ESC) aims to ease seekers’ emotional distress through empathic and personalized interactions. However, existing studies predominantly focus on fitting ground-truth responses, overlooking both the cognitive reasoning process of human supporters and the preferences of seekers. To address this, we propose PEARL-CoT, a reinforcement learning (RL) framework based on Group Relative Policy Optimization (GRPO), which incorporates emotion and persona reasoning via chain-of-thought (CoT). Specifically, instead of directly generating a response, our model first infers the seeker’s emotion and persona, and then constructs a personalized empathic response. This reasoning step is rewarded with an emotion accuracy reward and a persona consistency reward to ensure the correctness of the CoT process. Afterwards, we incorporate a helpfulness scoring reward, derived from a model trained on seeker feedback, to better align responses with seeker preferences. Additionally, a semantic relevance reward is applied to maintain consistency with human supporter responses. Experimental results demonstrate that PEARL-CoT excels at identifying seekers’ concerns, delivering emotional support, and generating responses preferred by human annotators.
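The abstract describes four reward signals (emotion accuracy, persona consistency, helpfulness, and semantic relevance) feeding a GRPO-style policy update. The sketch below illustrates one plausible way such signals could be combined into a scalar reward and turned into group-relative advantages; all function names, weights, and the equal-weight combination are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch: combining the four reward signals described in the
# abstract into a single scalar per sampled response, then computing
# GRPO-style group-relative advantages. Weights and helper names are
# assumptions for illustration only.

from dataclasses import dataclass
from typing import List


@dataclass
class RewardWeights:
    emotion: float = 1.0      # emotion accuracy reward
    persona: float = 1.0      # persona consistency reward
    helpfulness: float = 1.0  # helpfulness score from a seeker-feedback model
    relevance: float = 1.0    # semantic relevance to the human supporter reply


def composite_reward(
    emotion_correct: bool,        # did the CoT infer the seeker's emotion correctly?
    persona_consistency: float,   # in [0, 1], e.g. an entailment-style score
    helpfulness_score: float,     # in [0, 1], from the feedback-trained scorer
    semantic_relevance: float,    # in [0, 1], e.g. embedding cosine similarity
    w: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of the four reward components for one sampled response."""
    return (
        w.emotion * float(emotion_correct)
        + w.persona * persona_consistency
        + w.helpfulness * helpfulness_score
        + w.relevance * semantic_relevance
    )


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages: standardize rewards within the group of
    responses sampled for the same seeker turn (mean-centered, std-scaled)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against identical rewards in a group
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    # Example group of four sampled responses for one seeker turn.
    rewards = [
        composite_reward(True, 0.9, 0.8, 0.7),
        composite_reward(False, 0.6, 0.5, 0.6),
        composite_reward(True, 0.7, 0.9, 0.8),
        composite_reward(False, 0.4, 0.3, 0.5),
    ]
    print(group_relative_advantages(rewards))
```

Responses whose composite reward exceeds the group mean receive positive advantages and are reinforced; the within-group normalization is what distinguishes GRPO from value-function baselines.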
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: commonsense reasoning, conversational modeling
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 2987