Mitigating Strategy Preference Bias with Boundary-Aware Reward for Emotional Support Conversation

ACL ARR 2026 January Submission 8753 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Emotional support conversation, large language models, strategy preference bias, knowledge boundaries, reinforcement learning
Abstract: Emotional support conversation (ESC) aims to alleviate distress through empathetic dialogue, yet large language models (LLMs) struggle to deliver effective ESC due to low accuracy in strategy planning. Moreover, they exhibit a considerable preference bias toward specific strategies. Prior methods using fine-tuned strategy planners have shown potential in reducing such bias, but the underlying causes of the preference bias have not been well studied. In this work, we present an empirical analysis showing that strategy preference bias correlates with regions of low model confidence in strategy prediction. Based on this observation, we propose a boundary-aware reward that mitigates the bias through reinforcement learning, optimizing strategy planning via both accuracy and entropy-based confidence for each region according to the estimated uncertainty. Experiments on the ESConv and ExTES datasets across multiple LLM backbones show that our approach consistently improves strategy selection accuracy while significantly reducing preference bias, without requiring external preference data or auxiliary modules.
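The abstract does not give the reward's exact form, so the following is only a minimal, hypothetical sketch of one way an "accuracy plus entropy-based confidence" reward could be combined. The function names, the threshold `tau` for marking a low-confidence region, and the weight `lam` are all illustrative assumptions, not the authors' actual formulation.

```python
import math

def strategy_entropy(probs):
    """Shannon entropy of a predicted strategy distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def boundary_aware_reward(probs, chosen, gold, lam=0.5, tau=0.8):
    """Hypothetical sketch: an accuracy reward, with an entropy-based
    confidence penalty applied only when normalized entropy exceeds a
    boundary threshold tau (i.e. in a low-confidence region).

    probs  -- predicted probability over candidate strategies
    chosen -- index of the strategy the planner selected
    gold   -- index of the reference strategy
    """
    acc = 1.0 if chosen == gold else 0.0
    max_h = math.log(len(probs))            # entropy of the uniform distribution
    norm_h = strategy_entropy(probs) / max_h
    if norm_h > tau:                        # low-confidence region
        return acc - lam * norm_h           # discourage uncertain predictions
    return acc                              # confident region: accuracy only
```

Under this sketch, a correct but maximally uncertain prediction earns less reward than a correct, confident one, which is one plausible way a reward could push the planner away from defaulting to a preferred strategy in uncertain regions.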
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: task-oriented, factuality, applications
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English, Chinese
Submission Number: 8753