Abstract: Empathy is a fundamental pillar of human social intelligence and a critical requirement for the development of human-centered artificial general intelligence (AGI). While large language models (LLMs) have shown remarkable general-purpose capabilities, their empathetic reasoning remains limited, largely due to the scarcity of high-quality training data. Prior work on empathetic modeling often relies on shallow emotional cues or architectural enhancements, overlooking the heterogeneous and multi-dimensional nature of empathy itself. In this work, we propose a data-efficient empathy learning framework that draws on insights from psychology, specifically the dual dimensions of sensibility and rationality, as guiding criteria for high-quality data selection. Our approach uses LLMs to automatically score and filter empathetic dialogues, constructing curated datasets that emphasize emotionally grounded and cognitively coherent responses. We then train specialized sensibility and rationality experts and dynamically combine their capabilities via a Mixture-of-Experts (MoE) model. Empirical results demonstrate that our framework not only achieves state-of-the-art empathetic generation but does so using significantly fewer data samples, affirming the importance of quality-driven selection in scaling empathetic AGI.
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Empathetic Data, Data Quality, Data Selection
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency, Data analysis
Languages Studied: English
Keywords: Empathetic Data, Data Quality, Data Selection
Submission Number: 985