EMOCAP: A Deep Dive into the Systematic Assessment of Large Language Models' Emotional Intelligence through Multi-Turn Conversations
Abstract: Large Language Models (LLMs) often lack robust emotional intelligence, limiting their effectiveness in sensitive domains such as mental health and crisis response.
Existing open-source LLMs struggle to track nuanced emotions over multi-turn dialogues, resulting in shallow or misaligned responses. Proprietary models show promise but remain closed-source, hindering transparent evaluation and improvement. To address these limitations, we propose EMOCAP, a comprehensive emotional intelligence framework that integrates well-established psychological frameworks (e.g., Ekman, Plutchik, Russell, Goleman, and the Affective Domain of Bloom's Taxonomy) for enhanced emotion detection, contextual adaptation, and ethical alignment. We develop a multi-turn, domain-general dataset and evaluation protocol to test how LLMs manage evolving emotions, mixed affective states, and subtle cues. Our experiments compare baseline open-source LLMs (Gemma-2-9b, Qwen2.5-7b, and Llama-3-8B) against their instruction fine-tuned counterparts (Gemma-2-9b-It, Qwen2.5-7b-It, and Llama-3-8B-It). Models that incorporate the recognition and response guidelines demonstrate better emotional tracking, fewer repetitive responses, and more ethically aligned outputs than the standard baselines, although complex scenarios (e.g., sarcasm) remain challenging. By providing an open-source taxonomy and benchmark for emotional intelligence, this work lays the groundwork for empathetic, context-aware, and ethically responsible LLMs across various real-world applications.
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: Computational Social Science and Cultural Analytics, Dialogue and Interactive Systems, Resource and Evaluation
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 6497