Emotional Earth Mover's Distance for Fine-Grained Hierarchical Emotion Analysis

Haitao Yu, Dawei Li, Xin Kang

Published: 2025, Last Modified: 21 Jan 2026, ADMA (1) 2025, CC BY-SA 4.0

Abstract: Effective emotion understanding has become increasingly important for human-centered interactive systems such as chatbots and companion/service robots. Despite the remarkable successes of previous studies, we find that they ignore the hierarchical relationships among emotion labels, both when formulating the optimization objective and when evaluating performance, which hinders fine-grained hierarchical emotion analysis. To bridge this gap, we propose Emotional Earth Mover’s Distance (EEMD), a novel framework that extends the Earth Mover’s Distance (EMD) to emotion analysis by explicitly encoding the hierarchical structure of emotion labels. This hierarchical distance is integrated into both the training loss and the evaluation metric, allowing EEMD to capture hierarchical emotional nuances throughout the entire emotion-analysis cycle. To demonstrate the effectiveness of the proposed approach, we conduct a series of experiments on the widely used GoEmotions dataset. In addition to comparing our approach with representative traditional methods based on pre-trained language models (e.g., BERT), we also compare it with several types of few-shot prompting methods based on large language models (LLMs). Furthermore, since traditional metrics for emotion analysis, such as F1 score and subset accuracy, do not effectively reflect a model’s ability to perform fine-grained hierarchical emotion analysis, we propose using EMD as a hierarchy-aware evaluation metric that captures the severity of misclassifications based on the label structure. Our extensive experiments reveal that: (1) benefiting from the integration of hierarchical information among labels during training, EEMD outperforms other methods by a large margin; (2) because LLMs (such as Llama3 and GPT) are general-purpose base models without task-specific fine-tuning, they perform poorly in hierarchical emotion analysis, especially with a large label set (e.g., the 28 labels in GoEmotions); and (3) EMD is an effective metric for assessing a model’s ability to handle fine-grained hierarchical emotion analysis. (Source code: https://github.com/Shinnaaa/Hierarchical_Emotions)
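The core idea, an EMD whose ground distance follows the emotion hierarchy, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the hierarchy is a rooted tree, that every edge has the same cost, and that the prediction and the gold label are distributions summing to 1 (a multi-label gold set would be normalized first). Under these assumptions EMD has a known closed form for tree metrics: the sum, over tree edges, of the absolute difference in probability mass lying below each edge, so no linear-program solver is needed. The label tree is a toy subset loosely modeled on GoEmotions' sentiment and Ekman groupings, and the names `PARENT`, `subtree_mass`, and `tree_emd` are illustrative; this is not the paper's EEMD implementation.

```python
from typing import Dict, Optional

# Hypothetical toy label tree (child -> parent), loosely modeled on the
# GoEmotions sentiment / Ekman grouping. NOT the paper's exact hierarchy.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "positive": "root",
    "negative": "root",
    "joy_group": "positive",
    "anger_group": "negative",
    "sadness_group": "negative",
    "amusement": "joy_group",
    "excitement": "joy_group",
    "anger": "anger_group",
    "annoyance": "anger_group",
    "sadness": "sadness_group",
    "grief": "sadness_group",
}


def subtree_mass(dist: Dict[str, float]) -> Dict[str, float]:
    """Probability mass of the leaf labels lying under each node of the tree."""
    mass = {node: 0.0 for node in PARENT}
    for label, p in dist.items():
        node: Optional[str] = label
        while node is not None:  # add the leaf's mass to it and every ancestor
            mass[node] += p
            node = PARENT[node]
    return mass


def tree_emd(p: Dict[str, float], q: Dict[str, float], edge_cost: float = 1.0) -> float:
    """EMD between two label distributions (each summing to 1) when the ground
    distance is the path length in the tree with a uniform cost per edge.

    For tree metrics EMD has a closed form: in an optimal flow, each edge
    (node -> parent) carries exactly |mass_p(node) - mass_q(node)| units.
    """
    mp, mq = subtree_mass(p), subtree_mass(q)
    return sum(
        edge_cost * abs(mp[n] - mq[n])
        for n, parent in PARENT.items()
        if parent is not None  # the root has no edge above it
    )


gold = {"anger": 1.0}
print(tree_emd(gold, {"annoyance": 1.0}))  # sibling in the same group -> 2.0
print(tree_emd(gold, {"grief": 1.0}))      # same sentiment, other group -> 4.0
print(tree_emd(gold, {"amusement": 1.0}))  # opposite sentiment -> 6.0
# A soft prediction, e.g. a softmax output: distance is approximately 1.0
print(tree_emd(gold, {"anger": 0.7, "annoyance": 0.2, "amusement": 0.1}))
```

A flat 0/1 metric would count all three hard errors as equally wrong, whereas the tree distance grades them by how far apart the labels sit in the hierarchy. The expression is piecewise linear in the predicted probabilities, so it could in principle also be computed on softmax outputs with a tensor library and used as a training loss; the abstract does not specify EEMD's exact loss or ground-distance definition.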

External IDs: dblp:conf/adma/YuLK25