Keywords: Recognition of negative emotions, Fine-grained, Multimodal
Abstract: Recognizing negative emotions is pivotal in many real-world applications, including public opinion analysis, customer service, emotional attribution, and emotional support systems, where these emotions exhibit fine-grained distinctions. However, current models struggle with fine-grained negative emotion recognition because existing multimodal emotion recognition datasets offer limited granularity. To address this, we refine coarse-grained emotion categories, expanding negative emotions from the conventional 4-5 types to 8 specific categories. Based on this refined taxonomy, we construct **Libra-Emo**, a comprehensive dataset for multimodal fine-grained negative emotion detection. It comprises **Libra-Emo Trainset** for model development and **Libra-Emo Bench** for evaluation. Together they contain 62,267 video samples annotated through a novel human-machine collaborative active learning strategy, and the dataset surpasses existing datasets in both granularity and scale. We report extensive experiments on leading Multimodal Large Language Models (MLLMs), covering zero-shot evaluation on Libra-Emo Bench and instruction tuning on Libra-Emo Trainset. Our findings show that current MLLMs have limited proficiency in fine-grained negative emotion detection, whereas models fine-tuned on Libra-Emo Trainset achieve substantial performance improvements that generalize effectively to out-of-domain evaluations. This work addresses critical limitations of existing multimodal emotion recognition datasets in emotion category granularity and in the representation of negative emotions, thus advancing research in fine-grained emotional analysis. The dataset and models will be fully open-sourced.
Primary Area: datasets and benchmarks
Submission Number: 10263