OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition

Zheng Lian; Haiyang Sun; Licai Sun; Haoyu Chen; Lan Chen; Hao Gu; Zhuofan Wen; Shun Chen; Zhang Siyuan; Hailiang Yao; Bin Liu; Rui Liu; Shan Liang; Ya Li; Jiangyan Yi; Jianhua Tao

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, CC BY-NC 4.0

Abstract: Multimodal Emotion Recognition (MER) is a critical research area that seeks to decode human emotions from diverse data modalities. However, existing machine learning methods predominantly rely on predefined emotion taxonomies, which fail to capture the inherent complexity, subtlety, and multi-appraisal nature of human emotional experiences, as demonstrated by studies in psychology and cognitive science. To overcome this limitation, we advocate for introducing the concept of *open vocabulary* into MER. This paradigm shift aims to enable models to predict emotions beyond a fixed label space, accommodating a flexible set of categories to better reflect the nuanced spectrum of human emotions. To achieve this, we propose a novel paradigm: *Open-Vocabulary MER (OV-MER)*, which enables emotion prediction without being confined to predefined spaces. However, constructing a dataset that encompasses the full range of emotions for OV-MER is practically infeasible; hence, we present a comprehensive solution including a newly curated database, novel evaluation metrics, and a preliminary benchmark. By advancing MER from basic emotions to more nuanced and diverse emotional states, we hope this work can inspire the next generation of MER, enhancing its generalizability and applicability in real-world scenarios. Code and dataset are available at: https://github.com/zeroQiaoba/AffectGPT.

Lay Summary: Recognizing human emotions is a key challenge in AI, but current methods often rely on limited, predefined emotion categories that don’t fully capture the complexity of how people feel. Therefore, we propose a new approach called *Open-Vocabulary Multimodal Emotion Recognition (OV-MER)*, which allows AI models to predict emotions beyond fixed labels, enabling a more flexible and nuanced understanding of human feelings. However, creating a dataset that covers the full spectrum of human emotions is nearly impossible. To address this, we introduce a new framework, including a curated dataset, novel evaluation metrics, and a benchmark. Our work aims to push the boundaries of emotion recognition, making it more adaptable and useful in real-world applications. By moving beyond rigid emotion categories, we hope to inspire the next generation of AI systems that better understand and respond to human emotions.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Link To Code: https://github.com/zeroQiaoba/AffectGPT

Primary Area: Applications

Keywords: multimodal emotion recognition, OV-MER, dataset, benchmark

Flagged For Ethics Review: true

Submission Number: 11047