Abstract: Open-Vocabulary Multimodal Emotion Recognition (OV-MER) aims to predict emotions without being constrained by predefined label spaces, enabling fine-grained and human-like emotion understanding. Unlike traditional discriminative methods, OV-MER leverages generative models, such as large language models (LLMs) with extensive vocabularies, to capture the full spectrum of emotions. Previous approaches (like AffectGPT) primarily rely on token-level loss for training. However, this objective does not align with the emotion wheel (EW)-based evaluation metrics used in OV-MER. Unfortunately, EW-based metrics cannot be directly optimized via gradient backpropagation. In this paper, we propose AffectGPT-R1, a reinforcement learning framework that directly optimizes performance on EW-based metrics. Specifically, we treat these metrics as the reward function and employ Group Relative Policy Optimization (GRPO) to maximize rewards. Experimental results demonstrate that AffectGPT-R1 achieves significant improvements on OV-MER. We hope this work advances the field of multimodal emotion recognition. Our code will be publicly available at:https://github.com/zeroQiaoba/AffectGPT.
External IDs:dblp:journals/corr/abs-2508-01318
Loading