A hybrid fusion model for group-level emotion recognition in complex scenarios

Published: 31 May 2025, Last Modified: 18 Feb 2026 | OpenReview Archive Direct Upload | Everyone | CC0 1.0
Abstract: Group-level emotion recognition (GER) differs from general emotion recognition in that the number of individuals in an image varies and the scene itself is highly unpredictable. Accurate GER can significantly improve group preference prediction in affective recommender systems. Traditional deep learning-based emotion recognition methods perform poorly in complex scenes, such as those where faces are difficult to detect. To tackle this issue, we propose a novel method for group-level emotion recognition that combines multi-scale and multi-modal cues, including facial expressions, scenes, and human poses. The method also incorporates a hybrid attention model that combines coarse-grained and fine-grained features. Additionally, we collect a dataset called the Group and Scene Emotions Dataset, which covers complex scenarios such as a smoky concert scene or a car accident scene with explosions. On the publicly available Group Affect Database 2.0, the proposed method achieves an accuracy of 79.6%, outperforming other methods that use the same evaluation protocol. On the Group and Scene Emotions Dataset, it achieves prediction accuracies of 97.51% and 97.90% on the validation and test sets, respectively. Code and trained models are available at https://github.com/shuaipenger/Group-Emotion-Recognition.
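The abstract does not specify how the cues are fused, so the following is only an illustrative sketch of one common way to handle a variable number of faces: attention pooling over per-face features, followed by concatenation with scene and pose features. It is not the paper's hybrid attention model. The function names (`attention_pool`, `fuse`), the learned `query` vector, and the zero-vector fallback for images with no detected faces are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Pool a variable-length list of feature vectors into one vector.

    Each vector is scored by its dot product with `query` (hypothetical
    learned parameter); the output is the softmax-weighted average.
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def fuse(face_features, scene_feature, pose_feature, query):
    """Concatenate pooled face features with scene and pose features.

    Assumption: when no face is detected, a zero vector stands in for
    the face cue so that scene and pose cues still drive the prediction.
    """
    if face_features:
        face_vec = attention_pool(face_features, query)
    else:
        face_vec = [0.0] * len(query)
    return face_vec + scene_feature + pose_feature


# Two detected faces with 2-dimensional features, plus 1-dimensional
# scene and pose features (toy sizes chosen for readability).
faces = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse(faces, [0.1], [0.2], query=[0.0, 0.0])
```

Because the pooled face vector has a fixed size regardless of how many faces were detected, the fused vector can be passed to an ordinary classifier even when the group size changes from image to image.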