Multiple Feature Refining Network for Visual Emotion Distribution Learning

Published: 2025, Last Modified: 05 Jan 2026. Venue: AAAI 2025. License: CC BY-SA 4.0
Abstract: Visual emotion distribution learning (VEDL) has grown in importance as people increasingly convey emotions through images. The key to VEDL lies in capturing both low- and high-level features of the same visual content, so that the model becomes aware of both salient and subtle emotions. To learn the distribution of emotions in images, most previous works learn coarse semantic knowledge with unbiased filtering. Consequently, they attend to the entire scene and suffer from redundant, semantically irrelevant information, which weakens affective coherence and impedes the comprehension of emotional attributes in the processed features. In light of this, we reanalyze the problem from the perspective of information filtering and propose a novel method called Multiple Feature Refining Network (MFRN). To minimize low-level feature redundancy, we design a wavelet-based separated frequency modeling module, named Spectral Mixer, which learns invariant representations and enhances emotion saliency in low-level image features. At the higher semantic level, we design Semantic Graph Prompt Learning for emotional semantic filtering, which ensures the purity of emotional information and provides the model with richer content semantics. Experiments on three commonly used datasets demonstrate the superiority of MFRN over cutting-edge methods.
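The abstract does not specify how Spectral Mixer performs its wavelet-based frequency separation. As a rough illustration of the general idea only, the sketch below applies a single-level 2D Haar wavelet transform to a small grayscale image and returns one low-frequency band and three high-frequency detail bands. The choice of the Haar wavelet, the function name `haar_dwt2`, and the band names are assumptions made for this example; none of them come from the paper.

```python
def haar_dwt2(image):
    """Single-level 2D orthonormal Haar transform (illustrative sketch).

    `image` is a list of equal-length rows of numbers with even height
    and width. Returns a dict of four sub-bands, each half the size of
    the input in both dimensions. Band names are hypothetical labels.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")

    bands = {"approx": [], "horizontal_diff": [],
             "vertical_diff": [], "diagonal_diff": []}

    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            # The 2x2 block of pixels handled in this step.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            # Low-frequency band: scaled sum of the block.
            out["approx"].append((a + b + d + e) / 2.0)
            # High-frequency bands: differences across columns,
            # across rows, and along the diagonals of the block.
            out["horizontal_diff"].append((a - b + d - e) / 2.0)
            out["vertical_diff"].append((a + b - d - e) / 2.0)
            out["diagonal_diff"].append((a - b - d + e) / 2.0)
        for name in bands:
            bands[name].append(out[name])
    return bands


if __name__ == "__main__":
    toy = [[1, 2], [3, 4]]
    for name, band in haar_dwt2(toy).items():
        print(name, band)
```

On a flat region the three detail bands are zero, so only the low-frequency band carries information. A frequency-separated model can treat the two groups of bands differently; how MFRN actually does so is described in the paper itself, not here.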