Keywords: Supramodal emotion concept; Human behavior; Replay strategy
TL;DR: A neuroscience-inspired learning strategy that builds supramodal emotion concepts for emotion recognition
Abstract: Multimodal emotion recognition has shown promise but is often hindered by the complexity of integrating heterogeneous sensory inputs. Intriguingly, the human brain addresses this challenge through abstract, modality-independent emotion schemas, known as supramodal emotion concepts, which are learned gradually from emotional experiences across different sensory modalities. Here, we propose a learning strategy to construct supramodal emotion concepts across vision, text, and audio. In a decoupling framework, each modality’s data repeatedly passes through a shared emotion encoder and a corresponding modality-specific non-emotion encoder, extracting modality-independent emotion representations. Inspired by hippocampal replay in humans, we aggregate these representations from a memory pool during downstream emotion recognition to form supramodal emotion concepts. We demonstrate the effectiveness of this approach in multiple settings: (1) a lightweight image-based model achieves state-of-the-art results on several benchmark datasets with lower complexity than existing unimodal methods; (2) unimodal models using vision, text, or audio from video clips achieve performance comparable to multimodal models; and (3) concept-guided multimodal models further improve performance, surpassing the current state of the art.
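The abstract describes a decoupling architecture (one shared emotion encoder plus per-modality non-emotion encoders) and a replay-style memory pool. The sketch below illustrates that structure in PyTorch under stated assumptions: the module names, feature dimensions, per-modality input projections, and mean-pooling aggregation are all illustrative placeholders, not the paper's actual implementation.

```python
# Minimal sketch of the decoupling framework described in the abstract.
# Assumptions: feature widths, the input projections, and mean-pooled
# aggregation are illustrative; the paper does not specify these details.
import torch
import torch.nn as nn


class DecoupledEncoder(nn.Module):
    def __init__(self, in_dims, d=128):
        super().__init__()
        # Per-modality projections into a common feature space (assumption,
        # needed so one shared encoder can accept all modalities).
        self.project = nn.ModuleDict({m: nn.Linear(k, d) for m, k in in_dims.items()})
        # Shared emotion encoder: one set of weights for vision, text, and
        # audio, so it is pushed toward modality-independent emotion features.
        self.emotion = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        # Modality-specific non-emotion encoders absorb the remaining,
        # modality-dependent content.
        self.non_emotion = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
             for m in in_dims}
        )

    def forward(self, x, modality):
        h = self.project[modality](x)
        return self.emotion(h), self.non_emotion[modality](h)


in_dims = {"vision": 512, "text": 768, "audio": 256}  # illustrative widths
model = DecoupledEncoder(in_dims)

# Memory pool of emotion representations, filled as each modality's data
# passes through the shared emotion encoder.
memory_pool = []
for modality, dim in in_dims.items():
    x = torch.randn(8, dim)  # stand-in for one batch of modality features
    emo, _ = model(x, modality)
    memory_pool.append(emo.detach())

# Hippocampal-replay-style aggregation (mean pooling is an assumption):
# stored representations are combined into a supramodal emotion concept
# that could guide a downstream recognition head.
concept = torch.cat(memory_pool).mean(dim=0)
print(concept.shape)  # torch.Size([128])
```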
Primary Area: applications to neuroscience & cognitive science
Submission Number: 1636