Personalized Multimodal Emotion Recognition: Integrating Temporal Dynamics and Individual Traits for Enhanced Performance

Published: 01 Jan 2024 · Last Modified: 29 May 2025 · ISCSLP 2024 · CC BY-SA 4.0
Abstract: The integration of multimodal information, encompassing visual, auditory, and textual data, has significantly advanced the field of emotion recognition. As applications increasingly demand personalized solutions, researchers have turned to the challenge of incorporating individual-specific factors into multimodal emotion recognition systems. However, the effective and robust integration of diverse modalities with personalized information remains a formidable challenge. This paper proposes a personalized emotion recognition framework that fuses multimodal information, including visual, textual, and auditory inputs, with personalized data. To address the temporal discrepancies inherent in different modalities, a multimodal adaptive alignment module is introduced to harmonize the temporal variances across the different feature spaces. To further improve the integration of temporal modal features, a temporal hybrid attention module distills the essential content of the temporal multimodal features. For modal features that lack temporal structure, such as personality traits, an information compression module condenses them into compact representations. All condensed features are then fed into an emotion recognition module, which produces the emotional score of the sample. Experimental results demonstrate that incorporating personalized information significantly improves multimodal emotion recognition performance; the proposed method won first place in Track 2 of the competition.
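The abstract describes a four-stage pipeline: per-modality temporal alignment, hybrid attention over the aligned temporal streams, compression of non-temporal personalized features, and a final recognition head. The PyTorch sketch below is a minimal illustration of that data flow under stated assumptions; every module internal (linear projection plus resampling for alignment, self-attention with mean pooling for the hybrid attention, an MLP for compression) and every dimension is a hypothetical choice, as the abstract does not specify the actual architecture.

```python
# Minimal sketch of the described pipeline. All internals and dimensions
# are illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAlignment(nn.Module):
    """Assumed alignment: project each modality to a shared width and
    resample it to a common number of time steps (one plausible reading
    of 'harmonizing temporal variances across feature spaces')."""
    def __init__(self, in_dim, d_model, target_len):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        self.target_len = target_len

    def forward(self, x):                       # x: (B, T, in_dim)
        x = self.proj(x).transpose(1, 2)        # (B, d_model, T)
        x = F.interpolate(x, size=self.target_len, mode="linear")
        return x.transpose(1, 2)                # (B, target_len, d_model)

class TemporalHybridAttention(nn.Module):
    """Assumed stand-in for the temporal hybrid attention module:
    self-attention over the concatenated streams, then mean pooling."""
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, seq):                     # seq: (B, L, d_model)
        out, _ = self.attn(seq, seq, seq)
        return out.mean(dim=1)                  # (B, d_model) summary

class PersonalizedMER(nn.Module):
    def __init__(self, dims, d_model=256, target_len=64, persona_dim=16):
        super().__init__()
        self.align = nn.ModuleDict(
            {m: AdaptiveAlignment(d, d_model, target_len) for m, d in dims.items()})
        self.temporal = TemporalHybridAttention(d_model)
        # Information compression for non-temporal personalized features
        # (e.g. personality traits), assumed here to be a small MLP.
        self.compress = nn.Sequential(
            nn.Linear(persona_dim, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        # Recognition head over the fused temporal and personalized features.
        self.head = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, feats, persona):
        aligned = [self.align[m](x) for m, x in feats.items()]
        fused = self.temporal(torch.cat(aligned, dim=1))  # concat along time
        return self.head(torch.cat([fused, self.compress(persona)], dim=-1))

# Hypothetical feature widths: visual 512-d, audio 128-d, text 768-d.
model = PersonalizedMER({"visual": 512, "audio": 128, "text": 768})
feats = {"visual": torch.randn(2, 30, 512),
         "audio": torch.randn(2, 100, 128),
         "text": torch.randn(2, 20, 768)}
score = model(feats, torch.randn(2, 16))        # (2, 1) emotion scores
```

Note how the three modalities arrive with different sequence lengths (30, 100, and 20 steps above) and only become jointly attendable after alignment; the personalized vector bypasses the temporal path entirely, matching the abstract's split between temporal and non-temporal features.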