Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning

Published: 01 Jan 2023, Last Modified: 14 May 2025 | Image Vis. Comput. 2023 | CC BY-SA 4.0
Abstract: Highlights
• A multimodal emotion recognition framework with various self-attention mechanisms.
• An audio-video fusion strategy that uses cross-attention.
• A learnable emotional metric that extends the traditional triplet loss function.
• An extensive objective evaluation is performed on the RAVDESS and CREMA-D datasets.
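For orientation, the traditional triplet loss named in the highlights can be sketched as follows. This is a minimal illustration of the standard formulation only, max(0, d(a, p) - d(a, n) + margin) with Euclidean distance. It is not the learnable emotional metric proposed in the paper, whose details are not given in this abstract. The function names, the margin value, and the toy embeddings are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer to the anchor than
    the negative by at least `margin` (margin=1.0 is an assumed default)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (hypothetical values, not taken from the paper).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

# Constraint already satisfied: 1 - 5 + 1 < 0, so the loss is clipped to 0.
loss_easy = triplet_loss(anchor, positive, negative)

# Roles swapped: 5 - 1 + 1 = 5, so the violated constraint is penalized.
loss_hard = triplet_loss(anchor, negative, positive)
```

In this formulation the distance function is fixed. A learnable metric, as the highlight describes, would presumably replace or parameterize that fixed distance, but the exact form is not stated here.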