Locally Confined Modality Fusion Network With a Global Perspective for Multimodal Human Affective Computing

Published in: IEEE Transactions on Multimedia, 2020 (last modified: 17 Nov 2023)
Abstract: In this paper, we propose a novel multimodal fusion framework, called the locally confined modality fusion network (LMFN), which contains a bidirectional multiconnected LSTM (BM-LSTM), to address the multimodal human affective computing problem. In the LMFN, we introduce a generic fusion structure that explores both local and global fusion to obtain an integral comprehension of the information. Specifically, we partition the feature vector corresponding to each modality into multiple segments and learn every local interaction through a tensor fusion procedure. Global interaction is then modeled by learning the dependence between local tensors via a newly designed BM-LSTM architecture, which establishes direct connections between the cells and states of local tensors that are several time steps apart. With the LMFN, we achieve advantages over other methods in the following respects: 1) local interactions are successfully modeled using a feasible vector segmentation procedure that explores cross-modal dynamics in a more specialized manner; 2) global interactions are modeled to obtain an integral view of the multimodal information using the BM-LSTM, which guarantees an adequate flow of information; and 3) our general fusion structure is highly extendable, as other local and global fusion methods can be substituted in. Experiments show that the LMFN yields state-of-the-art results. Moreover, the LMFN achieves higher efficiency than other models by applying the outer product as the fusion method.
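As a rough illustration of the local fusion step described above (this is a hedged sketch, not the authors' code): each modality's feature vector is split into segments, and corresponding segments are fused with an outer product. The segment count, feature dimensions, and the appended constant 1 (a convention borrowed from tensor fusion networks so unimodal and bimodal terms are retained) are all assumptions here:

```python
import numpy as np

def local_tensor_fusion(text_feat, audio_feat, video_feat, n_segments):
    """Split each modality's feature vector into n_segments equal parts,
    then fuse corresponding segments with a trimodal outer product.
    All details (segment count, appended 1) are illustrative assumptions."""
    local_tensors = []
    for t, a, v in zip(np.split(text_feat, n_segments),
                       np.split(audio_feat, n_segments),
                       np.split(video_feat, n_segments)):
        # Append a constant 1 to each segment so unimodal and bimodal
        # interaction terms survive the outer product (assumed convention).
        t1 = np.concatenate([t, [1.0]])
        a1 = np.concatenate([a, [1.0]])
        v1 = np.concatenate([v, [1.0]])
        # Trimodal outer product -> 3-way local interaction tensor.
        tensor = np.einsum('i,j,k->ijk', t1, a1, v1)
        # Flatten so a recurrent model (e.g., the paper's BM-LSTM) can
        # consume the sequence of local tensors for global fusion.
        local_tensors.append(tensor.ravel())
    return np.stack(local_tensors)

# Example: 8-dim features per modality, split into 4 local segments.
# Each segment has 2 dims, so each local tensor is 3*3*3 = 27 values.
fused = local_tensor_fusion(np.ones(8), np.ones(8), np.ones(8), n_segments=4)
```

The sequence of flattened local tensors returned here would then be fed to a recurrent global-fusion stage; the paper's BM-LSTM additionally links cells and states that are several time steps apart, which a plain LSTM does not.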