HGLNET: A Generic Hierarchical Global-Local Feature Fusion Network for Multi-Modal Classification

Published: 2022, Last Modified: 14 Apr 2024, Venue: ICME 2022, Readers: Everyone
Abstract: Multi-modal fusion aims to capture the semantic interactions between different modalities for downstream classification tasks. However, previous work usually assumes that each modality contributes equally to the final classification, and it extracts only the global features of each modality for fusion. In this paper, motivated by these two limitations, we propose a generic Hierarchical Global-Local feature fusion Network (HGLNet) for multi-modal classification. Specifically, HGLNet has three merits compared with existing work. (1) HGLNet proposes a Global Gated Attention (GGA) module, which adaptively generates weights that represent the contributions of different modalities. (2) HGLNet presents a novel Cross Residual Transformer (CRT) module to capture fine-grained local interactions. (3) HGLNet utilizes hierarchical information for multi-modal fusion. Extensive experiments on three public datasets demonstrate that HGLNet achieves competitive performance against state-of-the-art methods on three kinds of multi-modal classification tasks.
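
The abstract does not specify how the GGA module computes its weights. As a rough illustration of the general idea of adaptive per-modality weighting, the following minimal NumPy sketch scores each modality's global feature with a gate vector and normalizes the scores with a softmax. It is not the authors' implementation: the function name `gated_modality_fusion`, the gate vectors, the feature shapes, and the softmax gating itself are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_modality_fusion(features, gate_vectors):
    """Hypothetical gated fusion (an assumption, not the paper's GGA).

    features:     list of M arrays, each of shape (batch, dim)
    gate_vectors: list of M arrays, each of shape (dim,)
    Returns the fused features (batch, dim) and the weights (batch, M).
    """
    stacked = np.stack(features, axis=1)        # (batch, M, dim)
    gates = np.stack(gate_vectors, axis=0)      # (M, dim)
    # One scalar score per sample and modality.
    scores = np.einsum("bmd,md->bm", stacked, gates)
    # Normalize across modalities so the weights sum to 1 per sample.
    weights = softmax(scores, axis=1)
    # Weighted sum of the modality features.
    fused = (weights[:, :, None] * stacked).sum(axis=1)
    return fused, weights


# Toy inputs: two modalities, batch of 4, feature dimension 8 (all assumed).
rng = np.random.default_rng(0)
text_feat = rng.normal(size=(4, 8))
image_feat = rng.normal(size=(4, 8))
gate_vectors = [rng.normal(size=8), rng.normal(size=8)]

fused, weights = gated_modality_fusion([text_feat, image_feat], gate_vectors)
```

In this sketch each sample receives its own weight per modality, so no modality is forced to contribute equally. In a trained model the gate vectors would be learned parameters rather than random draws.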