MECG: Modality-Enhanced Convolutional Graph for Unbalanced Multimodal Representations

Anonymous

16 Dec 2022 (modified: 05 May 2023) · ACL ARR 2022 December Blind Submission
Abstract: In multimodal sentiment analysis, modeling the relationships between different modalities and fusing them is highly challenging. A central problem is the imbalance of sentiment representation and distribution across modalities, which causes the fusion process to deviate from the multimodal sentiment-semantic space. We propose a novel fusion framework, MECG, based on graph convolutional neural networks, which provides an efficient approach for fusing unaligned multimodal sequences. First, with the help of the text modality, a multimodal enhancement module enhances the visual and acoustic modalities to obtain more discriminative representations, assisting the subsequent aggregation process. In addition, we construct text-driven multimodal feature graphs for modality fusion, which effectively address the imbalance among modalities during graph-convolution aggregation. Finally, we integrate the fused information extracted by MECG into the verbal representation, dynamically shifting the original word representations toward the most accurate multimodal sentiment-semantic space. Our model demonstrates its effectiveness and superiority on two publicly available datasets, CMU-MOSI and CMU-MOSEI.
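The abstract describes a three-stage pipeline: text-guided enhancement of the visual and acoustic streams, fusion over a text-driven feature graph via graph convolution, and a residual-style update of the word representations. The following is a minimal PyTorch sketch of that pipeline, assuming cross-modal attention as the realization of the enhancement module and a text-anchored, similarity-weighted adjacency for the graph step; all module names, dimensions, and wiring here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MECGSketch(nn.Module):
    """Illustrative sketch of the MECG pipeline described in the abstract.

    Assumptions (not from the paper): cross-modal attention implements the
    "multimodal enhancement module"; the text-driven graph connects, per
    word position, the text, enhanced-visual, and enhanced-acoustic nodes
    with edge weights derived from similarity to the text node.
    """

    def __init__(self, d_text=768, d_visual=35, d_audio=74, d_model=128):
        super().__init__()
        # Project each modality into a shared space.
        self.proj_t = nn.Linear(d_text, d_model)
        self.proj_v = nn.Linear(d_visual, d_model)
        self.proj_a = nn.Linear(d_audio, d_model)
        # Multimodal enhancement: text queries attend over visual/acoustic keys.
        self.enhance_v = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.enhance_a = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # One graph-convolution step over the text-driven feature graph.
        self.gcn = nn.Linear(d_model, d_model)
        # Fold the fused signal back into the word representations.
        self.out = nn.Linear(2 * d_model, d_model)

    def forward(self, text, visual, audio):
        # text: (B, Lt, d_text); visual: (B, Lv, d_visual); audio: (B, La, d_audio)
        t = self.proj_t(text)
        v = self.proj_v(visual)
        a = self.proj_a(audio)
        # Enhance the visual/acoustic streams with the text modality as query;
        # this also resamples the unaligned sequences to the text length Lt.
        v_enh, _ = self.enhance_v(t, v, v)   # (B, Lt, d_model)
        a_enh, _ = self.enhance_a(t, a, a)   # (B, Lt, d_model)
        # Text-driven graph: per word position, three nodes (t, v_enh, a_enh)
        # with edge weights given by similarity to the text node, so the
        # aggregation is anchored on the dominant text modality.
        nodes = torch.stack([t, v_enh, a_enh], dim=2)        # (B, Lt, 3, d)
        sim = torch.einsum('bld,blmd->blm', t, nodes)        # (B, Lt, 3)
        adj = torch.softmax(sim, dim=-1)
        fused = torch.relu(torch.einsum('blm,blmd->bld', adj, self.gcn(nodes)))
        # Shift the original word representations toward the fused
        # multimodal sentiment-semantic space (residual update).
        return self.out(torch.cat([t, fused], dim=-1)) + t
```

The residual connection in the last line reflects the abstract's claim that fused information is integrated back into the verbal representation, shifting rather than replacing the original word embeddings; anchoring the edge weights on the text node is one plausible way to counteract the imbalance across modalities during aggregation.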
Paper Type: long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond