Abstract: Outstanding performance in sentiment analysis relies not only on sophisticated fusion methods but also on the crucial step of designing effective modal interaction methods. To the best of our knowledge, few methods address the capture of multimodal spatial features: most feature interactions focus primarily on temporal aspects, with less attention given to combined spatiotemporal feature interaction (SFI). In this paper, we design a dual-view multimodal interaction method, named DVMI, consisting primarily of two parts. In the first part, a triangular convolutional module is proposed for ample temporal interaction between modalities, for implicit local and global SFI, and for capturing global spatial representations. Building on the foundation laid in the first part, the second part employs an attention mechanism for explicit global SFI. To demonstrate the effectiveness of the DVMI framework, we conduct extensive experiments on three datasets, achieving state-of-the-art results.
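The abstract's second part describes explicit global interaction between modalities via attention. The paper's actual triangular convolutional module and attention design are not detailed here; the following is only a minimal numpy sketch of generic cross-modal scaled dot-product attention, where one modality (e.g. text) queries another (e.g. audio). All names, dimensions, and the random projection weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(x_a, x_b, d_k=16):
    """Modality A attends over modality B (hypothetical sketch).

    x_a: (T_a, d_a) features of the querying modality.
    x_b: (T_b, d_b) features of the attended modality.
    Returns (T_a, d_k) fused features.
    """
    d_a, d_b = x_a.shape[-1], x_b.shape[-1]
    # Random projections stand in for learned parameters.
    W_q = rng.standard_normal((d_a, d_k)) / np.sqrt(d_a)
    W_k = rng.standard_normal((d_b, d_k)) / np.sqrt(d_b)
    W_v = rng.standard_normal((d_b, d_k)) / np.sqrt(d_b)
    Q, K, V = x_a @ W_q, x_b @ W_k, x_b @ W_v
    # Every query position sees every key position: global interaction.
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return attn @ V

text = rng.standard_normal((20, 32))   # 20 time steps, 32-dim text features
audio = rng.standard_normal((50, 24))  # 50 time steps, 24-dim audio features
fused = cross_modal_attention(text, audio)
print(fused.shape)  # (20, 16)
```

Because each query position attends over all positions of the other modality, the attention map is global over the sequence, which is the sense in which an attention stage can make cross-modal interaction explicit.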