Enhancing Multimodal Sentiment Analysis through the Integration of Attention Mechanisms and Spiking Neural Networks

ACL ARR 2025 May Submission955 Authors

16 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: The success of Vision Transformers has sparked growing interest in integrating the self-attention mechanism and Transformer-based architectures into Spiking Neural Networks (SNNs), aiming to combine the brain-inspired efficiency of SNNs with the power of attention-based models. While recent efforts have introduced spiking-compatible self-attention modules, they often suffer from two key limitations: the absence of effective scaling strategies, and architectural bottlenecks that hinder the extraction of fine-grained local features and the integration of multimodal information. To address these issues, we introduce the Spiking-Generated Multimodal Transformer, which features a spiking self-attention mechanism with biologically plausible and computationally efficient scaling. Unlike conventional spiking models that focus narrowly on single modalities or shallow representations, our model adopts a multi-stage architecture comprising both single-modal and modality-fusion networks, enabling a deeper understanding and integration of complex multimodal inputs such as audio, text, and visual signals. This synergistic design allows the model to exploit the temporal dynamics of spikes while maintaining high-level semantic alignment across modalities. As a result, our approach improves both energy efficiency and performance. Experiments on benchmark datasets for multimodal sentiment analysis, including SIMS and MOSEI, validate the effectiveness of our approach.
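To make the abstract's notion of spike-based self-attention concrete, the following is a minimal sketch in the style of prior spiking-Transformer work (e.g. Spikformer): queries, keys, and values are binarized into spikes, the softmax is dropped because spike products are non-negative, and a scalar scale keeps the integer-valued attention scores in range. This is our own illustrative simplification, not the authors' released code; the function names, threshold, and scale value are assumptions.

```python
import numpy as np

def heaviside_spike(x, threshold=1.0):
    """Binarize membrane potentials into 0/1 spikes (forward pass only)."""
    return (x >= threshold).astype(np.float32)

def spiking_self_attention(X, Wq, Wk, Wv, scale=0.125, threshold=1.0):
    """Softmax-free spiking self-attention over T time steps (sketch).

    X: (T, N, D) binary spike inputs, N tokens, D features.
    Returns (T, N, D) binary spike outputs.
    """
    T, N, D = X.shape
    out = np.zeros_like(X)
    for t in range(T):
        Q = heaviside_spike(X[t] @ Wq, threshold)
        K = heaviside_spike(X[t] @ Wk, threshold)
        V = heaviside_spike(X[t] @ Wv, threshold)
        # Spike Q/K/V are non-negative, so no softmax is needed;
        # a scalar scale plays the role of an efficient scaling strategy.
        attn = (Q @ K.T) * scale          # (N, N) non-negative scores
        out[t] = heaviside_spike(attn @ V, threshold)
    return out

rng = np.random.default_rng(0)
T, N, D = 4, 8, 16
X = (rng.random((T, N, D)) > 0.5).astype(np.float32)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.3 for _ in range(3))
Y = spiking_self_attention(X, Wq, Wk, Wv)
```

In a multimodal setting such as the one the abstract describes, one such block per modality (audio, text, visual) would feed a fusion network operating on the aligned spike streams.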
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Spiking Neural Networks, Multimodal Sentiment Analysis, Self-Attention
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study
Languages Studied: English, Chinese
Submission Number: 955