Enhancing Multimodal Sentiment Analysis through the Integration of Attention Mechanisms and Spiking Neural Networks

ACL ARR 2025 May Submission955 Authors

16 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: The success of Vision Transformers has sparked growing interest in integrating the self-attention mechanism and Transformer-based architectures into Spiking Neural Networks (SNNs), aiming to combine the brain-inspired efficiency of SNNs with the power of attention-based models. While recent efforts have introduced spiking-compatible self-attention modules, they often suffer from two key limitations: the absence of effective scaling strategies, and architectural bottlenecks that hinder the extraction of fine-grained local features and the integration of multimodal information. To address these issues, we introduce the Spiking-Generated Multimodal Transformer, which features a spiking self-attention mechanism with biologically plausible and computationally efficient scaling. Unlike conventional spiking models that focus narrowly on single modalities or shallow representations, our model adopts a multi-stage architecture comprising both single-modal and modality-fusion networks, enabling a deeper understanding and integration of complex multimodal inputs such as audio, text, and visual signals. This synergistic design allows the model to exploit the temporal dynamics of spikes while maintaining high-level semantic alignment across modalities. As a result, our approach improves both energy efficiency and performance. Experiments on benchmark datasets for multimodal sentiment analysis, including SIMS and MOSEI, validate the effectiveness of our approach.
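To make the abstract's notion of spike-based self-attention concrete, the following is a minimal sketch in the style of prior spiking-Transformer work (e.g. Spikformer): queries, keys, and values are binarized into spikes, the softmax is dropped because spike products are non-negative, and a scalar scale keeps the integer-valued attention scores in range. This is our own illustrative simplification, not the authors' released code; the function names, threshold, and scale value are assumptions.

```python
import numpy as np

def heaviside_spike(x, threshold=1.0):
    """Binarize membrane potentials into 0/1 spikes (forward pass only)."""
    return (x >= threshold).astype(np.float32)

def spiking_self_attention(X, Wq, Wk, Wv, scale=0.125, threshold=1.0):
    """Softmax-free spiking self-attention over T time steps (sketch).

    X: (T, N, D) binary spike inputs, N tokens, D features.
    Returns (T, N, D) binary spike outputs.
    """
    T, N, D = X.shape
    out = np.zeros_like(X)
    for t in range(T):
        Q = heaviside_spike(X[t] @ Wq, threshold)
        K = heaviside_spike(X[t] @ Wk, threshold)
        V = heaviside_spike(X[t] @ Wv, threshold)
        # Spike Q/K/V are non-negative, so no softmax is needed;
        # a scalar scale plays the role of an efficient scaling strategy.
        attn = (Q @ K.T) * scale          # (N, N) non-negative scores
        out[t] = heaviside_spike(attn @ V, threshold)
    return out

rng = np.random.default_rng(0)
T, N, D = 4, 8, 16
X = (rng.random((T, N, D)) > 0.5).astype(np.float32)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.3 for _ in range(3))
Y = spiking_self_attention(X, Wq, Wk, Wv)
```

In a multimodal setting such as the one the abstract describes, one such block per modality (audio, text, visual) would feed a fusion network operating on the aligned spike streams.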
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Spiking Neural Networks, Multimodal Sentiment Analysis, Self-Attention
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study
Languages Studied: English, Chinese
Submission Number: 955