Keywords: Spiking neural network, multi-modal architecture, underwater object detection
Abstract: Multi-modal artificial neural networks (ANNs) have demonstrated strong performance gains in object detection by leveraging complementary information from diverse data modalities. However, these gains often come at the cost of substantially increased computational demands due to dense operations and multi-branch architectures. To address these challenges, we propose MMSNN, a novel Multi-Modal Spiking Neural Network for efficient underwater object detection. MMSNN integrates RGB features with Local Binary Pattern (LBP) representations, capturing both fine-grained visual details and illumination-robust texture cues within a spike-driven architecture. At the core of MMSNN is the Spike-Driven Multi-Modal Fusion (SMMF) module, a lightweight yet expressive component designed to enable efficient cross-modal feature interaction. SMMF uses channel grouping and shuffling to promote localized feature interaction and enhance representational diversity, while its spike-driven attention mechanism reduces computational overhead without compromising discriminative power. Extensive experiments on the RUOD and DUO underwater datasets demonstrate that MMSNN achieves state-of-the-art performance with an excellent balance between robust accuracy and computational efficiency.
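The channel grouping and shuffling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify SMMF's exact implementation, so the snippet below assumes a ShuffleNet-style channel shuffle: channels are split into groups and interleaved so that subsequent group-wise operations mix information across groups. The function name and tensor layout (NCHW) are illustrative assumptions, not the paper's code.

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Illustrative ShuffleNet-style channel shuffle (assumed, not the
    paper's SMMF implementation): split the channel axis into `groups`
    blocks, then interleave them so later group-wise operations see
    channels originating from every group."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap group and per-group channel axes
    return x.reshape(n, c, h, w)

# Example: 4 channels, 2 groups -> channel order [0, 2, 1, 3]
x = np.arange(4, dtype=np.float32).reshape(1, 4, 1, 1)
y = channel_shuffle(x, groups=2)
```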
Primary Area: applications to neuroscience & cognitive science
Submission Number: 12660