SVL: Empowering Spiking Neural Networks for Efficient 3D Open-World Understanding

ICLR 2026 Conference Submission 377 Authors

01 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Spiking Neural Network, Spike-driven, Spike Point Transformer
TL;DR: The first spike-based multimodal framework that empowers SNNs with open-world 3D perception while maintaining spike-driven efficiency.
Abstract: Spiking Neural Networks (SNNs) offer an energy-efficient route to 3D spatio-temporal perception, yet they lag behind Artificial Neural Networks (ANNs) due to weak pretraining and heavy inference stacks, which limit generalization and multimodal reasoning (e.g., zero-shot 3D classification and open-world QA). We present SVL, a universal Spike-based Vision-Language pretraining framework that equips SNNs with open-world 3D understanding while preserving end-to-end spike efficiency. SVL comprises two core components: (i) Multi-scale Triple Alignment (MTA), a label-free triplet contrastive objective that aligns 3D, image, and text representations; and (ii) Re-parameterizable Vision-Language Integration (Rep-VLI), which converts offline text embeddings into lightweight weights, enabling text-encoder-free inference. Moreover, we present the first fully spike-driven point Transformer, Spike-driven PointFormer, whose 3D spike-driven self-attention (3D-SDSA) reduces token interactions to sparse additions, enabling faster, more efficient training. Extensive experiments show that SVL attains strong zero-shot 3D classification (85.4% top-1) and consistently outperforms prior SNNs on downstream tasks (e.g., +6.1% on 3D classification, +2.1% on DVS action recognition, +1.1% on detection, and +2.1% on segmentation) while enabling open-world 3D question answering, in some cases outperforming ANNs. To the best of our knowledge, SVL is the first scalable, generalizable, and hardware-friendly paradigm for 3D open-world understanding, effectively bridging the gap between SNNs and ANNs on complex open-world tasks.
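The claim that spike-driven self-attention "reduces interactions to sparse additions" follows from a general property of binary activations: when queries and keys are 0/1 spike tensors, the score matrix Q·Kᵀ simply counts co-active channels, so it can be accumulated with additions over the fired channels only, with no multiplications. The sketch below illustrates that principle in numpy; all names and shapes are hypothetical and it is not the paper's exact 3D-SDSA operator (which the abstract does not specify in detail).

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 8, 16  # number of point tokens and channel dimension (illustrative)
# In an SNN, activations after the spiking neuron are binary {0, 1}.
Q = (rng.random((N, d)) < 0.2).astype(np.int64)
K = (rng.random((N, d)) < 0.2).astype(np.int64)
V = (rng.random((N, d)) < 0.2).astype(np.int64)

# Dense ANN-style formulation: attention scores via matrix multiply.
scores_matmul = Q @ K.T

# Spike-driven formulation: each score is the count of channels where
# both Q[i] and K[j] fired -- pure accumulation over a sparse index set.
scores_add = np.zeros((N, N), dtype=np.int64)
for i in range(N):
    fired = np.flatnonzero(Q[i])            # only the channels that spiked
    for j in range(N):
        scores_add[i, j] = K[j, fired].sum()  # additions only, no multiplies

# Both formulations agree exactly on binary inputs.
assert np.array_equal(scores_matmul, scores_add)

# Aggregating values is again integer accumulation; a real spiking layer
# would pass this through a spiking neuron to re-binarize the output.
out = scores_add @ V
```

The sparser the spike trains (lower firing rate), the fewer additions are performed, which is the source of the efficiency argument on neuromorphic hardware.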
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 377