Keywords: SNN, self-supervised learning, cross-temporal learning, image classification, energy efficiency
Abstract: Spiking Neural Networks (SNNs) offer a promising alternative to traditional artificial neural networks by leveraging sparse, event-driven computation that closely mimics biological neurons. When deployed on neuromorphic hardware, SNNs enable substantial energy savings thanks to their sparse, asynchronous temporal processing. However, training SNNs remains a major challenge because spike generation is non-differentiable. In this work, we introduce the first fully self-supervised learning (SSL) framework for SNNs that scales to large-scale visual tasks without requiring labeled fine-tuning. Our method exploits intrinsic spike-time dynamics by aligning representations across time steps and augmented views. To address gradient mismatch during surrogate training, we propose the MixedLIF neuron, which combines a spiking path with an antiderivative-based surrogate path during training to stabilize optimization, while retaining a fully spiking, energy-efficient architecture at inference. We also introduce two temporal objectives, Cross Temporal Loss and Boundary Temporal Loss, which align multi-time-step outputs to improve learning efficiency. Our approach achieves strong results with ResNet- and Vision Transformer-based SNNs on CIFAR-10, CIFAR10-DVS, and ImageNet-1K, and further generalizes via transfer learning from ImageNet-1K to downstream image classification, as well as COCO object detection and instance segmentation. Notably, our self-supervised SNNs match or exceed the performance of some non-spiking SSL models, demonstrating both representational strength and energy efficiency.
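To make the training/inference dichotomy of the MixedLIF idea concrete, the sketch below shows a toy LIF neuron that emits hard (non-differentiable) spikes at inference but routes a smooth sigmoid through the output during training as a stand-in for the surrogate branch. All names (`ToyMixedLIF`, `tau`, `v_th`, the sigmoid sharpness `alpha`) and the straight-through-style mixing are illustrative assumptions, not the paper's actual implementation.

```python
import math

def heaviside(x):
    # Hard spike function: fires iff the membrane crosses threshold.
    # Non-differentiable at 0, which is the core training obstacle for SNNs.
    return 1.0 if x >= 0.0 else 0.0

def smooth_step(x, alpha=4.0):
    # Smooth sigmoid approximation of the Heaviside step; its derivative
    # plays the role of a surrogate gradient in backprop-based SNN training.
    return 1.0 / (1.0 + math.exp(-alpha * x))

class ToyMixedLIF:
    """Hypothetical MixedLIF-style neuron (illustrative only).

    training=False: leaky integrate-and-fire with binary spike output.
    training=True:  the differentiable smooth path is emitted instead,
                    mimicking the surrogate branch used during optimization.
    """
    def __init__(self, tau=2.0, v_th=1.0):
        self.tau = tau    # membrane time constant (assumed value)
        self.v_th = v_th  # firing threshold (assumed value)
        self.v = 0.0      # membrane potential state

    def step(self, x, training=False):
        # Leaky integration of the input current toward x.
        self.v = self.v + (x - self.v) / self.tau
        spike = heaviside(self.v - self.v_th)
        # During training, output the smooth surrogate value; at inference,
        # output the energy-efficient binary spike.
        out = smooth_step(self.v - self.v_th) if training else spike
        if spike:
            self.v = 0.0  # hard reset after firing
        return out
```

With a constant input of 1.5, the neuron integrates for one step and fires on the second; in training mode the same step instead yields a graded value strictly between 0 and 1, which is what keeps gradients flowing through the surrogate path.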
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 6648