Track: Extended Abstract Track
Keywords: Self-supervised learning, spiking neural networks, contrastive learning, event-based vision
Abstract: Artificial Neural Network (ANN) pre-training, followed by fine-tuning, is an established procedure for solving real-world problems where labeled data is scarce. This paper adapts that procedure to the domain of event-based vision and Spiking Neural Networks (SNNs). Event-based sensors, inspired by the retina, capture visual scenes with low latency and high dynamic range, making them suitable for many real-world vision problems. SNNs, inspired by biological neural networks, enable energy-efficient and low-latency processing when implemented on neuromorphic hardware, making them well-suited for fully event-based pipelines. However, the lack of sufficiently large labeled datasets hinders the pre-training of SNNs. Here, we leverage joint frame and event data to forgo labeling. We achieve this through self-supervised contrastive learning, in which an ANN and an SNN are jointly trained to align the embeddings of related frame-event stream pairs and to contrast those of unrelated pairs. We show that the pre-trained SNN model reaches higher accuracy on several downstream visual classification benchmarks. These results demonstrate that pre-training large-scale SNNs on raw event-camera output is possible, paving the way toward foundation SNN models.
Submission Number: 26
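For intuition, a minimal sketch of the kind of objective the abstract describes is given below: a CLIP-style symmetric InfoNCE loss over paired frame embeddings (from the ANN) and event-stream embeddings (from the SNN). This is an illustration of the general technique, not the authors' exact objective; the function name `contrastive_loss`, the temperature value, and the encoder outputs `frame_emb` / `event_emb` are all assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(frame_emb: torch.Tensor,
                     event_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over paired frame/event embeddings (sketch).

    frame_emb: (B, D) embeddings from the ANN frame encoder (assumed).
    event_emb: (B, D) embeddings from the SNN event encoder (assumed).
    Row i of both tensors is assumed to come from the same scene, so the
    B diagonal pairs are positives and all off-diagonal pairs negatives.
    """
    frame_emb = F.normalize(frame_emb, dim=-1)
    event_emb = F.normalize(event_emb, dim=-1)

    # Pairwise cosine similarities, scaled by the temperature.
    logits = frame_emb @ event_emb.t() / temperature

    # Matching (related) pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions: frames -> events and events -> frames.
    loss_f2e = F.cross_entropy(logits, targets)
    loss_e2f = F.cross_entropy(logits.t(), targets)
    return (loss_f2e + loss_e2f) / 2
```

Minimizing this loss pulls embeddings of related frame-event pairs together while pushing unrelated pairs apart, which is the alignment/contrast behavior the abstract attributes to the jointly trained ANN-SNN pair.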