Abstract: Event-based cameras, with their high temporal resolution and low energy consumption, offer significant advantages over conventional cameras for computer vision tasks. However, current semantic segmentation algorithms for event-based data struggle to achieve optimal performance for two main reasons: 1) the mismatch between sparse event streams (even when encoded into pseudo-frames) and the dense data structure of traditional frame-based images for which most existing implementations were designed, and 2) the lack of texture information in events, which solely detect temporal variations in brightness. To address these challenges, we propose a novel Cross-Modal (CM) Knowledge Distillation (KD) approach. Our method transfers knowledge from a high-performing Artificial Neural Network (ANN) processing fused grey-scale images and events to a Spiking Neural Network (SNN) -- a bio-inspired, energy-efficient computing paradigm -- operating on event data alone. Experiments on the DDD17 and DSEC-Semantic datasets demonstrate that our approach significantly improves semantic segmentation results while reducing SNN energy consumption. This work represents the first application of cross-modal knowledge distillation to neuromorphic semantic segmentation, paving the way for more efficient event-based vision systems.
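The knowledge transfer described above can be illustrated with a minimal distillation-loss sketch. This is a generic formulation, not the authors' exact objective: the temperature `T`, weighting `alpha`, and the combination of a softened KL term with a hard cross-entropy term are standard knowledge-distillation assumptions, where the teacher logits would come from the ANN on fused images and events, and the student logits from the SNN on events alone.

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-scaled softmax over the last (class) axis.
    z = np.exp((x - x.max(axis=-1, keepdims=True)) / T)
    return z / z.sum(axis=-1, keepdims=True)

def cross_modal_kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hypothetical combined loss (standard KD form, not the paper's exact one):
    # soft term = KL(teacher || student) on temperature-softened distributions,
    # hard term = cross-entropy against ground-truth labels.
    p_t = softmax(teacher_logits, T)  # ANN teacher soft targets
    p_s = softmax(student_logits, T)  # SNN student soft predictions
    # KL divergence, scaled by T^2 as is conventional in distillation.
    kl = np.mean(
        np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ) * T * T
    # Hard cross-entropy on integer class labels.
    p_hard = softmax(student_logits)
    ce = -np.mean(np.log(p_hard[np.arange(len(labels)), labels] + 1e-12))
    return alpha * kl + (1 - alpha) * ce
```

In a full segmentation pipeline the logits would be per-pixel tensors, but the per-sample form above captures the same transfer mechanism: the student is pulled toward the teacher's richer, texture-informed class distribution while still fitting the ground truth.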