Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM2024 Poster · CC BY 4.0
Abstract: Spiking neural networks (SNNs) have garnered significant attention for their low power consumption and high biological interpretability. Their rich spatio-temporal information processing capability and event-driven nature make them well suited for neuromorphic datasets. However, current SNNs struggle to balance accuracy and latency when classifying these datasets. In this paper, we propose a Hybrid Step-wise Distillation (HSD) method, tailored for neuromorphic datasets, to mitigate the notable decline in performance at lower time steps. Our work disentangles the dependency between the number of event frames and the number of time steps of SNNs, using more event frames during the training stage to improve performance and fewer event frames during the inference stage to reduce latency. Nevertheless, the average output of an SNN across all time steps is susceptible to individual time steps with abnormal outputs, particularly at extremely low time steps. To tackle this issue, we implement a Step-wise Knowledge Distillation (SKD) module that accounts for variations in the output distribution of the SNN at each time step. Empirical evidence demonstrates that our method yields competitive performance in classification tasks on neuromorphic datasets, especially at lower time steps.
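The step-wise idea in SKD can be illustrated with a minimal NumPy sketch: rather than distilling only the student's time-averaged output, a distillation loss is computed at every time step and then averaged. The function name `stepwise_kd_loss`, the tensor shapes, and the temperature handling are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stepwise_kd_loss(student_logits, teacher_logits, T=2.0):
    """Sketch of step-wise knowledge distillation.

    student_logits: (time_steps, batch, classes) SNN outputs per time step
    teacher_logits: (batch, classes) teacher soft targets
    Matches the student's distribution at EACH step to the teacher,
    instead of only matching the average over steps.
    """
    p_t = softmax(teacher_logits / T)              # teacher soft targets
    step_losses = []
    for step_logits in student_logits:             # loop over time steps
        p_s = softmax(step_logits / T)
        # KL(teacher || student), averaged over the batch
        kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)
        step_losses.append(kl.mean())
    return float(np.mean(step_losses) * T * T)     # conventional T^2 scaling
```

If one time step produces an abnormal output, it contributes its own KL term here instead of being hidden inside an averaged logit, which is the intuition the abstract describes.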
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This work contributes to the field of multimedia and multimodal processing through its exploration of Dynamic Vision Sensors (DVS) as a novel form of visual multimedia and Spiking Neural Networks (SNNs) as a next-generation neural network. A DVS is a bio-inspired visual sensor that operates differently from conventional cameras: instead of capturing images at a fixed rate, it asynchronously measures intensity changes at each pixel and records the time, position, and polarity of these changes as an event stream. Owing to their high dynamic range, high temporal resolution, and low latency, DVS cameras are increasingly popular across various domains. These advantages address the shortcomings of traditional cameras in perceiving the external environment, thereby tackling challenging problems in autonomous driving scenarios. Event-based neuromorphic data primarily refers to datasets collected by DVS. Given that SNNs excel at processing information along the time dimension and at event-driven communication, they demonstrate promising performance on event-based neuromorphic data. Our work investigates the trade-off between the accuracy and latency of SNNs in event-based visual recognition.
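The event stream described above is typically converted into a fixed number of event frames before being fed to an SNN. A minimal NumPy sketch of this preprocessing, with hypothetical function and parameter names (`events_to_frames`, equal event-count slicing) chosen for illustration rather than taken from the paper:

```python
import numpy as np

def events_to_frames(events, num_frames, height, width):
    """Bin an asynchronous DVS event stream into 2-channel event frames.

    events: (N, 4) array of (timestamp, x, y, polarity) rows, polarity in {0, 1}
    Returns: (num_frames, 2, height, width) frames, one channel per polarity.
    Events are sorted by time and split into num_frames equal-count slices,
    so the frame count is decoupled from the sensor's native timing.
    """
    events = events[np.argsort(events[:, 0])]          # sort by timestamp
    frames = np.zeros((num_frames, 2, height, width), dtype=np.float32)
    for i, idx in enumerate(np.array_split(np.arange(len(events)), num_frames)):
        for t, x, y, p in events[idx]:
            frames[i, int(p), int(y), int(x)] += 1.0   # accumulate event counts
    return frames
```

Under this kind of binning, training can use a larger `num_frames` than inference, which is the decoupling between event frames and SNN time steps that the abstract relies on.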
Supplementary Material: zip
Submission Number: 563