Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration

Published: 02 May 2024 · Last Modified: 25 Jun 2024 · ICML 2024 Poster · License: CC BY 4.0
Abstract: Spiking Transformers have recently exhibited remarkable performance close to that of Artificial Neural Networks (ANNs) while retaining the inherent energy efficiency of Spiking Neural Networks (SNNs). However, despite their energy-efficient inference through neuromorphic computation, training Spiking Transformers on GPUs is considerably more time-consuming than training their ANN counterparts. In this paper, we investigate token sparsification for efficient training of Spiking Transformers and find that conventional methods suffer from noticeable performance degradation. We analyze this issue and propose Sparsification with Timestep-wise Anchor Token and dual Alignments (STATA). The Timestep-wise Anchor Token enables precise identification of important tokens across timesteps based on standardized criteria. In addition, the dual Alignments incorporate both Intra- and Inter-Alignment of the attention maps, fostering the learning of inferior attention maps. Extensive experiments thoroughly demonstrate the effectiveness of STATA, which achieves up to $\sim$1.53$\times$ training speedup and $\sim$48% energy reduction with comparable performance on various datasets and architectures.
Submission Number: 8584
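
To make the abstract's core idea concrete, below is a minimal sketch of timestep-wise token sparsification for spike-driven features of shape [T, B, N, D] (timesteps, batch, tokens, channels). It is not the authors' STATA method: the anchor here is simply the mean feature (a stand-in for the paper's anchor token), and names such as `prune_tokens` and `keep_ratio` are illustrative assumptions. The key point it illustrates is scoring tokens against a shared anchor and keeping the same top-k tokens at every timestep, so that pruning decisions are consistent across the temporal dimension.

```python
# Hypothetical sketch of timestep-consistent token pruning (not the
# official STATA implementation; anchor construction is an assumption).
import torch


def prune_tokens(x: torch.Tensor, keep_ratio: float = 0.7) -> torch.Tensor:
    """Keep the same top-k tokens at every timestep.

    x: [T, B, N, D] token features over T timesteps.
    Returns: [T, B, K, D] with K = round(keep_ratio * N).
    """
    T, B, N, D = x.shape
    k = max(1, int(round(keep_ratio * N)))

    # One shared "anchor" per sample: the mean feature across timesteps
    # and tokens (a stand-in for a learned/standardized anchor token).
    anchor = x.mean(dim=(0, 2), keepdim=True)            # [1, B, 1, D]

    # Score each token by cosine similarity to the anchor, then average
    # over timesteps so the same tokens are kept at every timestep.
    scores = torch.cosine_similarity(x, anchor, dim=-1)  # [T, B, N]
    scores = scores.mean(dim=0)                          # [B, N]

    # Gather the top-k scoring tokens at all timesteps.
    topk = scores.topk(k, dim=-1).indices                # [B, K]
    idx = topk[None, :, :, None].expand(T, B, k, D)      # [T, B, K, D]
    return torch.gather(x, dim=2, index=idx)


if __name__ == "__main__":
    x = torch.randn(4, 2, 16, 32)  # T=4 timesteps, 2 samples, 16 tokens
    print(prune_tokens(x, keep_ratio=0.5).shape)  # torch.Size([4, 2, 8, 32])
```

Averaging the scores over timesteps before selecting tokens is what keeps the retained token set identical across the temporal dimension; dropping different tokens at different timesteps would break the attention-map alignment the abstract describes.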