Keywords: self-attention, cross-correlation
TL;DR: Simple Parameter-free Self-attention Approximation
Abstract: The quadratic computational complexity of self-attention with respect to token length limits the efficiency of Vision Transformers (ViT) on edge devices; hybrid models that combine self-attention with convolution are one approach to making ViT more lightweight. We propose SPSA, a self-attention approximation with no trainable parameters that captures global spatial features with linear complexity. To verify the effectiveness of SPSA combined with convolution, we conduct extensive experiments on image classification and object detection tasks. The source code will be made available.
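For context, the quadratic cost the abstract refers to comes from materializing the n × n attention matrix in standard self-attention. A minimal NumPy sketch of that baseline is below; note that the SPSA formulation itself is not specified in this abstract, so this only illustrates the complexity bottleneck SPSA is said to avoid, and the weight names (`wq`, `wk`, `wv`) are illustrative.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Standard (softmax) self-attention over n tokens.

    The (n, n) score matrix makes compute and memory quadratic
    in token length n -- the bottleneck on edge devices.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # shape (n, n): O(n^2)
    # Numerically stable softmax over each row.
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v  # shape (n, d)
```

Linear-complexity approximations generally avoid forming the (n, n) matrix explicitly, e.g. by reordering the computation or replacing the softmax with a cheaper global interaction.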