Abstract: Attention operators have been widely applied to data of various orders and dimensions, such as text, images, and videos. One challenge in applying attention operators is their excessive use of computational resources, which stems from the dot product and softmax operations used to compute similarity scores. In this work, we propose the Siamese similarity function, which uses a feed-forward network to compute similarity scores, yielding the Siamese attention operator (SAO). In particular, SAO dramatically reduces the required computational resources. Experimental results show that SAO reduces memory usage by 94% and speeds up computation by a factor of 58 compared to the regular attention operator. The computational advantage of SAO is even larger on higher-order and higher-dimensional data. Results on image classification and restoration tasks demonstrate that networks with SAOs are as effective as models with regular attention operators, while significantly outperforming those without attention operators.
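To make the idea concrete, below is a minimal PyTorch sketch of what an attention operator with a shared ("Siamese") feed-forward scorer could look like. The abstract only states that a feed-forward network replaces the dot-product/softmax similarity, so the layer sizes, the projections, and the way scores are combined here are assumptions for illustration, not the authors' exact SAO architecture.

```python
import torch
import torch.nn as nn


class SiameseAttentionSketch(nn.Module):
    """Hypothetical sketch of a Siamese attention operator (SAO).

    A single feed-forward network (shared weights, hence "Siamese") scores
    queries and keys independently; these per-token scores replace the
    pairwise dot-product/softmax of regular attention. All design details
    beyond "feed-forward similarity scores" are assumptions.
    """

    def __init__(self, dim: int, hidden_dim: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Shared feed-forward scorer applied to both queries and keys.
        self.scorer = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        q_score = self.scorer(self.q_proj(x))            # (B, N, 1)
        k_score = self.scorer(self.k_proj(x))            # (B, N, 1), same weights
        v = self.v_proj(x)                               # (B, N, dim)

        # Per-token key weights (sigmoid gating instead of an N x N softmax).
        k_weight = torch.sigmoid(k_score)
        # Weighted aggregation of values: linear in the number of tokens,
        # so no quadratic score matrix is ever materialized.
        context = (k_weight * v).sum(dim=1, keepdim=True)
        context = context / (k_weight.sum(dim=1, keepdim=True) + 1e-6)

        # Gate the shared context by each position's query score.
        return torch.sigmoid(q_score) * context.expand_as(v)


# Usage sketch:
# sao = SiameseAttentionSketch(dim=256)
# y = sao(torch.randn(2, 1024, 256))   # (2, 1024, 256)
```

Because each token is scored independently by the shared network, this sketch never forms the O(N^2) similarity matrix of dot-product attention, which is consistent in spirit with the large memory and speed savings reported in the abstract; the exact mechanism by which SAO achieves those savings is described in the paper itself.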