Abstract: As the self-attention mechanism offers powerful capabilities for capturing sequential relationships, it has become increasingly popular for modeling user behavior sequences in recommender systems. However, self-attention has quadratic computational complexity, O(n^2), because it computes interactions among all item pairs in the sequence. This leads to expensive model training and slow inference, which can hinder practical deployment. To this end, we seek to develop alternative approaches that improve the efficiency of the self-attention mechanism. We observe that the attention scores produced by each item interacting with the other items (including itself) are sparse, indicating that only a limited number of valuable item pairs (those with non-zero attention weights) contribute to the final output. This motivates us to develop effective strategies for identifying the valuable items and computing attention scores only for them, thereby avoiding unnecessary computation. We present a novel Progressive Sampling-based Self-Attention (PS-SA) mechanism, which uses a learnable progressive sampling strategy to identify the most valuable items; only these selected items are then used to produce the final output. Experiments on academic and production datasets demonstrate that PS-SA achieves promising results while reducing computational cost. Notably, we have deployed it in the Alibaba display advertising system, yielding a 2.6% CTR and a 1.3% RPM increase.
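To make the core idea concrete, below is a minimal, hypothetical sketch of subset-restricted attention: score every item with a learned scorer, keep only the top-k most valuable items, and attend solely to those, reducing the cost of the attention computation from O(n^2) to O(nk). The names `subset_attention`, `scorer`, and `keep_k` are illustrative and not from the paper; the abstract's learnable progressive sampling strategy is not specified here, so a simple hard top-k selection stands in for it (a real learnable variant would need a differentiable relaxation of the selection step).

```python
# Hypothetical sketch of attention over a selected subset of items.
# This is NOT the exact PS-SA algorithm; it only illustrates the idea of
# computing attention scores solely for a small set of "valuable" items.
import torch
import torch.nn.functional as F

def subset_attention(q, k, v, scorer, keep_k):
    # q, k, v: (n, d) single-head tensors for one sequence (no batch, for clarity).
    scores = scorer(k).squeeze(-1)           # (n,) learned "value" of each item
    idx = scores.topk(keep_k).indices        # indices of the keep_k highest-scoring items
    k_sub, v_sub = k[idx], v[idx]            # (keep_k, d) selected keys and values
    # Scaled dot-product attention restricted to the selected items: (n, keep_k).
    attn = F.softmax(q @ k_sub.T / k.size(-1) ** 0.5, dim=-1)
    return attn @ v_sub                      # (n, d) output built from selected items only

n, d, keep_k = 128, 64, 16
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
scorer = torch.nn.Linear(d, 1)               # stand-in for a learned item scorer
out = subset_attention(q, k, v, scorer, keep_k)
print(out.shape)  # torch.Size([128, 64])
```

Under this sketch, each query interacts with only keep_k keys instead of all n, which is where the computational savings come from when keep_k << n.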