Analyzing Attention Mechanisms through the Lens of Sample Complexity and Loss Landscape

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: Attention mechanisms, deep learning, sample complexity, self-attention
Abstract: Attention mechanisms have advanced state-of-the-art deep learning models in many machine learning tasks. Despite significant empirical gains, theoretical analyses of their effectiveness remain scarce. In this paper, we address this gap by studying the sample complexity and loss landscape of attention-based neural networks. Our results show that, under mild assumptions, every local minimum of the attention model has low prediction error, and attention models require lower sample complexity than models without attention. Besides explaining why the popular self-attention mechanism works, our theoretical results also provide guidelines for designing future attention models. Experiments on various datasets validate our theoretical findings.
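For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal NumPy sketch of standard scaled dot-product self-attention (as in "Attention Is All You Need"); the function name, shapes, and random projections are illustrative, not the paper's specific model.

```python
# Minimal sketch of scaled dot-product self-attention.
# All names and shapes here are illustrative assumptions.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (n, d) token embeddings; Wq, Wk, Wv: (d, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (n, n) scaled similarities
    scores -= scores.max(axis=-1, keepdims=True)       # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # rows sum to 1
    return weights @ V                                 # (n, d_k) attended outputs

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                        # 4 tokens, dimension 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output row is a convex combination of the value vectors, with mixing weights determined by query-key similarity; this data-dependent weighting is the property the paper's sample-complexity and loss-landscape analysis targets.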
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We theoretically and empirically analyze the superiority of attention mechanisms in terms of sample complexity and loss landscape.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=Sha3cQUgK