Invariant Attention: Provable Clustering Under Transformations

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Attention, Vision Transformer, Clustering, geometric transformations, CNN, Convolutional Neural Networks
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Invariant Attention mechanism provably clusters data subject to geometric transformations.
Abstract: Attention mechanisms play a crucial role in state-of-the-art vision architectures, enabling them to rapidly identify relationships between distant image patches. Conventional attention mechanisms do not incorporate other structural properties of images, such as invariance to geometric transformations, and must instead learn these properties from data. In this paper, we introduce a novel mechanism, Invariant Attention, which, like standard attention, captures image similarity, but with the additional guarantee of invariance to geometric transformations. We provide theoretical guarantees and empirical verification that Invariant Attention is far more successful than standard kernel attention on multi-class, transformed vision data, and illustrate its potential to correctly cluster transformed data with intra-class variation.
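To make the core idea concrete, here is a minimal sketch of one way a kernel-attention similarity can be made invariant to a transformation group: maximize the kernel similarity over all transformations of the key before forming attention weights. This illustration uses cyclic shifts as a stand-in transformation group and a Gaussian kernel; it is an assumption-laden toy, not the paper's actual construction.

```python
import numpy as np

def kernel_sim(x, y, sigma=1.0):
    """Standard Gaussian-kernel similarity between flattened patches."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def invariant_sim(x, y, sigma=1.0):
    """Transformation-invariant similarity: maximize the kernel similarity
    over all cyclic shifts of y (a toy stand-in for a geometric
    transformation group such as rotations or translations)."""
    return max(kernel_sim(x, np.roll(y, t), sigma) for t in range(len(y)))

def invariant_attention(query, keys, sigma=1.0):
    """Attention weights over `keys` using the invariant similarity."""
    sims = np.array([invariant_sim(query, k, sigma) for k in keys])
    w = np.exp(sims - sims.max())  # softmax over invariant similarities
    return w / w.sum()

# A shifted copy of the query receives the larger weight, while an
# unrelated random signal does not.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
keys = [np.roll(q, 3), rng.normal(size=8)]
w = invariant_attention(q, keys)
```

Under this sketch, a transformed copy of the query attains the maximal invariant similarity (the maximization over shifts "undoes" the transformation), which is the property that lets attention-style weights group transformed versions of the same image together.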
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6740