Geometry of Lightning Self-Attention: Identifiability and Dimension

Nathan W. Henry; Giovanni Luca Marchetti; Kathlén Kohn

Geometry of Lightning Self-Attention: Identifiability and Dimension

Nathan W. Henry, Giovanni Luca Marchetti, Kathlén Kohn

Published: 22 Jan 2025, Last Modified: 19 Feb 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Lightning Self-Attention, Neuromanifolds, Algebraic Geometry

TL;DR: We study neuromanifolds of lightning self-attention networks, discussing, in particular, dimension and identifiability.

Abstract: We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.

Primary Area: learning theory

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 3541

Loading