Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemmanian Geometry Finds It

Marvin F. da Silva; Felix Dangel; Sageev Oore

Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemmanian Geometry Finds It

Marvin F. da Silva, Felix Dangel, Sageev Oore

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: generalization, symmetry, sharpness, flatness, riemannian geometry, loss landscape

TL;DR: We introduce a new, geodesic-based sharpness that can operate on a quotient manifold where neural network parameter symmetries have been factored out, revealing strong correlation with generalization in transformers.

Abstract: The concept of sharpness has been successfully applied to traditional architectures like MLPs and CNNs to predict their generalization. For transformers, however, recent work reported weak correlation between flatness and generalization. We argue that existing sharpness measures fail for transformers, because they have much richer symmetries in their attention mechanism that induce directions in parameter space along which the network or its loss remain identical. We posit that sharpness must account fully for these symmetries, and thus we redefine it on a quotient manifold that results from quotienting out the transformer symmetries, thereby removing their ambiguities. Leveraging tools from Riemannian geometry, we propose a fully general notion of sharpness, in terms of a geodesic ball on the symmetry-corrected quotient manifold. In practise, we need to resort to approximating the geodesics. Doing so up to first order yields existing adaptive sharpness measures, and we demonstrate that including higher-order terms is crucial to recover correlation with generalization. We present results on diagonal networks with synthetic data, and show that our geodesic sharpness reveals the correlation for real-world transformers on ImageNet.

Primary Area: learning theory

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 12396

Loading