Data Geometry Determines Generalization Below the Edge-of-Stability

Published: 22 Sept 2025, Last Modified: 01 Dec 2025, NeurIPS 2025 Workshop, CC BY 4.0
Keywords: deep learning theory, neural networks, gradient descent, generalization theory
Abstract: Gradient Descent (GD) with large learning rates often operates in the “Edge of Stability” (EoS) regime, where the sharpness of the loss landscape is implicitly constrained. However, the mechanism by which this stability-induced regularity translates into generalization remains elusive, particularly as neural networks can fit random noise under the same regime. In this work, we demonstrate that the implicit regularization enforced by EoS is *data-dependent and highly inhomogeneous*: it is stringent in regions where data concentrates but negligible in low-density regions. Consequently, the effective model capacity is determined by data geometry. We prove two complementary results: (1) For data supported on a mixture of low-dimensional subspaces, EoS dynamics yield generalization rates dependent on the *intrinsic dimension* rather than the ambient dimension. (2) Conversely, for data distributed on a high-dimensional sphere, we prove the existence of ``flat'' interpolating solutions that satisfy the stability constraint yet exhibit memorization. Our analysis establishes that stability alone is insufficient for generalization and its success depends on favorable data geometry.
Submission Number: 78