Benign Oscillation within Minimal Invariant Subspaces at the Edge of Stability

Authors: Anonymous

Published: 24 May 2024 (last modified: 24 May 2024) · License: CC BY-NC 4.0
Keywords: edge of stability, deep matrix factorization, low rank, deep linear network
TL;DR: Oscillations at the edge of stability occur within a small invariant subspace.
Abstract: In this work, we provide a fine-grained analysis of the training dynamics of weight matrices under a large learning rate $\eta$, commonly used in machine learning practice for improved empirical performance. This regime is also known as the edge of stability, where the sharpness hovers around $2/\eta$ and the training loss oscillates yet decreases over long timescales. Within this regime, we observe an intriguing phenomenon: the oscillations in the training loss arise from the oscillations of only a few leading singular values of the weight matrices within a small invariant subspace. Theoretically, we analyze this behavior in a simplified deep matrix factorization problem, whose oscillation dynamics closely mirror those of its nonlinear counterparts. We prove that, for $\eta$ within a specific range, the singular values oscillate within a period-2 orbit while the singular vectors remain invariant across all iterations. We extensively corroborate our theory with empirical evidence, showing that (i) deep linear and nonlinear networks share many properties in their learning dynamics, and (ii) our model captures nuances of the edge of stability that other models do not, providing deeper insights into this phenomenon.
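
To make the setup concrete, below is a minimal, hypothetical sketch (not the paper's code) of the kind of experiment the abstract describes: full-batch gradient descent on a depth-2 matrix factorization loss $f(W_1, W_2) = \tfrac{1}{2}\|W_2 W_1 - M\|_F^2$ with a low-rank target $M$ and a large step size. The dimensions, seed, and the step size `eta` are illustrative assumptions; `eta` may need tuning for a given target to land in the edge-of-stability range without diverging.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 3

# Rank-r target matrix (assumption: a low-rank ground truth, matching the
# low-rank setting named in the keywords).
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))

# Depth-2 factorization with small initialization.
W1 = 0.1 * rng.standard_normal((d, d))
W2 = 0.1 * rng.standard_normal((d, d))

eta = 0.08  # illustrative large step size; the stable range depends on M

for t in range(2000):
    E = W2 @ W1 - M  # residual of the product
    # Simultaneous full-batch GD step: grad_W1 = W2^T E, grad_W2 = E W1^T.
    W1, W2 = W1 - eta * (W2.T @ E), W2 - eta * (E @ W1.T)
    if t >= 1996:
        # At the edge of stability, the claim is that the leading singular
        # values alternate between two values (a period-2 orbit) while the
        # singular vectors remain (approximately) fixed across iterations.
        s = np.linalg.svd(W2 @ W1, compute_uv=False)
        print(t, np.round(s[: r + 1], 4))
```

Under this reading of the abstract, the loss oscillation at the edge of stability should show up in the printout as a two-step flip-flop in the top singular values only, with the trailing singular values and the singular subspaces essentially static.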
Submission Number: 586