Exploring the Edge of Stability: Insights from a Fine-Grained Analysis of Gradient Descent in Shallow ReLU Networks

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference · Withdrawn Submission
Keywords: gradient descent, edge of stability
TL;DR: We analyze the edge-of-stability regime of gradient descent dynamics for shallow ReLU networks trained with squared loss on orthogonal inputs.
Abstract: Gradient descent (GD) in modern neural networks initially sharpens the loss landscape by increasing the top Hessian eigenvalues until the step size becomes unstable. Subsequently, it enters the "Edge of Stability" (EoS) regime, characterized by an unstable step size and non-monotonic loss reduction. The EoS regime challenges conventional step-size wisdom and has sparked intensive recent research. However, a detailed characterization of EoS within the fine-grained GD training dynamics of neural networks remains under-explored. This paper provides a comprehensive analysis of both the sharpening phase and the EoS regime throughout the entire GD dynamics, focusing on shallow ReLU networks trained with squared loss on orthogonal inputs. Our theory characterizes the evolution of the top Hessian eigenvalues and elucidates the mechanisms behind EoS training. Leveraging this analysis, we present empirical validation of our predictions regarding sharpening and EoS dynamics, contributing to a deeper understanding of neural network training.
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8053
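For intuition behind the abstract's claims: on a quadratic loss with curvature λ, GD with step size η diverges once ηλ > 2, so 2/η acts as the classical stability threshold that the top Hessian eigenvalue (the "sharpness") approaches during the sharpening phase. The sketch below is a minimal PyTorch illustration of this phenomenon in the setting the abstract names, a shallow ReLU network with squared loss on orthogonal inputs; the width, initialization scale, step size, and the power-iteration sharpness probe are assumptions of this sketch, not details taken from the submission.

```python
import torch

torch.manual_seed(0)

# Toy setup (illustrative, not the paper's experiments): n orthogonal
# inputs in R^d fed to a two-layer ReLU network f(x) = a^T relu(W x),
# trained with full-batch GD on the squared loss.
n, d, m = 8, 8, 16                     # samples, input dim, hidden width
X = torch.eye(n, d)                    # rows are orthonormal inputs
y = torch.randn(n, 1)

W = (0.5 * torch.randn(m, d)).requires_grad_()
a = (0.5 * torch.randn(m, 1)).requires_grad_()
params = [W, a]

def loss_fn():
    out = torch.relu(X @ W.T) @ a      # network outputs, shape (n, 1)
    return 0.5 * ((out - y) ** 2).mean()

def sharpness(n_iter=50):
    """Estimate the top Hessian eigenvalue via power iteration on
    Hessian-vector products (double backpropagation)."""
    v = [torch.randn_like(p) for p in params]
    lam = 0.0
    for _ in range(n_iter):
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=v)  # H @ v
        lam = sum((h * u).sum() for h, u in zip(hv, v)).item()   # Rayleigh quotient
        v = [h.detach() for h in hv]
    return lam

eta = 0.4                              # step size; 2/eta is the stability threshold
for step in range(201):
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= eta * g               # plain full-batch gradient descent
    if step % 20 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}  "
              f"sharpness {sharpness():.3f}  (2/eta = {2 / eta:.3f})")
```

Hessian-vector products via double backpropagation avoid materializing the full Hessian. With a suitably large η, one typically sees the printed sharpness climb toward 2/η and then hover there while the loss decreases non-monotonically, matching the sharpening-then-EoS picture described in the abstract; the exact trajectory depends on initialization and step size.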