Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks

TMLR Paper 2219 Authors

16 Feb 2024 (modified: 24 Apr 2024). Under review for TMLR.
Abstract: This paper examines the gradient flow dynamics of two-homogeneous neural networks under small initializations, where all weights are initialized near the origin. For both the square and the logistic loss, it is shown that for sufficiently small initializations, the gradient flow dynamics remain near the origin long enough for the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function, which quantifies the correlation between the output of the neural network and the corresponding labels in the training data set. For the square loss, neural networks have been observed to undergo saddle-to-saddle dynamics when initialized close to the origin. Motivated by this, the paper also establishes a similar directional convergence among weights of small magnitude in the neighborhood of certain saddle points.
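To make the abstract's claim concrete, the following is a minimal numerical sketch (not the authors' code): a one-hidden-layer ReLU network, which is 2-homogeneous in its weights, is trained by explicit-Euler gradient flow on the square loss from a tiny initialization, and the normalized weight vector is tracked over time. The toy data, network width, initialization scale eps, and step size dt are all hypothetical choices made for illustration; here the neural correlation function is taken to be N(w) = sum_i y_i f(x_i; w), evaluated at the unit direction w/||w||.

    import numpy as np

    # Toy problem (all sizes and data are hypothetical, for illustration only).
    rng = np.random.default_rng(0)
    n, d, h = 20, 5, 10                    # samples, input dim, hidden width
    X = rng.standard_normal((n, d))
    y = np.sign(X[:, 0])                   # labels in {-1, +1}

    def forward(U, a):
        # f(x) = sum_j a_j * relu(u_j . x). Scaling all weights (U, a) by c
        # scales f by c^2, so the network is 2-homogeneous in its weights.
        return np.maximum(X @ U.T, 0.0) @ a

    def grads(U, a):
        # Gradients of the square loss 0.5 * ||f - y||^2 w.r.t. U and a.
        pre = X @ U.T                      # (n, h) pre-activations
        act = np.maximum(pre, 0.0)
        r = act @ a - y                    # residuals
        ga = act.T @ r                     # shape (h,)
        gU = ((pre > 0) * (r[:, None] * a[None, :])).T @ X   # shape (h, d)
        return gU, ga

    eps = 1e-4                             # small initialization scale
    U = eps * rng.standard_normal((h, d))
    a = eps * rng.standard_normal(h)
    dt = 1e-3                              # explicit-Euler step for gradient flow

    def direction(U, a):
        w = np.concatenate([U.ravel(), a])
        nrm = np.linalg.norm(w)
        return w / nrm, nrm

    prev, _ = direction(U, a)
    for step in range(1, 100001):
        gU, ga = grads(U, a)
        U -= dt * gU
        a -= dt * ga
        if step % 10000 == 0:
            cur, nrm = direction(U, a)
            # N(w) = y . f(w) is 2-homogeneous, so its value at the unit
            # direction w/||w|| equals N(w) / ||w||^2.
            corr = (y @ forward(U, a)) / nrm**2
            print(f"step {step:6d}  ||w||={nrm:.2e}  "
                  f"cos(dir, prev dir)={prev @ cur:+.5f}  N(dir)={corr:+.4f}")
            prev = cur

On a typical run, the cosine between successive recorded directions approaches 1 while ||w|| is still orders of magnitude below one, which is the directional convergence near the origin that the abstract describes; whether the limiting direction is an (approximate) KKT point of maximizing N over the unit sphere is what the paper proves.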
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission URL: https://openreview.net/forum?id=GOhdITQSO9
Changes Since Last Submission: In the previous version, we unintentionally changed the fonts of the text and the page numbers; this submission mainly corrects those errors. We also made a few smaller changes: in part (2) of Lemma B.1, we changed $F(\mathbf{s})$ to $F(\mathbf{w})$, and we added $\ell$ in two places in the paragraph after Equation 21.
Assigned Action Editor: ~Zhihui_Zhu1
Submission Number: 2219