Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks

Published: 18 Jun 2024 · Last Modified: 18 Jun 2024 · Accepted by TMLR
Abstract: This paper examines the gradient flow dynamics of two-homogeneous neural networks under small initializations, where all weights are initialized near the origin. For both square and logistic losses, it is shown that for sufficiently small initializations, the gradient flow dynamics remain near the origin long enough for the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function, which quantifies the correlation between the output of the neural network and the corresponding labels in the training data set. For square loss, neural networks initialized close to the origin have been observed to undergo saddle-to-saddle dynamics. Motivated by this, the paper also establishes a similar directional convergence among weights of small magnitude in the neighborhood of certain saddle points.
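As background on the homogeneity assumption in the abstract: a two-layer ReLU network without biases, f(x; a, W) = Σ_j a_j max(0, w_j·x), is 2-homogeneous in its parameters, meaning that scaling all weights by c > 0 scales the output by c². A minimal numerical check of this property (the network size and random data here are illustrative, not taken from the paper):

```python
import numpy as np

def two_layer_relu(x, a, W):
    # f(x; a, W) = sum_j a_j * max(0, w_j . x)
    # 2-homogeneous: each parameter appears with combined degree 2.
    return a @ np.maximum(W @ x, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=5)        # one input point
a = rng.normal(size=4)        # output-layer weights
W = rng.normal(size=(4, 5))   # hidden-layer weights

c = 3.0
# Scaling every parameter by c multiplies the output by c**2,
# since ReLU is positively 1-homogeneous and the layers multiply.
lhs = two_layer_relu(x, c * a, c * W)
rhs = c**2 * two_layer_relu(x, a, W)
print(np.isclose(lhs, rhs))  # True
```

Small initializations in this setting correspond to scaling an initial parameter direction by a small c, which shrinks the network output by c² and keeps the dynamics near the origin.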
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zhihui_Zhu1
Submission Number: 2219