Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations

TMLR Paper 3707 Authors

18 Nov 2024 (modified: 21 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks, assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It is shown that, for sufficiently small initializations, the weights of the network remain small in Euclidean norm during the early stages of training and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the recently introduced neural correlation function. The paper also studies the KKT points of the neural correlation function for feed-forward networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations, deriving necessary and sufficient conditions for rank-one KKT points.
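The neural correlation function is not defined on this page; the following is a minimal sketch, in standard notation, of the setup the abstract refers to. The symbols ($\mathcal{H}$, $\mathcal{L}$, $\mathcal{N}$, $\delta$, $\theta_0$) and the exact form of the constrained problem are illustrative assumptions, not the paper's own definitions.

```latex
% A minimal sketch of the setup described in the abstract; the symbols
% H (network output), L (training loss), N (neural correlation function),
% delta, and theta_0 are illustrative assumptions, not the paper's notation.

% L-homogeneity of the network output in the weights theta, with L > 2:
\[
  \mathcal{H}(x;\alpha\theta) \;=\; \alpha^{L}\,\mathcal{H}(x;\theta),
  \qquad \text{for all } \alpha > 0 .
\]

% Gradient flow from a small initialization (delta small, theta_0 unit norm):
\[
  \dot{\theta}(t) \;=\; -\nabla_{\theta}\,\mathcal{L}\bigl(\theta(t)\bigr),
  \qquad \theta(0) = \delta\,\theta_{0}, \quad \lVert\theta_{0}\rVert = 1 .
\]

% Early directional convergence: while the norm of theta(t) stays small, the
% direction theta(t)/||theta(t)|| approximately converges to a KKT point of
% the constrained neural correlation function problem:
\[
  \max_{\theta}\; \mathcal{N}(\theta)
  \quad \text{subject to} \quad \lVert\theta\rVert^{2} = 1 .
\]
```

In this reading, "early stages of training" refers to the time window in which $\lVert\theta(t)\rVert$ remains on the order of $\delta$, so the direction can stabilize before the norm grows.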
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Quanquan_Gu1
Submission Number: 3707