Nonlinear Behaviour of Critical Points for a Simple Neural Network

Published: 17 Oct 2024 · Last Modified: 17 Oct 2024 · Accepted by TMLR · License: CC BY 4.0
Abstract: In severely over-parametrized regimes, neural network optimization can be analyzed by linearization techniques such as the neural tangent kernel, which shows that gradient descent converges to zero training error, and by landscape analysis, which shows that all local minima are global minima. Practical networks are often much less over-parametrized, and their training behavior becomes more nuanced and nonlinear. This paper contains a fine-grained analysis of this nonlinearity for a simple shallow network in one dimension. We show that the networks have unfavorable critical points, which can be mitigated by sufficiently high local resolution. Given this resolution, all critical points satisfy $L_2$ loss bounds of optimal adaptive approximation in Sobolev and Besov spaces on convex and concave subdomains of the target function. These bounds cannot be matched by linear approximation methods and demonstrate nonlinear and global behavior of the critical point's inner weights.
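The abstract concerns a shallow one-dimensional network trained with an $L_2$ loss, whose critical points are then analyzed. The sketch below illustrates such a setup only in rough outline; the ReLU activation, the quadratic target on $[0,1]$, the neuron count, and the plain gradient-descent loop are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of a shallow 1D network driven toward a critical point of the
# L2 loss. Assumptions (not from the paper): ReLU units, target x^2, plain GD.
import numpy as np

rng = np.random.default_rng(0)

def shallow_net(x, w, b, a):
    """f(x) = sum_i a_i * relu(w_i * x + b_i)."""
    return np.maximum(w * x[:, None] + b, 0.0) @ a

def l2_loss(params, x, y):
    w, b, a = params
    r = shallow_net(x, w, b, a) - y
    return 0.5 * np.mean(r ** 2)

def grad(params, x, y):
    """Analytic gradients of the L2 loss w.r.t. (w, b, a)."""
    w, b, a = params
    pre = w * x[:, None] + b            # pre-activations, shape (n, m)
    act = np.maximum(pre, 0.0)          # ReLU outputs
    mask = (pre > 0).astype(float)      # ReLU derivative
    r = (act @ a - y)[:, None]          # residuals, shape (n, 1)
    ga = np.mean(r * act, axis=0)
    gw = np.mean(r * mask * a * x[:, None], axis=0)
    gb = np.mean(r * mask * a, axis=0)
    return gw, gb, ga

# Target: a convex function on [0, 1] (placeholder choice).
x = np.linspace(0.0, 1.0, 200)
y = x ** 2

m = 16                                   # number of hidden neurons (assumed)
w = rng.normal(size=m)
b = rng.normal(size=m)
a = rng.normal(size=m) / np.sqrt(m)

lr = 0.1
for step in range(20000):
    gw, gb, ga = grad((w, b, a), x, y)
    w -= lr * gw; b -= lr * gb; a -= lr * ga

# Near a critical point the gradient norm is small; the remaining L2 loss is
# the quantity the paper's approximation bounds are about.
g = np.concatenate(grad((w, b, a), x, y))
print(f"grad norm {np.linalg.norm(g):.2e}, L2 loss {l2_loss((w, b, a), x, y):.2e}")
```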
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Thank you for the decision. This version contains the following changes: corrected references as requested in the decision, and removed text colors from the review.
Assigned Action Editor: ~Zhihui_Zhu1
Submission Number: 2888