Nonlinear Behaviour of Critical Points for a Simple Neural Network

TMLR Paper 2888 Authors

18 Jun 2024 (modified: 24 Jun 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: In severely over-parametrized regimes, neural network optimization can be analyzed by linearization techniques such as the neural tangent kernel, which shows that gradient descent converges to zero training error, and by landscape analysis, which shows that all local minima are global minima. Practical networks are often much less over-parametrized, and their training behavior becomes more nuanced and nonlinear. This paper contains a fine-grained analysis of this nonlinearity for a simple shallow network in one dimension. We show that the networks have unfavorable critical points, which can be mitigated by sufficiently high local resolution. Given this resolution, all critical points satisfy $L_2$ loss bounds matching optimal adaptive approximation in Sobolev and Besov spaces on convex and concave subdomains of the target function. These bounds cannot be matched by linear approximation methods and demonstrate the nonlinear and global behavior of the critical points' inner weights.
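To make the setting concrete, the following is a minimal sketch (not the paper's exact construction) of the kind of model the abstract refers to: a one-hidden-layer ReLU network on a one-dimensional input, trained by full-batch gradient descent on the $L_2$ loss. The target function, width, learning rate, and initialization below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, width = 200, 16                      # samples and hidden width (assumed)
x = np.linspace(-1.0, 1.0, n)           # 1-D inputs
y = x ** 2                              # convex target function (assumed)

# Network f(x) = sum_j a_j * relu(w_j * x + b_j)
w = rng.normal(size=width)              # inner weights
b = rng.normal(size=width)              # inner biases
a = rng.normal(size=width) / np.sqrt(width)  # outer weights

lr = 1e-2
for step in range(5000):
    pre = np.outer(x, w) + b            # (n, width) pre-activations
    hidden = np.maximum(pre, 0.0)       # ReLU activations
    pred = hidden @ a
    err = pred - y                      # residual of the L2 loss 0.5 * mean(err^2)
    act = (pre > 0).astype(float)       # ReLU derivative

    grad_a = hidden.T @ err / n
    grad_w = (act * x[:, None]).T @ err * a / n
    grad_b = act.T @ err * a / n

    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b

print("final L2 loss:", 0.5 * np.mean((hidden @ a - y) ** 2))
```

Gradient descent on such a network converges to a critical point of the loss; the paper's analysis concerns the quality of these critical points rather than the training procedure itself.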
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zhihui_Zhu1
Submission Number: 2888