Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds

16 Sept 2025 (modified: 25 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: CS Theory, Non-convex Optimization, Adam, Deep Neural Networks, Convergence, Geometric Measure Spaces, Topology
TL;DR: Convergence guarantees for Adam in deep ReLU networks
Abstract: First-order adaptive optimization methods like Adam are the default choice for training modern deep neural networks. Despite their empirical success, the theoretical understanding of these methods in non-smooth settings, particularly in deep ReLU networks, remains limited. ReLU activations create exponentially many region boundaries where standard smoothness assumptions break down. \textbf{We derive the first $\tilde{O}\!\bigl(\sqrt{d_{\mathrm{eff}}/n}\bigr)$ generalization bound for Adam in deep ReLU networks and the first global convergence guarantee for Adam in the non-smooth, non-convex ReLU landscape without a global PL or convexity assumption.} Our analysis is based on stratified Morse theory and new results on Kakeya sets. We develop a multi-layer refinement framework that progressively tightens bounds on region crossings, proving that the number of crossings collapses from exponential to near-linear in the effective dimension. Using a Kakeya-based method, we obtain a tighter generalization bound than PAC-Bayes approaches and establish convergence under a mild uniform low-barrier assumption.
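For reference, "Adam" here denotes the standard adaptive moment estimation update of Kingma and Ba; the sketch below is the usual formulation with bias correction, not reproduced from the submission, whose exact hyperparameter and subgradient conventions are not specified in the abstract:
\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{\odot 2},
\]
\[
\hat m_t = \frac{m_t}{1-\beta_1^{t}}, \qquad
\hat v_t = \frac{v_t}{1-\beta_2^{t}}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon},
\]
where $g_t$ is a (sub)gradient of the training loss at step $t$ and all operations are elementwise. In the ReLU setting this loss is only piecewise smooth, which is why the region-crossing analysis described above stands in for the usual global smoothness assumption.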
Primary Area: optimization
Submission Number: 8128