NTK with Convex Two-Layer ReLU Networks

ICLR 2026 Conference Submission 17628 Authors

19 Sept 2025 (modified: 26 Nov 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: two-layer ReLU network theory, NTK, network width, separation margin, convex optimization
TL;DR: We analyze a convex formulation of two-layer ReLU neural networks that is nearly equivalent to the standard formulation and simplifies theoretical analyses.
Abstract: We theoretically analyze a convex variant of two-layer ReLU neural networks and how it relates to the standard formulation. We show that the two formulations are equivalent with respect to their output values on a fixed dataset, and that they behave similarly under gradient-based optimization as long as the first-layer weights of the standard network do not change too much, which is a common assumption in proofs of convergence to an arbitrarily good solution. We further show that for any two-layer ReLU neural network, even one of infinite width, there exists a (weighted) network of width $O(n^{d-1})$ with the same output value on all data points. Moreover, these finite networks have exactly the same eigenvalues $\lambda$ of their neural tangent kernel (NTK) matrix and the same NTK separation margin $\gamma$ as in the infinite-width limit. With these preliminaries in place, we present our main results: we give a $(1\pm\varepsilon)$-approximation algorithm for the separation margin $\gamma$, for which no general evaluation method was previously known, and we study two data examples: 1) a circular example for which we strengthen an $\Omega(\gamma^{-2})$ lower bound against previous worst-case width analyses; 2) a hypercube example that can be perfectly classified by the convex network formulation but not by any standard network, separating the expressive power of the two formulations.
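For readers unfamiliar with the NTK quantities mentioned in the abstract, the sketch below illustrates one commonly used closed form of the infinite-width NTK Gram matrix of a two-layer ReLU network (training only the first-layer weights, unit-norm inputs), namely $H_{ij} = \langle x_i, x_j\rangle \,(\pi - \arccos\langle x_i, x_j\rangle)/(2\pi)$, and computes its smallest eigenvalue $\lambda$. This is an illustrative assumption about the setup; the paper's exact definitions of the NTK matrix and the separation margin $\gamma$ may differ, and the toy circular data here is hypothetical.

```python
import numpy as np

def ntk_gram(X):
    """Infinite-width NTK Gram matrix for a two-layer ReLU network
    (first-layer weights trained), assuming the rows of X are unit-norm:
    H_ij = <x_i, x_j> * (pi - arccos(<x_i, x_j>)) / (2*pi).
    """
    G = X @ X.T
    G = np.clip(G, -1.0, 1.0)          # guard arccos against rounding error
    return G * (np.pi - np.arccos(G)) / (2 * np.pi)

# hypothetical toy data: n points on the unit circle (d = 2)
n = 8
angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)

H = ntk_gram(X)
lam_min = np.linalg.eigvalsh(H).min()  # smallest NTK eigenvalue lambda
print(f"lambda_min of the NTK matrix: {lam_min:.4f}")
```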
Primary Area: learning theory
Submission Number: 17628