NTK with Convex Two-Layer ReLU Networks

ICLR 2026 Conference Submission 17628 Authors

19 Sept 2025 (modified: 26 Nov 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: two-layer ReLU network theory, NTK, network width, separation margin, convex optimization
TL;DR: We analyze a convex formulation of two-layer ReLU neural networks that is nearly equivalent to the standard formulation and simplifies theoretical analyses.
Abstract: We theoretically analyze a convex variant of two-layer ReLU neural networks and how it relates to the standard formulation. We show that the two formulations are equivalent with respect to their output values on a fixed dataset, and that they behave similarly under gradient-based optimization as long as the first-layer weights of the standard network do not change too much, which is a common assumption in proofs of convergence to an arbitrarily good solution. We further show that for any two-layer ReLU neural network, even one of infinite width, there exists a (weighted) network of width $O(n^{d-1})$ with the same output value on all data points. Moreover, these finite networks have exactly the same eigenvalues $\lambda$ of their neural tangent kernel (NTK) matrix and the same NTK separation margin $\gamma$ as in the infinite-width limit. With these preliminaries in place, we present our main results: we give a $(1\pm\varepsilon)$-approximation algorithm for the separation margin $\gamma$, for which no general evaluation method was previously known, and we study two data examples: 1) a circular example for which we strengthen an $\Omega(\gamma^{-2})$ lower bound against previous worst-case width analyses; 2) a hypercube example that can be perfectly classified by the convex network formulation but not by any standard network, separating the expressive power of the two formulations.
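For readers unfamiliar with the NTK quantities mentioned in the abstract, the sketch below illustrates one commonly used closed form of the infinite-width NTK Gram matrix of a two-layer ReLU network (training only the first-layer weights, unit-norm inputs), namely $H_{ij} = \langle x_i, x_j\rangle \,(\pi - \arccos\langle x_i, x_j\rangle)/(2\pi)$, and computes its smallest eigenvalue $\lambda$. This is an illustrative assumption about the setup; the paper's exact definitions of the NTK matrix and the separation margin $\gamma$ may differ, and the toy circular data here is hypothetical.

```python
import numpy as np

def ntk_gram(X):
    """Infinite-width NTK Gram matrix for a two-layer ReLU network
    (first-layer weights trained), assuming the rows of X are unit-norm:
    H_ij = <x_i, x_j> * (pi - arccos(<x_i, x_j>)) / (2*pi).
    """
    G = X @ X.T
    G = np.clip(G, -1.0, 1.0)          # guard arccos against rounding error
    return G * (np.pi - np.arccos(G)) / (2 * np.pi)

# hypothetical toy data: n points on the unit circle (d = 2)
n = 8
angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)

H = ntk_gram(X)
lam_min = np.linalg.eigvalsh(H).min()  # smallest NTK eigenvalue lambda
print(f"lambda_min of the NTK matrix: {lam_min:.4f}")
```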
Primary Area: learning theory
Submission Number: 17628