Optimal Minimum Width for the Universal Approximation of Continuously Differentiable Functions by Deep Narrow MLPs
Keywords: Universal Approximation, Sobolev norm, Continuously differentiable functions, deep narrow MLPs, deep learning
TL;DR: In this paper, we investigate the universal approximation property of deep, narrow multilayer perceptrons (MLPs) for $C^1$ functions under the Sobolev norm, specifically the $W^{1, \infty}$ norm.
Abstract: In this paper, we investigate the universal approximation property of deep, narrow multilayer perceptrons (MLPs) for $C^1$ functions under the Sobolev norm, specifically the $W^{1, \infty}$ norm. Although the optimal width of deep, narrow MLPs for approximating continuous functions has been extensively studied, significantly less is known about the corresponding optimal width for $C^1$ functions. We demonstrate that \textit{the optimal width} can be determined in a wide range of cases within the $C^1$ setting. 
Our approach consists of two main steps. First, leveraging control theory, we show that any diffeomorphism can be approximated by deep, narrow MLPs. Second, using the Borsuk-Ulam theorem and various results from differential geometry, we prove that the optimal width for approximating arbitrary $C^1$ functions via diffeomorphisms is $\min(n + m, \max(2n + 1, m))$ in certain cases, including $(n,m) = (8,8)$ and $(16,8)$, where $n$ and $m$ denote the input and output dimensions, respectively. Our results apply to a broad class of activation functions.
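For concreteness, a minimal worked evaluation of the stated width formula at the two quoted cases (arithmetic only, not part of the original abstract):
\begin{align*}
(n,m) = (8,8):\quad &\min(n+m,\ \max(2n+1,\ m)) = \min(16,\ \max(17,\ 8)) = \min(16,\ 17) = 16,\\
(n,m) = (16,8):\quad &\min(n+m,\ \max(2n+1,\ m)) = \min(24,\ \max(33,\ 8)) = \min(24,\ 33) = 24.
\end{align*}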
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 6750