ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate

Published: 02 May 2024 · Last Modified: 25 Jun 2024 · ICML 2024 Poster · CC BY 4.0
Abstract: The prevalent use of narrow neural networks, characterized by their minimal parameter count per layer, has led to a surge in research exploring their potential as universal function approximators. A notable result in this field states that networks of width only $d+1$ can approximate any continuous function of $d$-dimensional input arbitrarily well. However, the optimal approximation rate for these narrowest networks, i.e., the optimal trade-off between the number of tunable parameters and the approximation error, has remained unclear. In this paper, we address this gap by proving that ReLU networks of width $d+1$ achieve the optimal approximation rate for continuous functions on the domain $[0,1]^d$ under the $L^p$ norm for $p\in[1,\infty)$. We further show that for the uniform norm, a width of $d+11$ is sufficient. We also extend these results to narrow feed-forward networks with various activation functions, confirming that they, too, approximate at the optimal rate. This work adds to the understanding of the universal approximation power of narrow networks.
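
As a concrete illustration of the architecture the abstract refers to, below is a minimal PyTorch sketch (an assumption of this note, not the paper's construction) of a deep ReLU network whose hidden layers all have width $d+1$; depth is the free resource that controls the parameter count. The paper's result is an approximation-theoretic guarantee about what such networks can represent, so the gradient-based fit at the end is only a stand-in for the $L^p$ approximation being discussed, with a hypothetical target function `f`.

```python
import torch
import torch.nn as nn

class NarrowReLUNet(nn.Module):
    """Deep ReLU network with every hidden layer of width d + 1 (illustrative sketch)."""

    def __init__(self, d: int, depth: int):
        super().__init__()
        width = d + 1  # the narrowest width considered in the abstract
        layers = [nn.Linear(d, width), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.ReLU()]
        layers.append(nn.Linear(width, 1))  # scalar output
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

if __name__ == "__main__":
    d, depth = 3, 16
    # Hypothetical continuous target on [0,1]^d, used only for illustration.
    f = lambda x: torch.sin(x.sum(dim=1, keepdim=True))
    model = NarrowReLUNet(d, depth)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):
        x = torch.rand(256, d)  # uniform samples from [0,1]^d
        loss = ((model(x) - f(x)) ** 2).mean()  # empirical L^2 surrogate
        opt.zero_grad()
        loss.backward()
        opt.step()
```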
Submission Number: 1021