Optimal Neural Network Approximation for High-Dimensional Continuous Functions

23 Sept 2023 (modified: 11 Feb 2024) Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Neural Network Optimizations, KA representation, Elementary Superexpressive Activations
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The original version of the Kolmogorov-Arnold representation theorem states that for any continuous function $f : [0, 1]^d \rightarrow \mathbb{R}$, there exist $(d+1)(2d+1)$ univariate continuous functions such that $f$ can be expressed through sums and compositions of them. One can therefore use this representation to determine the optimal size of a neural network approximating $f$. The important question is then how the size of the network depends on $d$. It has been proved that the function space generated by a special class of activation functions called EUAF (elementary universal activation functions), with $\mathcal{O}(d^2)$ neurons and 11 hidden layers, is dense in $C([a, b]^d)$. In this paper we provide classes of $d$-variate functions for which the optimized neural networks, built with the elementary superexpressive activation functions defined by Yarotsky, have only $\mathcal{O}(d)$ neurons. We give a new construction of a neural network with $\mathcal{O}(d)$ neurons that approximates $d$-variate continuous functions of these classes, and we prove that the size $\mathcal{O}(d)$ is optimal in these cases.
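For reference, a sketch of the classical Kolmogorov-Arnold representation the abstract refers to (standard form assumed here; the paper's precise statement may differ): for any continuous $f : [0, 1]^d \rightarrow \mathbb{R}$,
$$f(x_1, \dots, x_d) = \sum_{q=0}^{2d} \Phi_q\Big(\sum_{p=1}^{d} \varphi_{q,p}(x_p)\Big),$$
with $(2d+1)$ outer functions $\Phi_q$ and $d(2d+1)$ inner functions $\varphi_{q,p}$, i.e. $(d+1)(2d+1)$ univariate continuous functions in total.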
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6948