Learnability of high-dimensional targets by two-parameter models and gradient flow

Published: 25 Sept 2024, Last Modified: 06 Nov 2024, NeurIPS 2024 poster, CC BY 4.0
Keywords: gradient descent, gradient flow, learning, optimization, local minima, Cantor set, elementary function, Pfaffian function, Borsuk-Ulam antipodality theorem
Abstract: We explore the theoretical possibility of learning $d$-dimensional targets with $W$-parameter models by gradient flow (GF) when $W<d$. Our main result shows that if the targets are described by a particular $d$-dimensional probability distribution, then there exist models with as few as two parameters that can learn the targets with arbitrarily high success probability. On the other hand, we show that for $W<d$ there is necessarily a large subset of GF-non-learnable targets. In particular, the set of learnable targets is not dense in $\mathbb R^d$, and any subset of $\mathbb R^d$ homeomorphic to the $W$-dimensional sphere contains non-learnable targets. Finally, we observe that the model in our main theorem on almost guaranteed two-parameter learning is constructed using a hierarchical procedure and as a result is not expressible by a single elementary function. We show that this limitation is essential in the sense that most models written in terms of elementary functions cannot achieve the learnability demonstrated in this theorem.
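For concreteness, the gradient-flow setup suggested by the abstract can be sketched as follows. This is an illustrative formulation only; the paper's exact model $F$, loss, and initialization may differ, and the symbols below are placeholders rather than the authors' notation.

```latex
% Minimal sketch of gradient-flow (GF) learning of a d-dimensional target
% by a W-parameter model (illustrative symbols, not the paper's notation):
% a model F : R^W -> R^d maps parameters to target space, and for a target
% y in R^d the parameters follow the gradient flow of a squared loss.
\[
  L(w) = \tfrac{1}{2}\,\lVert F(w) - y \rVert^2, \qquad
  \frac{dw(t)}{dt} = -\nabla_w L\bigl(w(t)\bigr), \qquad w(0) = w_0 .
\]
% Under this reading, a target y is GF-learnable (from w_0) if
% F(w(t)) -> y as t -> infinity; the abstract's results concern which
% targets are learnable in this sense when W < d.
```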
Primary Area: Optimization (convex and non-convex, discrete, stochastic, robust)
Submission Number: 1861