A partial theory of Wide Neural Networks using WC functions and its practical implications

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Abstract: We present a framework based on the theory of Polyak-Łojasiewicz functions to explain the convergence and generalization properties of overparameterized feed-forward neural networks. We introduce the class of Well-Conditioned (WC) reparameterizations, which are closed under composition and preserve the class of Polyak-Łojasiewicz functions, thus enabling compositionality: the framework's results can be studied separately for each layer and in an architecture-neutral way. We show that overparameterized neural layers are WC and can therefore be composed to build easily optimizable functions. We present a pointwise stability bound implying that overparameterization in WC models leads to tighter convergence around a global minimizer. Our framework allows us to derive quantitative estimates for the terms that govern the optimization of neural networks. We leverage this to empirically evaluate the predictions set forth by several relevant published theories concerning the conditioning, training speed, and generalization of neural network training. Our contribution aims to encourage the development of mixed theoretical-practical approaches, where the properties postulated by theory can also find empirical confirmation.
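
For context on the terminology above: the Polyak-Łojasiewicz (PL) condition referenced in the abstract is, in its usual textbook form (stated here as background, not as the paper's specific assumptions), the gradient-domination inequality

$$\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu \left( f(x) - f^* \right) \quad \text{for all } x,$$

where $f^* = \inf_x f(x)$ and $\mu > 0$. Under this condition, gradient descent with a sufficiently small step size converges linearly to the global minimum value even when $f$ is non-convex.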
Supplementary Material: zip
16 Replies
