Abstract: There is a growing interest in large-width asymptotic and non-asymptotic properties of deep Gaussian neural networks (NNs), namely NNs whose weights are initialized from Gaussian distributions. For a Gaussian NN of depth $L\geq1$ and width $n\geq1$, a well-established result is that, as $n\rightarrow+\infty$, the NN's output converges in distribution to a Gaussian process. Recently, quantitative versions of this central limit theorem (CLT) have been obtained by exploiting the recursive structure of the NN and its infinitely wide Gaussian limit, showing that the NN's output converges to its Gaussian limit at the rate $n^{-1/2}$ in the $2$-Wasserstein distance, as well as in some convex distances. In this paper, we investigate the use of second-order Gaussian Poincar\'e inequalities to obtain quantitative CLTs for the NN's output, showing their pros and cons in this new field of application. For shallow Gaussian NNs, i.e. $L=1$, we show that second-order Poincar\'e inequalities provide a powerful tool: they reduce the problem of establishing a quantitative CLT to the algebraic problem of computing the gradient and the Hessian of the NN's output, and they lead to the rate of convergence $n^{-1/2}$ in the $1$-Wasserstein distance. In contrast, for deep Gaussian NNs, i.e. $L\geq2$, the use of second-order Poincar\'e inequalities turns out to be more problematic. By relying on exact computations of the gradient and the Hessian of the NN's output, a non-trivial task whose (algebraic) complexity increases with $L$, we show that for $L=2$ second-order Poincar\'e inequalities still lead to a quantitative CLT in the $1$-Wasserstein distance, though with the rate of convergence $n^{-1/4}$, and we conjecture the same rate for any depth $L\geq2$. This worsening of the rate is a peculiar feature of second-order Poincar\'e inequalities, which are designed to be applied directly to the NN's output as a function of all the previous layers, and hence do not exploit the recursive structure of the NN or its infinitely wide Gaussian limit. While this is a negative result relative to the state of the art, it does not diminish the value of second-order Poincar\'e inequalities, which we show remain effective in establishing a quantitative CLT for a functional of Gaussian processes as complicated as a deep Gaussian NN.
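To fix ideas, a second-order Gaussian Poincar\'e inequality takes, schematically, the following form (a sketch with a generic constant $C>0$; the paper works with explicit constants and precise smoothness assumptions): if $F=f(X)$ for a standard Gaussian vector $X$ and a sufficiently smooth function $f$, with $\mathbb{E}[F]=0$ and $\mathrm{Var}(F)=\sigma^2>0$, then
$$d_{W_1}\big(F,\,N(0,\sigma^2)\big)\;\leq\;\frac{C}{\sigma^2}\,\Big(\mathbb{E}\,\big\|\nabla^2 f(X)\big\|_{\mathrm{op}}^4\Big)^{1/4}\Big(\mathbb{E}\,\big\|\nabla f(X)\big\|^4\Big)^{1/4},$$
where $d_{W_1}$ denotes the $1$-Wasserstein distance and $\|\cdot\|_{\mathrm{op}}$ the operator norm. This makes precise the reduction mentioned above: a quantitative CLT for the NN's output follows once the gradient and the Hessian of the output, viewed as a function of all the Gaussian weights, are controlled in $L^4$.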
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=BKtxHvwnut
Changes Since Last Submission: We are re-submitting to TMLR the paper titled “Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities”, which was rejected by TMLR with a recommendation to revise and resubmit. The original submission is available at: https://openreview.net/forum?id=BKtxHvwnut
We revised the paper by taking into account all the reviewers' suggestions concerning the organization of the material, missing references, new material to include, and related literature. We wish to thank the reviewers for encouraging us to prepare a revised version of the paper, which we believe has benefited greatly from their suggestions and comments. Two critical aspects of the revision were the discussion of the related literature and the inclusion of new material. We discuss these aspects below.
Related literature. While we were preparing the revision of the paper, four new papers on quantitative CLTs for deep NNs appeared on arXiv. In order of appearance:
[1] K. Balasubramanian, L. Goldstein, N. Ross, and A. Salim (28 June, 2023). Gaussian random field approximation via Stein's method with applications to wide random neural networks. arXiv preprint arXiv:2306.16308.
[2] V. Cammarota, D. Marinucci, M. Salvi, and S. Vigogna (29 June, 2023). A quantitative functional central limit theorem for shallow neural networks. arXiv preprint arXiv:2306.16932.
[3] N. Apollonio, D. De Canditiis, G. Franzina, P. Stolfi, and G.L. Torrisi (10 July, 2023). Normal approximation of random Gaussian neural networks. arXiv preprint arXiv:2307.04486.
[4] S. Favaro, B. Hanin, D. Marinucci, I. Nourdin, and G. Peccati (12 July, 2023). Quantitative CLTs in deep neural networks. arXiv preprint arXiv:2307.06092.
[3] and [4] are very close to the paper of Basteri and Trevisan (2022) and to our paper: they present natural generalizations of the main results of Basteri and Trevisan (2022) and of our paper, covering general convex distances and weaker hypotheses on the activation function, and they provide quantitative CLTs with the rate of convergence $n^{-1/2}$. In contrast, [1] deals with a functional quantitative CLT for deep NNs with general non-Gaussian weights, whereas [2] deals with functional quantitative CLTs for shallow NNs on the sphere. The papers [1], [2], [3] and [4] include our paper in their references, recognizing it as one of the first contributions to the study of quantitative CLTs for NNs. We included [1], [2], [3] and [4] in the references of our paper, and we rewrote parts of the paper to explain the difference between the approaches developed in [3] and [4] and our approach based on second-order Poincaré inequalities. The approaches in [3] and [4] are close to that of Basteri and Trevisan (2022), in the sense that they obtain quantitative CLTs by relying on triangle inequalities that exploit the recursive structure of the NN and its infinitely wide Gaussian limit, whose terms are then estimated by different techniques depending on the distance. We do not follow this approach, since second-order Poincaré inequalities are designed to be applied directly to the output of the deep NN, as a function of all the previous NN layers.
New material. In the original paper, we showed how second-order Poincaré inequalities provide a powerful tool to establish quantitative CLTs for shallow Gaussian NNs: they reduce the problem to the algebraic problem of computing the gradient and the Hessian of the NN's output, which is straightforward for shallow NNs, and they lead to the rate of convergence $n^{-1/2}$ in the $1$-Wasserstein distance. In the revised paper we consider the more general setting of deep Gaussian NNs, showing how the use of second-order Poincaré inequalities in this setting is more problematic. By relying on exact computations of the gradient and the Hessian of the NN's output, a non-trivial task whose (algebraic) complexity increases with the depth, we show that for two layers second-order Poincaré inequalities still lead to a quantitative CLT in the $1$-Wasserstein distance, though with the rate of convergence $n^{-1/4}$, and we conjecture the same rate for any depth. This worsening of the rate is a peculiar feature of second-order Poincaré inequalities, which are designed to be applied directly to the NN's output as a function of all the previous layers, and hence do not exploit the recursive structure of the NN or its infinitely wide Gaussian limit. While this is a negative result relative to the state of the art, it does not diminish the value of second-order Poincaré inequalities, which we show remain effective in establishing a quantitative CLT for a functional of Gaussian processes as complicated as a deep Gaussian NN.
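As a concrete illustration of the algebraic task the method entails, the following Python sketch (our illustration, not the paper's code; the width $n$, the $\tanh$ activation, and the $n^{-1/2}$ output scaling are assumptions made for the example) computes the gradient and Hessian of a shallow Gaussian NN's output with respect to all of its Gaussian weights, i.e. the two objects whose moments enter a second-order Poincaré bound.

```python
# Minimal sketch: the output of a shallow Gaussian NN as a scalar function
# f(theta) of one Gaussian vector theta, together with its gradient and
# Hessian, the quantities a second-order Poincare inequality requires.
import torch

torch.manual_seed(0)
d, n = 3, 100                  # input dimension, width (illustrative)
x = torch.randn(d)             # a fixed input
sigma_w = 1.0                  # standard deviation of the Gaussian weights

# Flatten all Gaussian weights into one vector theta = (W1, w2).
W1 = sigma_w * torch.randn(n * d)
w2 = sigma_w * torch.randn(n)
theta = torch.cat([W1, w2])

def f(theta):
    W1 = theta[: n * d].reshape(n, d)
    w2 = theta[n * d:]
    # Shallow NN output, rescaled by n^{-1/2} as in the large-width setting.
    return (w2 @ torch.tanh(W1 @ x)) / n ** 0.5

grad = torch.autograd.functional.jacobian(f, theta)  # shape (n*d + n,)
hess = torch.autograd.functional.hessian(f, theta)   # shape (n*d + n, n*d + n)

# The bound involves E[||grad f||^4]^{1/4} and E[||Hess f||_op^4]^{1/4};
# here we just inspect one Monte Carlo draw of the two norms.
print(grad.norm().item(), torch.linalg.matrix_norm(hess, ord=2).item())
```

For a shallow NN these two objects are straightforward to control, which is what makes the $n^{-1/2}$ rate attainable; for depth $L\geq2$ the analogous exact computations are the non-trivial step discussed above.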
Assigned Action Editor: ~Atsushi_Nitanda1
Submission Number: 1395