Concentration inequalities and optimal number of layers for stochastic deep neural networks

TMLR Paper 203 Authors

22 Jun 2022 (modified: 28 Feb 2023) · Rejected by TMLR
Abstract: We state concentration and martingale inequalities for the output of the hidden layers of a stochastic deep neural network (SDNN), as well as for the output of the whole SDNN. These results allow us to introduce an expected classifier (EC) and to give a probabilistic upper bound for the classification error of the EC. We also determine the optimal number of layers for the SDNN via an optimal stopping procedure. We apply our analysis to a stochastic version of a feedforward neural network with ReLU activation function.
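For concreteness, the following is a minimal NumPy sketch of a stochastic feedforward ReLU network and of an expected classifier obtained by averaging Monte Carlo forward passes. The additive Gaussian layer noise, the function names, and the dimensions are illustrative assumptions, not the paper's formal construction; the concentration inequalities in the paper control how far a single stochastic forward pass can deviate from the mean output that the averaging below approximates.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_forward(x, weights, biases, noise_std=0.1, rng=rng):
    """One stochastic forward pass: ReLU hidden layers with additive Gaussian noise
    (an illustrative choice of stochasticity)."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b + noise_std * rng.standard_normal(W.shape[0])
        if i < len(weights) - 1:  # ReLU on hidden layers, linear last layer
            h = np.maximum(h, 0.0)
    return h

def expected_classifier(x, weights, biases, n_samples=200, **kwargs):
    """Monte Carlo estimate of the expected network output, then argmax as the label."""
    outputs = np.stack([stochastic_forward(x, weights, biases, **kwargs)
                        for _ in range(n_samples)])
    return int(np.argmax(outputs.mean(axis=0)))

# Hypothetical dimensions: input 4 -> hidden 8 -> 3 classes.
weights = [rng.standard_normal((8, 4)), rng.standard_normal((3, 8))]
biases = [np.zeros(8), np.zeros(3)]
x = rng.standard_normal(4)
print(expected_classifier(x, weights, biases))
```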
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=7Ucy1IQo8V
Changes Since Last Submission:
• We elaborated on the differences between our work, Ost and Reynaud-Bouret (2020), and Garnier and Langhendries (2022). In Section 1.2 we explain why our framework requires less mathematical structure, which papers inspired some of the techniques used for our results, and why part of the TMLR audience should be interested in our work.
• We elaborated, in Section 1.2, on the significance of obtaining probabilistic results in the framework of Zhang et al. and on the future directions these results enable.
• We describe stochastic NNs in the introduction, giving Bayesian neural networks as an example.
• We now write sub-Gaussian instead of subGaussian.
• We explicitly state that Propositions 1 and 2 are informal and indicate where the formal versions appear; in the main portion of the paper we then reference Propositions 1 and 2 when the formal versions are introduced.
• We replaced Y (used for the sequence of mean-centered outputs) in Corollary 6 with Z.
• We especially appreciated the concern about Remark 10. It turns out that throughout Section 2.2 the bounds increase with the number of layers, so the results in that section better suit shallow networks, as the inequalities become looser as the number of layers grows.
• We especially appreciated the concern about the loss function in Section 3. We completely rewrote the initial part of Section 3, which now addresses all the concerns raised by reviewer GwBG; as a result, the section is greatly improved.
• We wrote out the proof of Proposition 18 explicitly.
• We corrected the typos that were noted.
• We summarized Theorem 1.8 of Hayes (2005) in the appendix.
• We condensed Section 4 and moved most of the details to the appendix. We also gave a more explicit explanation, in Section 4.3, of how the previous theorems and the results in Zhang et al. lead to Propositions 17 and 18.
• We addressed the concerns of reviewer 6t6b.
Assigned Action Editor: ~Hanie_Sedghi1
Submission Number: 203